[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
[ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506325#comment-13506325 ] Colin Patrick McCabe commented on HADOOP-9103: -- I said: bq. since we always encode/decode using hadoop.io.UTF8, and never anything else, there should be no problem... I take this back; looks like we don't always encode/decode using {{hadoop.io.UTF8}}. D'oh! bq. Attached patch should fix this issue. Nice. Should we test for rejecting 5-byte and 6-byte sequences, since I notice you added some code to do that? I'm also a little scared by the idea that we have differently-encoded byte[] running around for the same file name strings. We have to be very careful about this. Unfortunately, we can't change the decoder to emit real UTF-8 (rather than CESU-8) without making a backwards-incompatible change, since as INode.java reminds us, {code} * The name in HdfsFileStatus should keep the same encoding as this. * if this encoding is changed, implicitly getFileInfo and listStatus in * clientProtocol are changed; The decoding at the client * side should change accordingly. {code} I also wonder if this means that we need to hunt down all the places not using CESU-8. Otherwise older clients are just not going to work with astral plane code points, even after this fix... However, we could do that in a separate JIRA, not here. UTF8 class does not properly decode Unicode characters outside the basic multilingual plane --- Key: HADOOP-9103 URL: https://issues.apache.org/jira/browse/HADOOP-9103 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 0.20.1 Environment: SUSE LINUX Reporter: yixiaohua Assignee: Todd Lipcon Attachments: FSImage.java, hadoop-9103.txt, ProblemString.txt, TestUTF8AndStringGetBytes.java, TestUTF8AndStringGetBytes.java Original Estimate: 12h Remaining Estimate: 12h this is the log information of the exception from the SecondaryNameNode: 2012-03-28 00:48:42,553 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Found lease for non-existent file /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/@??? ??tor.qzone.qq.com/keypart-00174 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1211) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:589) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225) at java.lang.Thread.run(Thread.java:619) this is the log information about the file from namenode: 2012-03-28 00:32:26,528 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss ip=/10.131.16.34cmd=create src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174 dst=null perm=boss:boss:rw-r--r-- 2012-03-28 00:37:42,387 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174. 
blk_2751836614265659170_184668759 2012-03-28 00:37:42,696 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174 is closed by DFSClient_attempt_201203271849_0016_r_000174_0 2012-03-28 00:37:50,315 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss ip=/10.131.16.34cmd=rename src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174 dst=/user/boss/pgv/fission/task16/split/ @? tor.qzone.qq.com/keypart-00174 perm=boss:boss:rw-r--r-- After checking the code that saves the FSImage, I found a problem that may be a bug in the HDFS code; I paste it below. This is the saveFSImage method in FSImage.java, where I marked the problem.
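To make the CESU-8 compatibility point above concrete, here is a hedged, illustrative sketch (not part of the attached patch): Java's DataOutputStream.writeUTF emits the CESU-8-style "modified UTF-8" that hadoop.io.UTF8 has historically matched, while String.getBytes(UTF_8) emits real UTF-8. For a code point outside the basic multilingual plane the two byte sequences differ, which is why switching the decoder to real UTF-8 is a wire-incompatible change.
{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8VersusCesu8 {
  public static void main(String[] args) throws Exception {
    // U+1F600, a code point outside the BMP (a surrogate pair in Java).
    String s = new String(Character.toChars(0x1F600));

    // Real UTF-8: one 4-byte sequence, F0 9F 98 80.
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

    // Modified UTF-8 (CESU-8-like): each surrogate char becomes its own
    // 3-byte sequence, ED A0 BD ED B8 80 (writeUTF also prepends a 2-byte
    // length, which we subtract below).
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    new DataOutputStream(baos).writeUTF(s);

    System.out.printf("UTF-8: %d bytes; writeUTF payload: %d bytes%n",
        utf8.length, baos.size() - 2); // prints 4 and 6
  }
}
{code}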
[jira] [Created] (HADOOP-9104) Should org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) reject addition if a renew action for this FS is already present in the queue?
Ivan A. Veselovsky created HADOOP-9104: -- Summary: Should org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) reject addition if a renew action for this FS is already present in the queue? Key: HADOOP-9104 URL: https://issues.apache.org/jira/browse/HADOOP-9104 Project: Hadoop Common Issue Type: Improvement Reporter: Ivan A. Veselovsky The issue was extracted from the discussion in https://issues.apache.org/jira/browse/HADOOP-9046 . Currently the method org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) allows adding any number of renew actions for the same FS. Question #1: are there real use cases where this makes sense? Also, when we remove a renew action with org.apache.hadoop.fs.DelegationTokenRenewer.removeRenewAction(T), we iterate over all the actions in the queue and remove the first one with a matching FS, if any. So, if several actions are submitted for the same FS, no more than one action will be removed per #removeRenewAction() invocation, and to remove all of them a developer needs a loop. So, if the answer to question #1 is yes, maybe we should change the #removeRenewAction(FS) behavior to remove all actions associated with this FS, or add #removeAllRenewActions(FS)? This is question #2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9104) Should org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) reject addition if a renew action for this FS is already present in the queue?
[ https://issues.apache.org/jira/browse/HADOOP-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan A. Veselovsky updated HADOOP-9104: --- Description: The issue was extracted from the discussion in https://issues.apache.org/jira/browse/HADOOP-9046 . Currently the method org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) allows adding any number of renew actions for the same FS. Question #1: are there real use cases where that makes sense? Also, when we remove a renew action with org.apache.hadoop.fs.DelegationTokenRenewer.removeRenewAction(T), we iterate over all the actions in the queue and remove the first one with a matching FS, if any. So, if several actions are submitted for the same FS, no more than one action will be removed per #removeRenewAction() invocation, and to remove all of them for a given FS a developer needs a loop. So, if the answer to question #1 is yes, maybe we should change the #removeRenewAction(FS) behavior to remove all actions associated with this FS, or add #removeAllRenewActions(FS)? This is question #2. was: The issue was extracted from the discussion in https://issues.apache.org/jira/browse/HADOOP-9046 . Currently the method org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) allows adding any number of renew actions for the same FS. Question #1: are there real use cases where this makes sense? Also, when we remove a renew action with org.apache.hadoop.fs.DelegationTokenRenewer.removeRenewAction(T), we iterate over all the actions in the queue and remove the first one with a matching FS, if any. So, if several actions are submitted for the same FS, no more than one action will be removed per #removeRenewAction() invocation, and to remove all of them a developer needs a loop. So, if the answer to question #1 is yes, maybe we should change the #removeRenewAction(FS) behavior to remove all actions associated with this FS, or add #removeAllRenewActions(FS)? This is question #2. Should org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) reject addition if a renew action for this FS is already present in the queue? --- Key: HADOOP-9104 URL: https://issues.apache.org/jira/browse/HADOOP-9104 Project: Hadoop Common Issue Type: Improvement Reporter: Ivan A. Veselovsky The issue was extracted from the discussion in https://issues.apache.org/jira/browse/HADOOP-9046 . Currently the method org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) allows adding any number of renew actions for the same FS. Question #1: are there real use cases where that makes sense? Also, when we remove a renew action with org.apache.hadoop.fs.DelegationTokenRenewer.removeRenewAction(T), we iterate over all the actions in the queue and remove the first one with a matching FS, if any. So, if several actions are submitted for the same FS, no more than one action will be removed per #removeRenewAction() invocation, and to remove all of them for a given FS a developer needs a loop. So, if the answer to question #1 is yes, maybe we should change the #removeRenewAction(FS) behavior to remove all actions associated with this FS, or add #removeAllRenewActions(FS)? This is question #2. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
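As a sketch of the question-#2 alternative discussed above, the loop below removes every queued action for one FS in a single call. This is an illustrative stand-in, not DelegationTokenRenewer's actual internals: the action type and the belongsTo helper are assumptions made for the example.
{code}
import java.util.Iterator;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;

// Hypothetical #removeAllRenewActions(FS): remove *every* matching action,
// not just the first one found.
class RenewQueueSketch<A extends Delayed> {
  private final DelayQueue<A> queue = new DelayQueue<A>();

  void removeAllRenewActions(Object fs) {
    for (Iterator<A> it = queue.iterator(); it.hasNext();) {
      if (belongsTo(it.next(), fs)) {
        it.remove(); // keep scanning instead of returning after one hit
      }
    }
  }

  private boolean belongsTo(A action, Object fs) {
    return false; // placeholder: compare the action's FileSystem to fs
  }
}
{code}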
[jira] [Commented] (HADOOP-9046) provide unit-test coverage of class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T>
[ https://issues.apache.org/jira/browse/HADOOP-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506447#comment-13506447 ] Ivan A. Veselovsky commented on HADOOP-9046: Hi, Robert, thanks for the comments. 1. Created separate Jira https://issues.apache.org/jira/browse/HADOOP-9104 . The TODO comment is removed. 2. Renamed: lock0 -> queueLock, available0 -> queueContentChangedCondition. 3. The token cancellation upon removal was introduced in HADOOP-9084, and it appeared to be accidentally overwritten by my changes. I restored those changes and also added a relevant check to the test. Thanks for this catch. 4. I fixed the problem using the java.lang.Thread.getState() method: now we first start the thread if needed, and second, we check whether it has already died. If the thread is dead, we throw IllegalStateException. This way (1) the thread never attempts to start twice, and (2) any attempt to add an action to the dead thread is rejected. I also added a check to the test to verify that this is really the case. The described changes are in patches xxx--d.patch. provide unit-test coverage of class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T> -- Key: HADOOP-9046 URL: https://issues.apache.org/jira/browse/HADOOP-9046 Project: Hadoop Common Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Ivan A. Veselovsky Assignee: Ivan A. Veselovsky Priority: Minor Attachments: HADOOP-9046-branch-0.23--c.patch, HADOOP-9046-branch-0.23-over-9049.patch, HADOOP-9046-branch-0.23.patch, HADOOP-9046--c.patch, HADOOP-9046--d.patch, HADOOP-9046-over-9049.patch, HADOOP-9046.patch The class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T> has zero coverage in the entire cumulative test run. Provide test(s) to cover this class. Note: the request is submitted to the HDFS project because the class is likely to be tested by tests in that project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9046) provide unit-test coverage of class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T>
[ https://issues.apache.org/jira/browse/HADOOP-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan A. Veselovsky updated HADOOP-9046: --- Attachment: HADOOP-9046--d.patch provide unit-test coverage of class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T> -- Key: HADOOP-9046 URL: https://issues.apache.org/jira/browse/HADOOP-9046 Project: Hadoop Common Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Ivan A. Veselovsky Assignee: Ivan A. Veselovsky Priority: Minor Attachments: HADOOP-9046-branch-0.23--c.patch, HADOOP-9046-branch-0.23-over-9049.patch, HADOOP-9046-branch-0.23.patch, HADOOP-9046--c.patch, HADOOP-9046--d.patch, HADOOP-9046-over-9049.patch, HADOOP-9046.patch The class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T> has zero coverage in the entire cumulative test run. Provide test(s) to cover this class. Note: the request is submitted to the HDFS project because the class is likely to be tested by tests in that project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9046) provide unit-test coverage of class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T>
[ https://issues.apache.org/jira/browse/HADOOP-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan A. Veselovsky updated HADOOP-9046: --- Attachment: HADOOP-9046-branch-0.23--d.patch The patch HADOOP-9046-branch-0.23--d.patch provides version d of this change for branch-0.23. provide unit-test coverage of class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T> -- Key: HADOOP-9046 URL: https://issues.apache.org/jira/browse/HADOOP-9046 Project: Hadoop Common Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Ivan A. Veselovsky Assignee: Ivan A. Veselovsky Priority: Minor Attachments: HADOOP-9046-branch-0.23--c.patch, HADOOP-9046-branch-0.23--d.patch, HADOOP-9046-branch-0.23-over-9049.patch, HADOOP-9046-branch-0.23.patch, HADOOP-9046--c.patch, HADOOP-9046--d.patch, HADOOP-9046-over-9049.patch, HADOOP-9046.patch The class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T> has zero coverage in the entire cumulative test run. Provide test(s) to cover this class. Note: the request is submitted to the HDFS project because the class is likely to be tested by tests in that project. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9105) FsShell -moreFromLocal erroneously fails
Daryn Sharp created HADOOP-9105: --- Summary: FsShell -moreFromLocal erroneously fails Key: HADOOP-9105 URL: https://issues.apache.org/jira/browse/HADOOP-9105 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The move successfully completes, but then reports an error upon trying to delete the local source directory even though the delete succeeded. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8615) EOFException in DecompressorStream.java needs to be more verbose
[ https://issues.apache.org/jira/browse/HADOOP-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506498#comment-13506498 ] thomastechs commented on HADOOP-8615: - Hi, Please treat this as a gentle reminder on further procedures. Thanks, Thomas. EOFException in DecompressorStream.java needs to be more verbose Key: HADOOP-8615 URL: https://issues.apache.org/jira/browse/HADOOP-8615 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 0.20.2 Reporter: Jeff Lord Labels: patch Attachments: HADOOP-8615.patch, HADOOP-8615-release-0.20.2.patch, HADOOP-8615-ver2.patch, HADOOP-8615-ver3.patch In ./src/core/org/apache/hadoop/io/compress/DecompressorStream.java The following exception should at least report the file in which it encounters this error: protected void getCompressedData() throws IOException { checkStream(); int n = in.read(buffer, 0, buffer.length); if (n == -1) { throw new EOFException("Unexpected end of input stream"); } This would help greatly to debug bad/corrupt files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
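One way to satisfy the request, sketched under assumptions: the stream would need to know which file it is reading, so the fileName field and constructor parameter below are hypothetical additions for illustration, not the actual DecompressorStream API.
{code}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the kind of change being requested (not the actual patch):
// thread a file name into the stream so the EOFException can identify
// which file was bad or corrupt.
class VerboseDecompressorStreamSketch {
  private final InputStream in;
  private final byte[] buffer = new byte[4096];
  private final String fileName; // hypothetical: identifies the source file

  VerboseDecompressorStreamSketch(InputStream in, String fileName) {
    this.in = in;
    this.fileName = fileName;
  }

  protected int getCompressedData() throws IOException {
    int n = in.read(buffer, 0, buffer.length);
    if (n == -1) {
      throw new EOFException(
          "Unexpected end of input stream while reading " + fileName);
    }
    return n;
  }
}
{code}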
[jira] [Updated] (HADOOP-9105) FsShell -moveFromLocal erroneously fails
[ https://issues.apache.org/jira/browse/HADOOP-9105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-9105: --- Summary: FsShell -moveFromLocal erroneously fails (was: FsShell -moreFromLocal erroneously fails) FsShell -moveFromLocal erroneously fails Key: HADOOP-9105 URL: https://issues.apache.org/jira/browse/HADOOP-9105 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The move successfully completes, but then reports an error upon trying to delete the local source directory even though the delete succeeded. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506582#comment-13506582 ] Ravi Prakash commented on HADOOP-9090: -- This could be very useful. Thanks for taking this up, Mostafa. Minor nit. In MetricsSystem.java: {code} public abstract void publishMetricsNow(); {code} IMHO we shouldn't put that method that high in the hierarchy. How would implementations of MetricsSystem not concerned with real-time implement this method? Does the description of this JIRA need an update? The classes aren't abstract after your patch. Otherwise code looks good to me. Refactor MetricsSystemImpl to allow for an on-demand publish system --- Key: HADOOP-9090 URL: https://issues.apache.org/jira/browse/HADOOP-9090 Project: Hadoop Common Issue Type: New Feature Components: metrics Reporter: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.4.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way I'm proposing to solve this is to: 1. Refactor the MetricsSystemImpl class into an abstract base MetricsSystemImpl class (common configuration and other code) and a concrete PeriodicPublishMetricsSystemImpl class (timer thread). 2. Refactor the MetricsSinkAdapter class into an abstract base MetricsSinkAdapter class (common configuration and other code) and a concrete AsyncMetricsSinkAdapter class (asynchronous publishing using the SinkQueue). 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from MetricsSystemImpl, that just exposes a synchronous publish() method to do all the work. 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just synchronously push metrics to the underlying sink. Does that sound reasonable? I'll attach the patch with all this coded up and simple tests (could use some polish I guess, but wanted to get everyone's opinion first). Notice that this is somewhat of a breaking change since MetricsSystemImpl is public (although it's marked with InterfaceAudience.Private); if the breaking change is a problem I could just rename the refactored classes so that PeriodicPublishMetricsSystemImpl is still called MetricsSystemImpl (and MetricsSystemImpl -> BaseMetricsSystemImpl). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Elhemali updated HADOOP-9090: - Description: Updated description based on feedback: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change. was: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way I'm proposing to solve this is to: 1. Refactor the MetricsSystemImpl class into an abstract base MetricsSystemImpl class (common configuration and other code) and a concrete PeriodicPublishMetricsSystemImpl class (timer thread). 2. Refactor the MetricsSinkAdapter class into an abstract base MetricsSinkAdapter class (common configuration and other code) and a concrete AsyncMetricsSinkAdapter class (asynchronous publishing using the SinkQueue). 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from MetricsSystemImpl, that just exposes a synchronous publish() method to do all the work. 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just synchronously push metrics to the underlying sink. Does that sound reasonable? I'll attach the patch with all this coded up and simple tests (could use some polish I guess, but wanted to get everyone's opinion first). Notice that this is somewhat of a breaking change since MetricsSystemImpl is public (although it's marked with InterfaceAudience.Private); if the breaking change is a problem I could just rename the refactored classes so that PeriodicPublishMetricsSystemImpl is still called MetricsSystemImpl (and MetricsSystemImpl -> BaseMetricsSystemImpl). Refactor MetricsSystemImpl to allow for an on-demand publish system --- Key: HADOOP-9090 URL: https://issues.apache.org/jira/browse/HADOOP-9090 Project: Hadoop Common Issue Type: New Feature Components: metrics Reporter: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.4.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch Updated description based on feedback: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. 
The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
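For a short-lived process, using the new method described above would look roughly like the sketch below. This is a hedged example, not code from the patch: it assumes publishMetricsNow() ends up callable on the initialized metrics system (its exact placement is still being debated in the comments that follow), and the prefix name is made up.
{code}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

public class ShortLivedJob {
  public static void main(String[] args) {
    MetricsSystem ms = DefaultMetricsSystem.initialize("ShortLivedJob");
    try {
      // ... register sources and do the real work, updating metrics ...
    } finally {
      // Synchronous out-of-band push to the sinks before the process exits,
      // instead of relying on a periodic timer that may never fire.
      ms.publishMetricsNow(); // the method proposed in this JIRA
      ms.shutdown();
    }
  }
}
{code}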
[jira] [Commented] (HADOOP-8615) EOFException in DecompressorStream.java needs to be more verbose
[ https://issues.apache.org/jira/browse/HADOOP-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506611#comment-13506611 ] Harsh J commented on HADOOP-8615: - Hi, I noted that this changes the CompressionCodec interface, which would make it an incompatible change for its users (as older code, downstream, would fail to compile as it may now be missing a few method implementations). Is it absolutely necessary to break compatibility to have just some information on this exception? EOFException in DecompressorStream.java needs to be more verbose Key: HADOOP-8615 URL: https://issues.apache.org/jira/browse/HADOOP-8615 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 0.20.2 Reporter: Jeff Lord Labels: patch Attachments: HADOOP-8615.patch, HADOOP-8615-release-0.20.2.patch, HADOOP-8615-ver2.patch, HADOOP-8615-ver3.patch In ./src/core/org/apache/hadoop/io/compress/DecompressorStream.java The following exception should at least report the file in which it encounters this error: protected void getCompressedData() throws IOException { checkStream(); int n = in.read(buffer, 0, buffer.length); if (n == -1) { throw new EOFException("Unexpected end of input stream"); } This would help greatly to debug bad/corrupt files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Elhemali updated HADOOP-9090: - Attachment: HADOOP-9090.justEnhanceDefaultImpl.5.patch Thanks, Ravi, for the feedback - I knew it was a bit controversial to put this method in the MetricsSystem interface and require it from other systems, but I figured it's the only way for outside customers to really take advantage of this since MetricsSystemImpl is not intended for out-of-Hadoop consumption. Having said that, my immediate need would be met without putting it in the interface so I took that out for now (we can always add it in another explicit JIRA if needed). I've also added a new multi-threaded test in the new patch to make sure everything is alright there. Refactor MetricsSystemImpl to allow for an on-demand publish system --- Key: HADOOP-9090 URL: https://issues.apache.org/jira/browse/HADOOP-9090 Project: Hadoop Common Issue Type: New Feature Components: metrics Reporter: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.4.patch, HADOOP-9090.justEnhanceDefaultImpl.5.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch Updated description based on feedback: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-8615) EOFException in DecompressorStream.java needs to be more verbose
[ https://issues.apache.org/jira/browse/HADOOP-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HADOOP-8615: Target Version/s: 3.0.0 (was: 2.0.0-alpha) EOFException in DecompressorStream.java needs to be more verbose Key: HADOOP-8615 URL: https://issues.apache.org/jira/browse/HADOOP-8615 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 0.20.2 Reporter: Jeff Lord Labels: patch Attachments: HADOOP-8615.patch, HADOOP-8615-release-0.20.2.patch, HADOOP-8615-ver2.patch, HADOOP-8615-ver3.patch In ./src/core/org/apache/hadoop/io/compress/DecompressorStream.java The following exception should at least report the file in which it encounters this error: protected void getCompressedData() throws IOException { checkStream(); int n = in.read(buffer, 0, buffer.length); if (n == -1) { throw new EOFException("Unexpected end of input stream"); } This would help greatly to debug bad/corrupt files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506654#comment-13506654 ] Luke Lu commented on HADOOP-9090: - Adding a publishMetricsNow method to the MetricsSystem is reasonable, as the interface is considered Evolving and the requirement has universal utility (I actually thought about adding it in the beginning but there was no such requirement then). bq. I figured the way you had it may end up in race conditions if multiple threads are calling publishMetricsNow() at the same time. The _sketch_ was meant to be simple and the race is considered harmless: it's OK to potentially exit before a metrics buffer that arrived at almost the same time as the last one is flushed. OTOH, if you want to wait for an individual metrics buffer you can do the following without a new wrapper: {code} // in putMetricsImmediate synchronized(buffer) { buffer.wait(oobTimeout); } // in consume synchronized(buffer) { buffer.notify(); } {code} Refactor MetricsSystemImpl to allow for an on-demand publish system --- Key: HADOOP-9090 URL: https://issues.apache.org/jira/browse/HADOOP-9090 Project: Hadoop Common Issue Type: New Feature Components: metrics Reporter: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.4.patch, HADOOP-9090.justEnhanceDefaultImpl.5.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch Updated description based on feedback: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506686#comment-13506686 ] Mostafa Elhemali commented on HADOOP-9090: -- *Point about how to synchronize putMetricsImmediate* Thanks Luke. Yeah I considered waiting on the buffer itself before creating the wrapper, but there are a couple of reasons I didn't end up doing that: 1. (Main reason) The sink doesn't own the buffer object, so it doesn't know who else is waiting on it or notifying it. Seems wrong to presume to wait on it. 2. Object.wait(timeout) doesn't return the result of the wait, so I wouldn't know if that succeeded or failed without additional complex logic. As for the race being harmless: I'm not sure it's that harmless. For all we know the buffers that were just processed from the queue were from ages ago, and the values in the new buffer are completely different. I'd much rather play it safe and give it an honest attempt to publish what I've just been given. So, for the reasons above I'd rather go with the wrapper despite the added code complexity. *Point about putting the method in the interface* OK, since Luke and I are two votes to put the method in the interface, and Luke made a good point about the interface being evolving, I'll put the method back into the interface in a subsequent patch unless anyone else objects (or Ravi presses the point with other reasons). Thanks all. Refactor MetricsSystemImpl to allow for an on-demand publish system --- Key: HADOOP-9090 URL: https://issues.apache.org/jira/browse/HADOOP-9090 Project: Hadoop Common Issue Type: New Feature Components: metrics Reporter: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.4.patch, HADOOP-9090.justEnhanceDefaultImpl.5.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch Updated description based on feedback: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
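Mostafa's point #2 above is the crux: Object.wait(timeout) returns void, so the producer cannot distinguish a timeout from a successful hand-off. A hedged sketch of the wrapper idea follows, using a CountDownLatch whose await() reports the outcome; the class and method names are illustrative, not necessarily those in the patch.
{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative wrapper: the producer learns whether the sink actually
// consumed the buffer, which plain Object.wait(timeout) cannot tell it.
class WaitableMetricsBuffer {
  private final CountDownLatch consumed = new CountDownLatch(1);

  /** Producer side: true if the sink flushed us, false on timeout. */
  boolean waitTillNotified(long timeoutMillis) throws InterruptedException {
    return consumed.await(timeoutMillis, TimeUnit.MILLISECONDS);
  }

  /** Consumer side: called once the buffer has been pushed to the sink. */
  void markConsumed() {
    consumed.countDown();
  }
}
{code}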
[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
[ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506727#comment-13506727 ] Todd Lipcon commented on HADOOP-9103: - bq. Nice. Should we test for rejecting 5-byte and 6-byte sequences, since I notice you added some code to do that? I added a test for an invalid sequence. I didn't think it was necessary to add a separate test for a 5-byte sequence, since it would trigger the same invalid code path. Got an example hex sequence you think we should test against? bq. I'm also a little scared by the idea that we have differently-encoded byte[] running around for the same file name strings. We have to be very careful about this. bq. ...However, we could do that in a separate JIRA, not here Agreed. Let's open a separate HDFS JIRA and use this for the Common-side fix. This patch alone was enough to successfully restart a NN which had an open file with a 4-byte codepoint. UTF8 class does not properly decode Unicode characters outside the basic multilingual plane --- Key: HADOOP-9103 URL: https://issues.apache.org/jira/browse/HADOOP-9103 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 0.20.1 Environment: SUSE LINUX Reporter: yixiaohua Assignee: Todd Lipcon Attachments: FSImage.java, hadoop-9103.txt, ProblemString.txt, TestUTF8AndStringGetBytes.java, TestUTF8AndStringGetBytes.java Original Estimate: 12h Remaining Estimate: 12h this is the log information of the exception from the SecondaryNameNode: 2012-03-28 00:48:42,553 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Found lease for non-existent file /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/@??? ??tor.qzone.qq.com/keypart-00174 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1211) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:589) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225) at java.lang.Thread.run(Thread.java:619) this is the log information about the file from namenode: 2012-03-28 00:32:26,528 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss ip=/10.131.16.34cmd=create src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174 dst=null perm=boss:boss:rw-r--r-- 2012-03-28 00:37:42,387 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174. 
blk_2751836614265659170_184668759 2012-03-28 00:37:42,696 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174 is closed by DFSClient_attempt_201203271849_0016_r_000174_0 2012-03-28 00:37:50,315 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss ip=/10.131.16.34cmd=rename src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/ @?tor.qzone.qq.com/keypart-00174 dst=/user/boss/pgv/fission/task16/split/ @? tor.qzone.qq.com/keypart-00174 perm=boss:boss:rw-r--r-- After checking the code that saves the FSImage, I found a problem that may be a bug in the HDFS code; I paste it below. This is the saveFSImage method in FSImage.java, where I marked the problem: {code} /** * Save the contents of the FS image to the file. */ void saveFSImage(File newFile) throws IOException { FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem(); FSDirectory fsDir = fsNamesys.dir; long startTime = FSNamesystem.now(); // // Write out data // DataOutputStream out = new DataOutputStream( new BufferedOutputStream(
[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506725#comment-13506725 ] Luke Lu commented on HADOOP-9090: - Good points, Mostafa. I should have known that MetricsBuffer, though immutable, can be shared. I guess it was a kneejerk reaction to Java verbosity :) Anyway the new logic looks solid to me. Thanks! Refactor MetricsSystemImpl to allow for an on-demand publish system --- Key: HADOOP-9090 URL: https://issues.apache.org/jira/browse/HADOOP-9090 Project: Hadoop Common Issue Type: New Feature Components: metrics Reporter: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.4.patch, HADOOP-9090.justEnhanceDefaultImpl.5.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch Updated description based on feedback: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9106) Allow configuration of IPC connect timeout
Todd Lipcon created HADOOP-9106: --- Summary: Allow configuration of IPC connect timeout Key: HADOOP-9106 URL: https://issues.apache.org/jira/browse/HADOOP-9106 Project: Hadoop Common Issue Type: Improvement Components: ipc Affects Versions: 3.0.0 Reporter: Todd Lipcon Currently the connection timeout in Client.setupConnection() is hard-coded to 20 seconds. This is unreasonable in some scenarios, such as HA failover, where we want a faster failover time. We should allow this to be configured per-client. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
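A hedged sketch of what the per-client knob could look like. The key name "ipc.client.connect.timeout" is an assumption made for illustration; the JIRA only states that the hard-coded 20-second constant should become configurable, and the final key may be named differently.
{code}
import org.apache.hadoop.conf.Configuration;

public class IpcTimeoutExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical key: shorten the IPC connect timeout from the hard-coded
    // 20 seconds to 5 seconds so an HA failover is detected sooner.
    conf.setInt("ipc.client.connect.timeout", 5000);
    // Clients built from this conf (e.g. FileSystem.get(conf)) would then
    // give up on an unresponsive server after 5 seconds.
  }
}
{code}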
[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506759#comment-13506759 ] Ravi Prakash commented on HADOOP-9090: -- Thanks, Luke and Mostafa. bq. and the requirement has universal utility Agreed. But it also places restrictions on how scalable the system can be. I'm flexible with where you want to introduce that method. Even then, I would like the behavior of that method javadoc'ed, explicitly stating what the expectation is if a MetricsSystem cannot provide real-time guarantees. Refactor MetricsSystemImpl to allow for an on-demand publish system --- Key: HADOOP-9090 URL: https://issues.apache.org/jira/browse/HADOOP-9090 Project: Hadoop Common Issue Type: New Feature Components: metrics Reporter: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.4.patch, HADOOP-9090.justEnhanceDefaultImpl.5.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch Updated description based on feedback: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Elhemali updated HADOOP-9090: - Attachment: HADOOP-9090.justEnhanceDefaultImpl.6.patch That's a fair request Ravi. I took a shot at documenting that in Javadoc on the method - does the wording look reasonable? /** * Requests an immediate publish of all metrics from sources to sinks. * * This is a soft request: the expectation is that a best effort will be * done to synchronously snapshot the metrics from all the sources and put * them in all the sinks (including flushing the sinks) before returning to * the caller. If this can't be accomplished in reasonable time it's OK to * return to the caller before everything is done. */ Refactor MetricsSystemImpl to allow for an on-demand publish system --- Key: HADOOP-9090 URL: https://issues.apache.org/jira/browse/HADOOP-9090 Project: Hadoop Common Issue Type: New Feature Components: metrics Reporter: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.4.patch, HADOOP-9090.justEnhanceDefaultImpl.5.patch, HADOOP-9090.justEnhanceDefaultImpl.6.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch Updated description based on feedback: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented
Hari Shreedharan created HADOOP-9107: Summary: Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented Key: HADOOP-9107 URL: https://issues.apache.org/jira/browse/HADOOP-9107 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 2.0.2-alpha Reporter: Hari Shreedharan This code in Client.java looks fishy: {code} public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest, ConnectionId remoteId) throws InterruptedException, IOException { Call call = new Call(rpcKind, rpcRequest); Connection connection = getConnection(remoteId, call); connection.sendParam(call); // send the parameter boolean interrupted = false; synchronized (call) { while (!call.done) { try { call.wait(); // wait for the result } catch (InterruptedException ie) { // save the fact that we were interrupted interrupted = true; } } if (interrupted) { // set the interrupt flag now that we are done waiting Thread.currentThread().interrupt(); } if (call.error != null) { if (call.error instanceof RemoteException) { call.error.fillInStackTrace(); throw call.error; } else { // local exception InetSocketAddress address = connection.getRemoteAddress(); throw NetUtils.wrapException(address.getHostName(), address.getPort(), NetUtils.getHostname(), 0, call.error); } } else { return call.getRpcResult(); } } } {code} Blocking calls are expected to throw InterruptedException if the thread is interrupted. Also, it seems like this method keeps waiting on the call object even if it is interrupted. Currently, this method does not throw an InterruptedException, nor is it documented that this method interrupts the thread calling it. If it is interrupted, this method should still throw InterruptedException; it should not matter whether the call was successful or not. This is a major issue for clients which do not call this directly, but call HDFS client API methods to write to HDFS, which may be interrupted by the client due to timeouts but do not throw InterruptedException. Any HDFS client call can interrupt the thread, but this is not documented anywhere. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented
[ https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Shreedharan updated HADOOP-9107: - Affects Version/s: 1.1.0 Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented Key: HADOOP-9107 URL: https://issues.apache.org/jira/browse/HADOOP-9107 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 1.1.0, 2.0.2-alpha Reporter: Hari Shreedharan This code in Client.java looks fishy: {code} public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest, ConnectionId remoteId) throws InterruptedException, IOException { Call call = new Call(rpcKind, rpcRequest); Connection connection = getConnection(remoteId, call); connection.sendParam(call); // send the parameter boolean interrupted = false; synchronized (call) { while (!call.done) { try { call.wait(); // wait for the result } catch (InterruptedException ie) { // save the fact that we were interrupted interrupted = true; } } if (interrupted) { // set the interrupt flag now that we are done waiting Thread.currentThread().interrupt(); } if (call.error != null) { if (call.error instanceof RemoteException) { call.error.fillInStackTrace(); throw call.error; } else { // local exception InetSocketAddress address = connection.getRemoteAddress(); throw NetUtils.wrapException(address.getHostName(), address.getPort(), NetUtils.getHostname(), 0, call.error); } } else { return call.getRpcResult(); } } } {code} Blocking calls are expected to throw InterruptedException if the thread is interrupted. Also, it seems like this method keeps waiting on the call object even if it is interrupted. Currently, this method does not throw an InterruptedException, nor is it documented that this method interrupts the thread calling it. If it is interrupted, this method should still throw InterruptedException; it should not matter whether the call was successful or not. This is a major issue for clients which do not call this directly, but call HDFS client API methods to write to HDFS, which may be interrupted by the client due to timeouts but do not throw InterruptedException. Any HDFS client call can interrupt the thread, but this is not documented anywhere. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system
[ https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506794#comment-13506794 ] Ravi Prakash commented on HADOOP-9090: -- Thanks Mostafa! That works for me. +1 from my side. Refactor MetricsSystemImpl to allow for an on-demand publish system --- Key: HADOOP-9090 URL: https://issues.apache.org/jira/browse/HADOOP-9090 Project: Hadoop Common Issue Type: New Feature Components: metrics Reporter: Mostafa Elhemali Priority: Minor Attachments: HADOOP-9090.2.patch, HADOOP-9090.justEnhanceDefaultImpl.2.patch, HADOOP-9090.justEnhanceDefaultImpl.3.patch, HADOOP-9090.justEnhanceDefaultImpl.4.patch, HADOOP-9090.justEnhanceDefaultImpl.5.patch, HADOOP-9090.justEnhanceDefaultImpl.6.patch, HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch Updated description based on feedback: We have a need to publish metrics out of some short-living processes, which is not really well-suited to the current metrics system implementation which periodically publishes metrics asynchronously (a behavior that works great for long-living processes). Of course I could write my own metrics system, but it seems like such a waste to rewrite all the awesome code currently in the MetricsSystemImpl and supporting classes. The way this JIRA solves this problem is adding a new method publishMetricsNow() to the MetricsSystemImpl() class, that does a synchronous out-of-band push of the metrics from the sources to the sink. I also add a method to MetricsSinkAdapter (putMetricsImmediate) to support that change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented
[ https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506853#comment-13506853 ] Karthik Kambatla commented on HADOOP-9107: -- The things to fix look like: # document that the method eats up {{InterruptedException}} # break after setting interrupted to true in the catch block # throw appropriate exception in the {{else}} branch of {{if (call.error != null)}} Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented Key: HADOOP-9107 URL: https://issues.apache.org/jira/browse/HADOOP-9107 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 1.1.0, 2.0.2-alpha Reporter: Hari Shreedharan This code in Client.java looks fishy: {code} public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest, ConnectionId remoteId) throws InterruptedException, IOException { Call call = new Call(rpcKind, rpcRequest); Connection connection = getConnection(remoteId, call); connection.sendParam(call); // send the parameter boolean interrupted = false; synchronized (call) { while (!call.done) { try { call.wait(); // wait for the result } catch (InterruptedException ie) { // save the fact that we were interrupted interrupted = true; } } if (interrupted) { // set the interrupt flag now that we are done waiting Thread.currentThread().interrupt(); } if (call.error != null) { if (call.error instanceof RemoteException) { call.error.fillInStackTrace(); throw call.error; } else { // local exception InetSocketAddress address = connection.getRemoteAddress(); throw NetUtils.wrapException(address.getHostName(), address.getPort(), NetUtils.getHostname(), 0, call.error); } } else { return call.getRpcResult(); } } } {code} Blocking calls are expected to throw InterruptedException if the thread is interrupted. Also, it seems like this method keeps waiting on the call object even if it is interrupted. Currently, this method does not throw an InterruptedException, nor is it documented that this method interrupts the thread calling it. If it is interrupted, this method should still throw InterruptedException; it should not matter whether the call was successful or not. This is a major issue for clients which do not call this directly, but call HDFS client API methods to write to HDFS, which may be interrupted by the client due to timeouts but do not throw InterruptedException. Any HDFS client call can interrupt the thread, but this is not documented anywhere. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
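A sketch of what fixes #2 and #3 above could look like inside Client.call() (illustrative only, not the committed patch; Call here is the class from the snippet above, and the choice of java.io.InterruptedIOException to keep the IOException contract is an assumption):
{code}
// Replacing the wait loop quoted above.
synchronized (call) {
  boolean interrupted = false;
  while (!call.done && !interrupted) {
    try {
      call.wait(); // wait for the result
    } catch (InterruptedException ie) {
      interrupted = true; // fix #2: stop waiting once interrupted
    }
  }
  if (interrupted) {
    Thread.currentThread().interrupt(); // preserve the interrupt status
    // fix #3: surface the interruption instead of eating it
    throw new java.io.InterruptedIOException("RPC call interrupted");
  }
  // ... existing call.error / getRpcResult handling continues here ...
}
{code}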
[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
[ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506855#comment-13506855 ] Colin Patrick McCabe commented on HADOOP-9103: -- bq. Got an example hex sequence you think we should test against? Here is a 5-byte sequence that used to be valid UTF-8, before the 4-byte max rule was put into place: {{0xF8 0x88 0x80 0x80 0x80}} Source: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt UTF8 class does not properly decode Unicode characters outside the basic multilingual plane --- Key: HADOOP-9103 URL: https://issues.apache.org/jira/browse/HADOOP-9103
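The rejection is easy to exercise against any strict decoder; here is a self-contained sketch using the JDK's CharsetDecoder (standing in for hadoop.io.UTF8, whose test entry points are not shown in this thread):
{code}
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Reject5ByteSequence {
  public static void main(String[] args) {
    // 5-byte sequence from Markus Kuhn's stress test; invalid since
    // RFC 3629 capped UTF-8 at 4 bytes.
    byte[] bad = { (byte) 0xF8, (byte) 0x88, (byte) 0x80, (byte) 0x80, (byte) 0x80 };
    CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      dec.decode(ByteBuffer.wrap(bad));
      System.out.println("unexpectedly accepted");
    } catch (CharacterCodingException expected) {
      System.out.println("rejected as malformed, as the UTF8 test should expect");
    }
  }
}
{code}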
[jira] [Updated] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
[ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HADOOP-9103: Attachment: hadoop-9103.txt Attached patch includes the test sequence Colin provided above. UTF8 class does not properly decode Unicode characters outside the basic multilingual plane --- Key: HADOOP-9103 URL: https://issues.apache.org/jira/browse/HADOOP-9103
[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
[ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506894#comment-13506894 ] Andy Isaacson commented on HADOOP-9103: --- bq. + * This is a regression est for HDFS-3307. test, not est. Since this jira has moved to HADOOP-9103, update the reference. {code} + * Note that this decodes UTF-8 but actually encodes CESU-8, a variant of + * UTF-8: see http://en.wikipedia.org/wiki/CESU-8 {code} Rather than adding a comment saying this code is buggy, how about we fix the bug? Outputting proper 4-byte UTF8 sequences for a given UTF-16 surrogate pair is a much better solution than the current behavior. So as far as it goes the patch looks good. I'll look into the surrogate pair stuff. UTF8 class does not properly decode Unicode characters outside the basic multilingual plane --- Key: HADOOP-9103 URL: https://issues.apache.org/jira/browse/HADOOP-9103
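The encoder-side fix Andy describes amounts to combining the surrogate pair into a code point and emitting one 4-byte sequence; a sketch (not the committed patch; Character.toCodePoint is the JDK helper for the pair arithmetic):
{code}
// Emit proper 4-byte UTF-8 for a UTF-16 surrogate pair.
static byte[] encodeSupplementary(char high, char low) {
  // 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
  int cp = Character.toCodePoint(high, low);
  return new byte[] {
    (byte) (0xF0 | (cp >> 18)),
    (byte) (0x80 | ((cp >> 12) & 0x3F)),
    (byte) (0x80 | ((cp >> 6) & 0x3F)),
    (byte) (0x80 | (cp & 0x3F))
  };
}
// encodeSupplementary('\uD801', '\uDC00') -> F0 90 90 80 (U+10400),
// where CESU-8 would emit the six bytes ED A0 81 ED B0 80.
{code}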
[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
[ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506912#comment-13506912 ] Colin Patrick McCabe commented on HADOOP-9103: -- bq. Rather than adding a comment saying this code is buggy, how about we fix the bug? Outputting proper 4-byte UTF8 sequences for a given UTF-16 surrogate pair is a much better solution than the current behavior. That would be an incompatible change. Consider what happens when the server hands back 4-byte UTF-8 sequences to existing DFSClients. Boom, they fall over. UTF8 class does not properly decode Unicode characters outside the basic multilingual plane --- Key: HADOOP-9103 URL: https://issues.apache.org/jira/browse/HADOOP-9103
[jira] [Updated] (HADOOP-9056) Build native library on Windows
[ https://issues.apache.org/jira/browse/HADOOP-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HADOOP-9056: -- Attachment: HADOOP-9056.1.patch Build native library on Windows --- Key: HADOOP-9056 URL: https://issues.apache.org/jira/browse/HADOOP-9056 Project: Hadoop Common Issue Type: Improvement Components: native Affects Versions: trunk-win Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: trunk-win Attachments: HADOOP-9056.1.patch, HADOOP-9056.patch Original Estimate: 168h Remaining Estimate: 168h The native library (hadoop.dll) must be compiled on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-8982) TestSocketIOWithTimeout fails on Windows
[ https://issues.apache.org/jira/browse/HADOOP-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506928#comment-13506928 ] Xuan Gong commented on HADOOP-8982: --- The failure of this test case is, I think, because partial writes are handled differently by Mac and Windows. We actually write the bytes to channels; the Pipe.SinkChannel we are using here implements the interface WritableByteChannel, and the Java doc for that interface's write function says that some types of channels, depending upon their state, may write only some of the bytes or possibly none at all. That is one reason why I think the different OS may cause the failure. Basically, this test opens a pipe channel, and the sink keeps writing 4192 bytes at a time to the channel. On Mac, when the channel is full, the sink does a partial write (it writes 3000 bytes, the channel becomes full, and the remaining bytes in the ByteBuffer are 1192). On Windows, on the other hand, if the channel cannot fit the full ByteBuffer, it does not let us write part of it: when we try to write 4192 bytes to a channel that only has room for 3000, we cannot write at all, and the remaining bytes in the ByteBuffer are still 4192. When a partial write happens, we check whether buf.capacity() > buf.remaining(); if so, we close the stream. That is why the stream is closed in the Mac environment but still open in the Windows environment, and why the next write does not hit the expected stream-is-closed exception on Windows. So far, this is from my observations. So the question is whether Windows and Mac handle partial writes as I described. If so, to fix this test failure we can add a function called tryToWriteOneByte() to SocketOutputStream.java, for test purposes only: public void tryToWriteOneByte() { try { write(1); writer.close(); } catch (IOException e) { /* do nothing */ } } Calling this function inserts a byte into the channel; if we can do that, the channel was not full, so a partial write happened and we need to close the stream. If we cannot, we catch an exception, which means the previous failure was not a partial write but a channel that was already full before the next 4192-byte write. Since this test failure happens on Windows, in TestSocketIOWithTimeout.java we can check whether the environment is Windows before calling this function. After doIO(null, out, TIMEOUT), we can do: if (System.getProperty("os.name").toLowerCase().indexOf("win") >= 0) { out.tryToWriteOneByte(); } TestSocketIOWithTimeout fails on Windows Key: HADOOP-8982 URL: https://issues.apache.org/jira/browse/HADOOP-8982 Project: Hadoop Common Issue Type: Bug Components: net Affects Versions: trunk-win Reporter: Chris Nauroth Assignee: Chris Nauroth This is a possible race condition or difference in socket handling on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
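Formatted as a block, the helper and call site proposed above would look roughly like this (a sketch following the comment's own names; the writer field is assumed to be SocketOutputStream's internal writer):
{code}
// In SocketOutputStream (test-only helper):
public void tryToWriteOneByte() {
  try {
    write(1);        // succeeds only if the channel still has room, i.e. the
    writer.close();  // previous 4192-byte write was a partial write
  } catch (IOException e) {
    // channel was already full: the previous failure was not a partial write
  }
}

// In TestSocketIOWithTimeout, after doIO(null, out, TIMEOUT):
if (System.getProperty("os.name").toLowerCase().indexOf("win") >= 0) {
  out.tryToWriteOneByte();
}
{code}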
[jira] [Updated] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
[ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HADOOP-9103: Attachment: hadoop-9103.txt Fixed typo in the test javadoc. UTF8 class does not properly decode Unicode characters outside the basic multilingual plane --- Key: HADOOP-9103 URL: https://issues.apache.org/jira/browse/HADOOP-9103
[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
[ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506933#comment-13506933 ] Todd Lipcon commented on HADOOP-9103: - bq. Rather than adding a comment saying this code is buggy, how about we fix the bug? Outputting proper 4-byte UTF8 sequences for a given UTF-16 surrogate pair is a much better solution than the current behavior. It's not buggy, it's just different (reminds me of something my elementary school teachers used to say). But on a serious note, yea, what Colin said above -- it could break existing clients of the code who are using the old code to _decode_, and were relying on the fact that we are able to round-trip non-BMP characters through UTF8.java. UTF8 class does not properly decode Unicode characters outside the basic multilingual plane --- Key: HADOOP-9103 URL: https://issues.apache.org/jira/browse/HADOOP-9103
[jira] [Commented] (HADOOP-9056) Build native library on Windows
[ https://issues.apache.org/jira/browse/HADOOP-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506943#comment-13506943 ] Arpit Agarwal commented on HADOOP-9056: --- Thanks for the feedback Chuan. I have addressed most of your comments. Most of the SecureIOUtils changes don't seem to be applicable in trunk. There is no equivalent to posix_fadvise in Win32, but we may be able to achieve a similar effect by passing flags to CreateFile; we can address that in a separate patch. Build native library on Windows --- Key: HADOOP-9056 URL: https://issues.apache.org/jira/browse/HADOOP-9056 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9099) NetUtils.normalizeHostName fails on domains where UnknownHost resolves to an IP address
[ https://issues.apache.org/jira/browse/HADOOP-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506950#comment-13506950 ] Ivan Mitic commented on HADOOP-9099: Thanks Mostafa and Nicholas for the review! NetUtils.normalizeHostName fails on domains where UnknownHost resolves to an IP address --- Key: HADOOP-9099 URL: https://issues.apache.org/jira/browse/HADOOP-9099 Project: Hadoop Common Issue Type: Bug Components: test Affects Versions: 1-win Reporter: Ivan Mitic Assignee: Ivan Mitic Priority: Minor Fix For: 1.2.0, 1-win Attachments: HADOOP-9099.branch-1-win.patch I just hit this failure. We should use a more distinctive string than UnknownHost: Testcase: testNormalizeHostName took 0.007 sec FAILED expected:[65.53.5.181] but was:[UnknownHost] junit.framework.AssertionFailedError: expected:[65.53.5.181] but was:[UnknownHost] at org.apache.hadoop.net.TestNetUtils.testNormalizeHostName(TestNetUtils.java:347) Will post a patch in a bit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9093) Move all the Exception in PathExceptions to o.a.h.fs package
[ https://issues.apache.org/jira/browse/HADOOP-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506952#comment-13506952 ] Suresh Srinivas commented on HADOOP-9093: - Daryn, I posted your comment in HADOOP-9094. Let's move the conversation to that jira. Move all the Exception in PathExceptions to o.a.h.fs package Key: HADOOP-9093 URL: https://issues.apache.org/jira/browse/HADOOP-9093 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 2.0.3-alpha Attachments: HADOOP-9093.patch The exceptions in PathExceptions are useful for non-shell functionality as well. Making them available as exceptions under fs will help some of the HDFS implementation code throw more specific exceptions than IOException (for example, see HDFS-4209). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9094) Add interface audience and stability annotation to PathExceptions
[ https://issues.apache.org/jira/browse/HADOOP-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506954#comment-13506954 ] Suresh Srinivas commented on HADOOP-9094: - bq. I propose using FileNotFoundException instead of PathNotFoundException as it is already extensively used. Similarly use AccessControlException instead of PathAccessException. If folks agree, I will make that change in the next patch. Alternatively we could at least make these exceptions subclasses of the exception that I am proposing replacing them with. Daryn's comment from HADOOP-9093: bq. I had considered that when I created these exceptions, but wanted all path exceptions to derive from a common class. I suppose PathException could be an interface and we copy-n-paste the base code - which is the main factor in why I chose to derive from a base class. Add interface audience and stability annotation to PathExceptions - Key: HADOOP-9094 URL: https://issues.apache.org/jira/browse/HADOOP-9094 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas HADOOP-9093 moved path-related exceptions to o.a.h.fs. This jira tracks adding interface audience and stability annotations to those exceptions. It also tracks the comment from HADOOP-9093: bq. I propose using FileNotFoundException instead of PathNotFoundException as it is already extensively used. Similarly use AccessControlException instead of PathAccessException. If folks agree, I will make that change in the next patch. Alternatively we could at least make these exceptions subclasses of the exception that I am proposing replacing them with. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9094) Add interface audience and stability annotation to PathExceptions
[ https://issues.apache.org/jira/browse/HADOOP-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506957#comment-13506957 ] Suresh Srinivas commented on HADOOP-9094: - bq. I had considered that when I created these exceptions, but wanted all path exceptions to derive from a common class. I suppose PathException could be an interface and we copy-n-paste the base code - which is the main factor in why I chose to derive from a base class. Given that the new exceptions format the exception message in a certain way, I propose making the following change: # Move the message formatting to a static utility method. # Have PathNotFoundException subclass FileNotFoundException; it formats the exception message using the utility. # Rename PathAccessException to PathAccessControlException and make it a subclass of AccessControlException; it also formats the exception message using the utility. The other alternative is to drop the Path*Exceptions in the above cases and use the superclasses I have proposed; the exception message can still be formatted with the utility. I am leaning towards the second option. A sketch of the first option appears below. Add interface audience and stability annotation to PathExceptions - Key: HADOOP-9094 URL: https://issues.apache.org/jira/browse/HADOOP-9094 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
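A sketch of the first option in the list above (hypothetical class shapes, not committed code):
{code}
// Shared formatter, so the message layout stays identical across subclasses.
class PathExceptionMessages {
  static String format(String path, String problem) {
    return "`" + path + "': " + problem;
  }
}

// PathNotFoundException keeps its name but becomes catchable as the
// widely-used FileNotFoundException.
class PathNotFoundException extends java.io.FileNotFoundException {
  PathNotFoundException(String path) {
    super(PathExceptionMessages.format(path, "No such file or directory"));
  }
}
{code}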
[jira] [Commented] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented
[ https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506980#comment-13506980 ] Hari Shreedharan commented on HADOOP-9107: -- (1) is insufficient since clients often do not directly call this method. I believe that if this method gets interrupted: * Clean up the call object - it seems like some clean-up is required in the Connection object. * Throw InterruptedException, regardless of whether the call completes successfully or not. Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented Key: HADOOP-9107 URL: https://issues.apache.org/jira/browse/HADOOP-9107 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented
[ https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506983#comment-13506983 ] Hari Shreedharan commented on HADOOP-9107: -- This is to ensure that the real client that calls this knows that the call was interrupted, rather than forcing it to check the thread's interrupt flag. Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented Key: HADOOP-9107 URL: https://issues.apache.org/jira/browse/HADOOP-9107 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane
[ https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507039#comment-13507039 ] Andy Isaacson commented on HADOOP-9103: --- bq. It's not buggy it's just different It's buggy if we ever end up writing a CESU-8 bytestream where someone else expects UTF-8. For example, {{dfs -ls}} writing CESU-8 to stdout wouldn't work properly, because other programs such as {{xterm}} or {{putty}} don't implement the CESU-8 decoding rules. (This example doesn't happen currently, because the CESU-8 filename is deserialized into a String, where it's interpreted as a surrogate pair, which is then written out, and the correct surrogate-pair-to-UTF-8 encoding happens on the output side.) Hopefully we haven't overlooked any such existing bugs and nobody accidentally uses UTF8.java in the future. (At least it's marked @Deprecated.) Agreed that as long as UTF8.java is the thing that reads the bytestream, we can continue to implement CESU-8 and it can remain partially backwards compatible with previous versions of UTF8.java. UTF8 class does not properly decode Unicode characters outside the basic multilingual plane --- Key: HADOOP-9103 URL: https://issues.apache.org/jira/browse/HADOOP-9103
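Andy's parenthetical is visible at the byte level with the JDK alone: DataOutputStream.writeUTF emits modified UTF-8, which matches CESU-8 for everything except NUL, while String.getBytes(UTF_8) emits proper 4-byte sequences once the name has become a java.lang.String. A sketch:
{code}
import java.io.*;
import java.nio.charset.StandardCharsets;

public class Cesu8RoundTrip {
  public static void main(String[] args) throws IOException {
    String s = new String(Character.toChars(0x10400)); // a non-BMP code point

    // writeUTF encodes the surrogate pair as two 3-byte sequences (6 bytes),
    // plus a 2-byte length prefix.
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    new DataOutputStream(bos).writeUTF(s);
    System.out.println("on the wire: " + (bos.size() - 2) + " bytes");   // 6

    // Decoded back into a String and re-encoded for stdout, the same name
    // becomes one proper 4-byte UTF-8 sequence.
    String back = new DataInputStream(
        new ByteArrayInputStream(bos.toByteArray())).readUTF();
    System.out.println("as UTF-8: "
        + back.getBytes(StandardCharsets.UTF_8).length + " bytes");      // 4
  }
}
{code}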
[jira] [Created] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases
Kihwal Lee created HADOOP-9108: -- Summary: Add a method to clear terminateCalled to ExitUtil for test cases Key: HADOOP-9108 URL: https://issues.apache.org/jira/browse/HADOOP-9108 Project: Hadoop Common Issue Type: Improvement Components: util Affects Versions: 0.23.5, 2.0.2-alpha Reporter: Kihwal Lee Currently once terminateCalled is set, it will stay set since it's a class static variable. This can break test cases where multiple test cases run in one jvm. In MiniDfsCluster, it should be cleared during shutdown for the next test case to run properly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases
[ https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved HADOOP-9108. Resolution: Invalid Add a method to clear terminateCalled to ExitUtil for test cases Key: HADOOP-9108 URL: https://issues.apache.org/jira/browse/HADOOP-9108 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases
[ https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reopened HADOOP-9108: Assignee: Kihwal Lee I found out the necessary changes have already been made in trunk and branch-2 by HDFS-3663 and HDFS-3765. But we cannot simply pull these patches to branch-0.23 because HDFS-3765 contains more than just ExitUtil change. I will use this jira to implement something equivalent for branch-0.23. Since this is for tests, a slight divergence should be of no concern. Add a method to clear terminateCalled to ExitUtil for test cases Key: HADOOP-9108 URL: https://issues.apache.org/jira/browse/HADOOP-9108 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
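The branch-0.23 equivalent Kihwal describes would be on the order of the following (the method name and shape are hypothetical; trunk and branch-2 gained their equivalent via HDFS-3663 and HDFS-3765):
{code}
// In ExitUtil: terminateCalled is class-static, so it must be cleared
// explicitly between tests that share one JVM.
private static volatile boolean terminateCalled = false;

// Test-only reset, e.g. called from MiniDFSCluster.shutdown(), so the
// next test case can observe its own terminate() calls.
public static void clearTerminateCalled() {
  terminateCalled = false;
}
{code}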
[jira] [Updated] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases
[ https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HADOOP-9108: --- Target Version/s: 0.23.6 (was: 3.0.0, 2.0.3-alpha, 0.23.6) Affects Version/s: (was: 2.0.2-alpha) Add a method to clear terminateCalled to ExitUtil for test cases Key: HADOOP-9108 URL: https://issues.apache.org/jira/browse/HADOOP-9108 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases
[ https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507068#comment-13507068 ] Hadoop QA commented on HADOOP-9108: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12555457/hadoop-9108.branch-0.23.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1835//console This message is automatically generated. Add a method to clear terminateCalled to ExitUtil for test cases Key: HADOOP-9108 URL: https://issues.apache.org/jira/browse/HADOOP-9108 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases
[ https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated HADOOP-9108: -- Resolution: Fixed Fix Version/s: 0.23.6 Status: Resolved (was: Patch Available) This patch only applies to branch-0.23, hence the jenkins failures. I committed it only to branch-0.23 since trunk and branch-2 already have similar functionality. Add a method to clear terminateCalled to ExitUtil for test cases Key: HADOOP-9108 URL: https://issues.apache.org/jira/browse/HADOOP-9108 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented
[ https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507081#comment-13507081 ] Hari Shreedharan commented on HADOOP-9107: -- My take on what should really happen in the catch block: * call.setException() * Remove call from the calls table. * In the receiveResponse method, check if calls.get(callId) returns null before proceeding. * throw the InterruptedException (or wrap it and then throw), so client code can know something went wrong and the call failed. Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented Key: HADOOP-9107 URL: https://issues.apache.org/jira/browse/HADOOP-9107 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
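As a fragment, that handling would look roughly like this (method and field names are illustrative, not the actual Client.java members):
{code}
synchronized (call) {
  while (!call.done) {
    try {
      call.wait();                      // wait for the result
    } catch (InterruptedException ie) {
      call.setException(ie);            // fail the call
      connection.removeCall(call.id);   // drop it from the calls table, so
                                        // receiveResponse() must null-check
                                        // calls.get(callId) before proceeding
      Thread.currentThread().interrupt();
      throw ie;                         // surface the failure to the caller
    }
  }
  // ... existing error/result handling ...
}
{code}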
[jira] [Commented] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases
[ https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507104#comment-13507104 ]

Suresh Srinivas commented on HADOOP-9108:
-----------------------------------------

+1 for the patch.
[jira] [Updated] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases
[ https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas updated HADOOP-9108:
------------------------------------
    Issue Type: Bug (was: Improvement)
[jira] [Commented] (HADOOP-9083) Port HADOOP-9020 Add a SASL PLAIN server to branch 1
[ https://issues.apache.org/jira/browse/HADOOP-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507140#comment-13507140 ]

Yu Gao commented on HADOOP-9083:
--------------------------------

This JIRA does not intend to introduce SASL PLAIN as a new auth method in 1.x; it just adds the SASL PLAIN server implementation, so that components that want to use the SASL PLAIN mechanism, such as Hive, can benefit from it. The Hive Thrift server depends on the Thrift library, which already provides a TSaslTransport implementation, so with this PLAIN server registered, TSaslTransport can use it to do PLAIN auth.

Port HADOOP-9020 Add a SASL PLAIN server to branch 1
----------------------------------------------------
Key: HADOOP-9083
URL: https://issues.apache.org/jira/browse/HADOOP-9083
Project: Hadoop Common
Issue Type: Task
Components: ipc, security
Affects Versions: 1.0.3
Reporter: Yu Gao
Assignee: Yu Gao
Attachments: HADOOP-9020-branch-1.patch, test-patch.result, test-TestSaslRPC.result

It would be good if the patch of HADOOP-9020, which adds a SASL PLAIN server implementation, could be ported to branch 1 as well.
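For readers unfamiliar with how such a server becomes visible to Thrift: a SASL mechanism is registered through the standard Java security-provider machinery, roughly as sketched below. The class name org.example.PlainSaslServerFactory and the provider details are assumptions for illustration, not the contents of the HADOOP-9020 patch:
{code}
import java.security.Provider;
import java.security.Security;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslServer;

// Sketch: a provider that maps the PLAIN mechanism to a (hypothetical)
// SaslServerFactory implementation. HADOOP-9020 may use different names.
public final class PlainSaslProvider extends Provider {
  public PlainSaslProvider() {
    super("PlainSasl", 1.0, "SASL PLAIN server provider (sketch)");
    // Key format "SaslServerFactory.<MECHANISM>" is how the SASL API
    // locates factories; the factory class here is hypothetical.
    put("SaslServerFactory.PLAIN", "org.example.PlainSaslServerFactory");
  }

  public static SaslServer createPlainServer(CallbackHandler handler)
      throws Exception {
    Security.addProvider(new PlainSaslProvider());
    // Thrift's TSaslTransport obtains servers through this same standard
    // entry point, so once the provider is registered it can do PLAIN auth.
    return Sasl.createSaslServer("PLAIN", "hive", "localhost", null, handler);
  }
}
{code}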
[jira] [Commented] (HADOOP-9082) Select and document a platform-independent scripting language for use in Hadoop environment
[ https://issues.apache.org/jira/browse/HADOOP-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507163#comment-13507163 ]

Allen Wittenauer commented on HADOOP-9082:
------------------------------------------

(I know this is mostly going to get ignored because a) it's from me, b) it's more than 3 lines, and c) we've already proven that we only care about Linux despite people wanting support for other platforms, but here we go anyway.)

While I can understand the build-time issues, I'm not sure I understand the run-time issues. If you are running on a system that doesn't have libhadoop, or want to launch a task, you're going to hit a fork() and that's going to call bash (or potentially sh). Or are we planning on replacing taskjvm.sh as well? So the bash requirement doesn't go away.

At run-time, the whole purpose of these scripts is to launch Java. That's it. The problem we have is that our current scripts are extremely convoluted, wrap into themselves, and fundamentally aren't written very well. Arguing that we can make our launcher scripts object-oriented, or use an IDE to debug them, seems like we're expecting to raise the complexity to even more ludicrous levels.

One thing I'm very curious about is whether we'll lose the ${BASH_SOURCE} functionality, something I consider absolutely critical, by moving to Python. (It allows one to run without setting *any* environment variables. I think I submitted that as a patch years ago, but well...)

Let's say we pick Python. Which version are we going to target? From a support perspective, we could very easily end up asking about not only the Java version but the Python version. Do we really want that?

bq. The alternative would be to maintain two complete suites of scripts, one for Linux and one for Windows (and perhaps others in the future).

This is what most projects that have Windows and UNIX functionality do, from what I've seen. This is because things are in different locations, use different delimiters, and so on; if you merge them, you end up with so much "if this then that, or if this2 then that2" that you essentially have two different suites of scripts anyway, just stored in one place.

bq. We want to avoid the need to update dual modules in two different languages when functionality changes, especially given that many Linux developers are not familiar with powershell or bat, and many Windows developers are not familiar with shell or bash.

I think this is the real message: the Linux developers (read: the Java developers who work on Hadoop) don't know bash and fundamentally ignore most attempts from outside to improve the scripts. Switching to something else isn't going to change this problem. Instead, it'll just allow them to continue ignoring the community in favor of their own changes.

Perhaps the fundamental problem is this: why are so many launcher changes even necessary? Why isn't Hadoop smart enough to figure out some of these things after Java is launched? Have we even seriously attempted a simplification of the scripts? (I suspect just using functions instead of the craziness around exported variables would make a world of difference.) Has there been any thought about actually creating real configuration files built by installers, so we don't have to recompute a half-dozen things at every run time?

Side note: it would be interesting to see the memory-footprint differences on something like one of Yahoo!'s gateways. Sure, individually it isn't much. But at scale...

Anyway, I've given my $0.02.
Do what you want, I won't stop you. But I do question the thinking behind it.

Select and document a platform-independent scripting language for use in Hadoop environment
--------------------------------------------------------------------------------------------
Key: HADOOP-9082
URL: https://issues.apache.org/jira/browse/HADOOP-9082
Project: Hadoop Common
Issue Type: Bug
Reporter: Matt Foley

This issue is going to be discussed at length on the common-dev@ mailing list, under the topic "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout Hadoop stack".