[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane

2012-11-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506325#comment-13506325
 ] 

Colin Patrick McCabe commented on HADOOP-9103:
--

I said:

bq. since we always encode/decode using hadoop.io.UTF8, and never anything 
else, there should be no problem...

I take this back; looks like we don't always encode/decode using 
{{hadoop.io.UTF8}}.  D'oh!

bq. Attached patch should fix this issue.

Nice.  Should we test for rejecting 5-byte and 6-byte sequences, since I notice 
you added some code to do that?

I'm also a little scared by the idea that we have differently-encoded byte[] 
running around for the same file name strings.  We have to be very careful 
about this.  Unfortunately, we can't change the decoder to emit real UTF-8 
(rather than CESU-8) without making a backwards-incompatible change, since as 
INode.java reminds us, 

{code}
   *  The name in HdfsFileStatus should keep the same encoding as this.
   *  if this encoding is changed, implicitly getFileInfo and listStatus in
   *  clientProtocol are changed; The decoding at the client
   *  side should change accordingly.
{code}

I also wonder if this means that we need to hunt down all the places not using 
CESU-8.  Otherwise older clients are just not going to work with astral plane 
code points, even after this fix... However, we could do that in a separate 
JIRA, not here.
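
For concreteness, a minimal sketch of the two encodings; this is not from the patch, it only leans on DataOutputStream.writeUTF, which encodes surrogate pairs the same 3-bytes-per-surrogate way as the CESU-8-style behavior discussed above:

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;

public class Cesu8VsUtf8 {
  public static void main(String[] args) throws Exception {
    // U+10400 lies outside the BMP; in Java's UTF-16 strings it is the
    // surrogate pair \uD801\uDC00.
    String name = "\uD801\uDC00";

    // Real UTF-8: one 4-byte sequence, F0 90 90 80.
    byte[] utf8 = name.getBytes("UTF-8");

    // CESU-8 style: each surrogate encoded separately as 3 bytes,
    // ED A0 81 ED B0 80 -- a different byte[] for the same file name.
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    new DataOutputStream(bos).writeUTF(name);
    byte[] cesu8 = bos.toByteArray();  // first 2 bytes are a length prefix

    System.out.println(utf8.length);       // 4
    System.out.println(cesu8.length - 2);  // 6
  }
}
{code}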

 UTF8 class does not properly decode Unicode characters outside the basic 
 multilingual plane
 ---

 Key: HADOOP-9103
 URL: https://issues.apache.org/jira/browse/HADOOP-9103
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 0.20.1
 Environment: SUSE LINUX
Reporter: yixiaohua
Assignee: Todd Lipcon
 Attachments: FSImage.java, hadoop-9103.txt, ProblemString.txt, 
 TestUTF8AndStringGetBytes.java, TestUTF8AndStringGetBytes.java

   Original Estimate: 12h
  Remaining Estimate: 12h

 This is the log information of the exception from the SecondaryNameNode: 
 2012-03-28 00:48:42,553 ERROR 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: 
 java.io.IOException: Found lease for
  non-existent file 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/@???
 ??tor.qzone.qq.com/keypart-00174
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1211)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:589)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
 at java.lang.Thread.run(Thread.java:619)
 This is the log information about the file from the namenode:
 2012-03-28 00:32:26,528 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss 
 ip=/10.131.16.34 cmd=create  
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 dst=null
 perm=boss:boss:rw-r--r--
 2012-03-28 00:37:42,387 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.allocateBlock: 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174. 
 blk_2751836614265659170_184668759
 2012-03-28 00:37:42,696 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
 NameSystem.completeFile: file 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 is closed by 
 DFSClient_attempt_201203271849_0016_r_000174_0
 2012-03-28 00:37:50,315 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss 
 ip=/10.131.16.34 cmd=rename  
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 
 dst=/user/boss/pgv/fission/task16/split/  @?
 tor.qzone.qq.com/keypart-00174  perm=boss:boss:rw-r--r--
 After checking the code that saves the FSImage, I found a problem that may be 
 a bug in the HDFS code; I paste it below:
 - this is the saveFSImage method in FSImage.java, where I have marked the 
 problem

[jira] [Created] (HADOOP-9104) Should org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) reject addition if a renew action for this FS is already present in the queue?

2012-11-29 Thread Ivan A. Veselovsky (JIRA)
Ivan A. Veselovsky created HADOOP-9104:
--

 Summary: Should 
org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) reject addition 
if a renew action for this FS is already present in the queue?
 Key: HADOOP-9104
 URL: https://issues.apache.org/jira/browse/HADOOP-9104
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Ivan A. Veselovsky


The issue is extracted from the discussion in 
https://issues.apache.org/jira/browse/HADOOP-9046 .

Currently the method 
org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) allows adding any 
number of renew actions for the same FS.
Question #1: are there real use cases where this can make sense?

Also, when we remove a renew action with 
org.apache.hadoop.fs.DelegationTokenRenewer.removeRenewAction(T), we iterate 
over all the actions in the queue and remove the first one with a matching FS, 
if any. So, if several actions were submitted for the same FS, no more than 
one action will be removed per #removeRenewAction() invocation, and removing 
all of them requires a loop. So, if the answer to question #1 is yes, maybe we 
should change the #removeRenewAction(FS) behavior to remove all actions 
associated with the FS, or add #removeAllRenewActions(FS)? This is question 
#2; a sketch of the latter option follows.
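
A minimal sketch of the proposed bulk removal, under assumptions: the renewer's internal queue is a java.util.concurrent.DelayQueue of RenewAction<T> (whose iterator supports remove()), and matchesFs() is an invented helper for testing whether an action belongs to the given FS:

{code}
// Hypothetical #removeAllRenewActions(fs): drop every queued action for the
// given FS instead of only the first match.
public <T extends FileSystem> void removeAllRenewActions(T fs) {
  for (java.util.Iterator<RenewAction<?>> it = queue.iterator(); it.hasNext(); ) {
    if (it.next().matchesFs(fs)) {  // matchesFs() is an assumed helper
      it.remove();
    }
  }
}
{code}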



[jira] [Updated] (HADOOP-9104) Should org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) reject addition if a renew action for this FS is already present in the queue?

2012-11-29 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HADOOP-9104:
---

Description: 
The issue is extracted from the discussion in 
https://issues.apache.org/jira/browse/HADOOP-9046 .

Currently the method 
org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) allows adding any 
number of renew actions for the same FS.
Question #1: are there real use cases where that makes sense?

Also, when we remove a renew action with 
org.apache.hadoop.fs.DelegationTokenRenewer.removeRenewAction(T), we iterate 
over all the actions in the queue and remove the first one with a matching FS, 
if any. So, if several actions were submitted for the same FS, no more than 
one action will be removed per #removeRenewAction() invocation, and removing 
all of them for a given FS requires a loop. So, if the answer to question #1 
is yes, maybe we should change the #removeRenewAction(FS) behavior to remove 
all actions associated with the FS, or add #removeAllRenewActions(FS)? This is 
question #2.

  was:
The issue extrected from discussion in 
https://issues.apache.org/jira/browse/HADOOP-9046 .

Currently the method 
org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) allows to add any 
number of renew actions for the same FS.
Question #1: are there real usecases when this can make sense?

Also, when we remove a renew action with 
org.apache.hadoop.fs.DelegationTokenRenewer.removeRenewAction(T), we iterate 
over all the actions in the queue, and remove the first one with matching FS, 
if any. So, in case if several actions submitted for the same FS, not more than 
one action will be removed upon #removeRenewAction() invocation. So, to remove 
all them a developer will need a cycle. So, if the answer to the question #1 is 
true, may be we should change the #removeRenewAction(FS) behavior to remove all 
actions associated with this FS, or add #removeAllRenewActuions(FS)? This is 
question #2.


 Should org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) reject 
 addition if a renew action for this FS is already present in the queue?
 ---

 Key: HADOOP-9104
 URL: https://issues.apache.org/jira/browse/HADOOP-9104
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Ivan A. Veselovsky

 The issue is extracted from the discussion in 
 https://issues.apache.org/jira/browse/HADOOP-9046 .
 Currently the method 
 org.apache.hadoop.fs.DelegationTokenRenewer.addRenewAction(T) allows adding 
 any number of renew actions for the same FS.
 Question #1: are there real use cases where that makes sense?
 Also, when we remove a renew action with 
 org.apache.hadoop.fs.DelegationTokenRenewer.removeRenewAction(T), we iterate 
 over all the actions in the queue and remove the first one with a matching 
 FS, if any. So, if several actions were submitted for the same FS, no more 
 than one action will be removed per #removeRenewAction() invocation, and 
 removing all of them for a given FS requires a loop. So, if the answer to 
 question #1 is yes, maybe we should change the #removeRenewAction(FS) 
 behavior to remove all actions associated with the FS, or add 
 #removeAllRenewActions(FS)? This is question #2.



[jira] [Commented] (HADOOP-9046) provide unit-test coverage of class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T>

2012-11-29 Thread Ivan A. Veselovsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506447#comment-13506447
 ] 

Ivan A. Veselovsky commented on HADOOP-9046:


Hi, Robert, thanks for the comments.

1. Created a separate JIRA: https://issues.apache.org/jira/browse/HADOOP-9104 . 
The TODO comment is removed.

2. Renamed: lock0 -> queueLock, available0 -> queueContentChangedCondition.

3. The token cancellation upon removal was introduced in HADOOP-9084, and it 
appears to have been accidentally overwritten by my changes. I brought those 
changes back and also added the relevant checking to the test. Thanks for this 
catch.

4. I fixed the problem using the java.lang.Thread.getState() method: now we 
first start the thread if needed, and second we check whether it has already 
died. If the thread is dead, we throw IllegalStateException. This way, (1) the 
thread never attempts to start twice, and (2) any attempt to add an action to 
the dead thread is rejected; a sketch of this logic follows.
I also added a check to the test to verify that this is really the case.
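
A minimal sketch of that start-or-reject logic; the thread and lock names are illustrative, not the patch's actual identifiers:

{code}
// Hypothetical sketch: start the renewer thread exactly once, and reject
// additions once it has terminated.
synchronized (queueLock) {
  switch (renewerThread.getState()) {
    case NEW:
      renewerThread.start();   // (1) the thread is never started twice
      break;
    case TERMINATED:
      // (2) adding an action to a dead renewer must fail loudly
      throw new IllegalStateException("renewer thread is dead");
    default:
      break;                   // already running; nothing to do
  }
}
{code}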

The described changes are in patches xxx--d.patch.

 provide unit-test coverage of class 
 org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T>
 --

 Key: HADOOP-9046
 URL: https://issues.apache.org/jira/browse/HADOOP-9046
 Project: Hadoop Common
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
Priority: Minor
 Attachments: HADOOP-9046-branch-0.23--c.patch, 
 HADOOP-9046-branch-0.23-over-9049.patch, HADOOP-9046-branch-0.23.patch, 
 HADOOP-9046--c.patch, HADOOP-9046--d.patch, HADOOP-9046-over-9049.patch, 
 HADOOP-9046.patch


 The class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T> has zero 
 coverage in the entire cumulative test run. Provide test(s) to cover this 
 class.
 Note: the request is submitted to the HDFS project because the class is 
 likely to be tested by tests in that project.



[jira] [Updated] (HADOOP-9046) provide unit-test coverage of class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T>

2012-11-29 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HADOOP-9046:
---

Attachment: HADOOP-9046--d.patch

 provide unit-test coverage of class 
 org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T>
 --

 Key: HADOOP-9046
 URL: https://issues.apache.org/jira/browse/HADOOP-9046
 Project: Hadoop Common
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
Priority: Minor
 Attachments: HADOOP-9046-branch-0.23--c.patch, 
 HADOOP-9046-branch-0.23-over-9049.patch, HADOOP-9046-branch-0.23.patch, 
 HADOOP-9046--c.patch, HADOOP-9046--d.patch, HADOOP-9046-over-9049.patch, 
 HADOOP-9046.patch


 The class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T> has zero 
 coverage in the entire cumulative test run. Provide test(s) to cover this 
 class.
 Note: the request is submitted to the HDFS project because the class is 
 likely to be tested by tests in that project.



[jira] [Updated] (HADOOP-9046) provide unit-test coverage of class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T>

2012-11-29 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HADOOP-9046:
---

Attachment: HADOOP-9046-branch-0.23--d.patch

The patch HADOOP-9046-branch-0.23--d.patch provides version d of this change 
for branch-0.23.

 provide unit-test coverage of class 
 org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T>
 --

 Key: HADOOP-9046
 URL: https://issues.apache.org/jira/browse/HADOOP-9046
 Project: Hadoop Common
  Issue Type: Test
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Ivan A. Veselovsky
Assignee: Ivan A. Veselovsky
Priority: Minor
 Attachments: HADOOP-9046-branch-0.23--c.patch, 
 HADOOP-9046-branch-0.23--d.patch, HADOOP-9046-branch-0.23-over-9049.patch, 
 HADOOP-9046-branch-0.23.patch, HADOOP-9046--c.patch, HADOOP-9046--d.patch, 
 HADOOP-9046-over-9049.patch, HADOOP-9046.patch


 The class org.apache.hadoop.fs.DelegationTokenRenewer.RenewAction<T> has zero 
 coverage in the entire cumulative test run. Provide test(s) to cover this 
 class.
 Note: the request is submitted to the HDFS project because the class is 
 likely to be tested by tests in that project.



[jira] [Created] (HADOOP-9105) FsShell -moreFromLocal erroneously fails

2012-11-29 Thread Daryn Sharp (JIRA)
Daryn Sharp created HADOOP-9105:
---

 Summary: FsShell -moreFromLocal erroneously fails
 Key: HADOOP-9105
 URL: https://issues.apache.org/jira/browse/HADOOP-9105
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.0.0-alpha, 0.23.0, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp


The move completes successfully, but an error is then reported when deleting 
the local source directory, even though the deletion succeeded.



[jira] [Commented] (HADOOP-8615) EOFException in DecompressorStream.java needs to be more verbose

2012-11-29 Thread thomastechs (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506498#comment-13506498
 ] 

thomastechs commented on HADOOP-8615:
-

Hi, 
Please treat this as a gentle reminder regarding the next steps.
Thanks, 
Thomas.

 EOFException in DecompressorStream.java needs to be more verbose
 

 Key: HADOOP-8615
 URL: https://issues.apache.org/jira/browse/HADOOP-8615
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 0.20.2
Reporter: Jeff Lord
  Labels: patch
 Attachments: HADOOP-8615.patch, HADOOP-8615-release-0.20.2.patch, 
 HADOOP-8615-ver2.patch, HADOOP-8615-ver3.patch


 In ./src/core/org/apache/hadoop/io/compress/DecompressorStream.java, the 
 following exception should at least pass back the file in relation to which 
 the error was encountered:
   protected void getCompressedData() throws IOException {
     checkStream();
     int n = in.read(buffer, 0, buffer.length);
     if (n == -1) {
       throw new EOFException("Unexpected end of input stream");
     }
   }
 This would help greatly in debugging bad/corrupt files.
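
A sketch of what a more verbose message could look like; purely illustrative, since the stream has no file name of its own, the fileName field below is an assumed addition rather than the attached patch:

{code}
// Hypothetical variant: include an identifying label supplied by the caller.
protected void getCompressedData() throws IOException {
  checkStream();
  int n = in.read(buffer, 0, buffer.length);
  if (n == -1) {
    throw new EOFException("Unexpected end of input stream"
        + (fileName != null ? " while reading " + fileName : ""));
  }
}
{code}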



[jira] [Updated] (HADOOP-9105) FsShell -moveFromLocal erroneously fails

2012-11-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-9105:
---

Summary: FsShell -moveFromLocal erroneously fails  (was: FsShell 
-moreFromLocal erroneously fails)

 FsShell -moveFromLocal erroneously fails
 

 Key: HADOOP-9105
 URL: https://issues.apache.org/jira/browse/HADOOP-9105
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp

 The move completes successfully, but an error is then reported when deleting 
 the local source directory, even though the deletion succeeded.



[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system

2012-11-29 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506582#comment-13506582
 ] 

Ravi Prakash commented on HADOOP-9090:
--

This could be very useful. Thanks for taking this up, Mostafa.

Minor nit. In MetricsSystem.java:
{code}
public abstract void publishMetricsNow();
{code}
IMHO we shouldn't put that method that high in the hierarchy. How would 
implementations of MetricsSystem that are not concerned with real-time 
publishing implement this method?

Does the description of this JIRA need an update? The classes aren't abstract 
after your patch.

Otherwise code looks good to me.

 Refactor MetricsSystemImpl to allow for an on-demand publish system
 ---

 Key: HADOOP-9090
 URL: https://issues.apache.org/jira/browse/HADOOP-9090
 Project: Hadoop Common
  Issue Type: New Feature
  Components: metrics
Reporter: Mostafa Elhemali
Priority: Minor
 Attachments: HADOOP-9090.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.4.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch


 We have a need to publish metrics out of some short-living processes, which 
 is not really well-suited to the current metrics system implementation which 
 periodically publishes metrics asynchronously (a behavior that works great 
 for long-living processes). Of course I could write my own metrics system, 
 but it seems like such a waste to rewrite all the awesome code currently in 
 the MetricsSystemImpl and supporting classes.
 The way I'm proposing to solve this is to:
 1. Refactor the MetricsSystemImpl class into an abstract base 
 MetricsSystemImpl class (common configuration and other code) and a concrete 
 PeriodicPublishMetricsSystemImpl class (timer thread).
 2. Refactor the MetricsSinkAdapter class into an abstract base 
 MetricsSinkAdapter class (common configuration and other code) and a concrete 
 AsyncMetricsSinkAdapter class (asynchronous publishing using the SinkQueue).
 3. Derive a new simple class OnDemandPublishMetricsSystemImpl from 
 MetricsSystemImpl, that just exposes a synchronous publish() method to do all 
 the work.
 4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just 
 synchronously push metrics to the underlying sink.
 Does that sound reasonable? I'll attach the patch with all this coded up and 
 simple tests (could use some polish I guess, but wanted to get everyone's 
 opinion first). Notice that this is somewhat of a breaking change since 
 MetricsSystemImpl is public (although it's marked with 
 InterfaceAudience.Private); if the breaking change is a problem I could just 
 rename the refactored classes so that PeriodicPublishMetricsSystemImpl is 
 still called MetricsSystemImpl (and MetricsSystemImpl -> 
 BaseMetricsSystemImpl).



[jira] [Updated] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system

2012-11-29 Thread Mostafa Elhemali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Elhemali updated HADOOP-9090:
-

Description: 
Updated description based on feedback:

We have a need to publish metrics out of some short-living processes, which is 
not really well-suited to the current metrics system implementation which 
periodically publishes metrics asynchronously (a behavior that works great for 
long-living processes). Of course I could write my own metrics system, but it 
seems like such a waste to rewrite all the awesome code currently in the 
MetricsSystemImpl and supporting classes.
The way this JIRA solves the problem is by adding a new method, 
publishMetricsNow(), to the MetricsSystemImpl class, which does a synchronous 
out-of-band push of the metrics from the sources to the sinks. I also add a 
method to MetricsSinkAdapter (putMetricsImmediate) to support that change.
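
A minimal sketch of that flow; sampleMetrics() and the sinks map are placeholders for illustration, not the patch's actual fields:

{code}
// Hypothetical shape of the synchronous path in MetricsSystemImpl.
public void publishMetricsNow() {
  MetricsBuffer buffer = sampleMetrics();  // snapshot all sources (assumed helper)
  for (MetricsSinkAdapter sa : sinks.values()) {
    sa.putMetricsImmediate(buffer);        // synchronous push, including flush
  }
}
{code}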

  was:
We have a need to publish metrics out of some short-living processes, which is 
not really well-suited to the current metrics system implementation which 
periodically publishes metrics asynchronously (a behavior that works great for 
long-living processes). Of course I could write my own metrics system, but it 
seems like such a waste to rewrite all the awesome code currently in the 
MetricsSystemImpl and supporting classes.

The way I'm proposing to solve this is to:
1. Refactor the MetricsSystemImpl class into an abstract base MetricsSystemImpl 
class (common configuration and other code) and a concrete 
PeriodicPublishMetricsSystemImpl class (timer thread).
2. Refactor the MetricsSinkAdapter class into an abstract base 
MetricsSinkAdapter class (common configuration and other code) and a concrete 
AsyncMetricsSinkAdapter class (asynchronous publishing using the SinkQueue).
3. Derive a new simple class OnDemandPublishMetricsSystemImpl from 
MetricsSystemImpl, that just exposes a synchronous publish() method to do all 
the work.
4. Derive a SyncMetricsSinkAdapter class from MetricsSinkAdapter to just 
synchronously push metrics to the underlying sink.

Does that sound reasonable? I'll attach the patch with all this coded up and 
simple tests (could use some polish I guess, but wanted to get everyone's 
opinion first). Notice that this is somewhat of a breaking change since 
MetricsSystemImpl is public (although it's marked with 
InterfaceAudience.Private); if the breaking change is a problem I could just 
rename the refactored classes so that PeriodicPublishMetricsSystemImpl is still 
called MetricsSystemImpl (and MetricsSystemImpl -> BaseMetricsSystemImpl).


 Refactor MetricsSystemImpl to allow for an on-demand publish system
 ---

 Key: HADOOP-9090
 URL: https://issues.apache.org/jira/browse/HADOOP-9090
 Project: Hadoop Common
  Issue Type: New Feature
  Components: metrics
Reporter: Mostafa Elhemali
Priority: Minor
 Attachments: HADOOP-9090.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.4.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch


 Updated description based on feedback:
 We have a need to publish metrics out of some short-living processes, which 
 is not really well-suited to the current metrics system implementation which 
 periodically publishes metrics asynchronously (a behavior that works great 
 for long-living processes). Of course I could write my own metrics system, 
 but it seems like such a waste to rewrite all the awesome code currently in 
 the MetricsSystemImpl and supporting classes.
 The way this JIRA solves the problem is by adding a new method, 
 publishMetricsNow(), to the MetricsSystemImpl class, which does a synchronous 
 out-of-band push of the metrics from the sources to the sinks. I also add a 
 method to MetricsSinkAdapter (putMetricsImmediate) to support that change.



[jira] [Commented] (HADOOP-8615) EOFException in DecompressorStream.java needs to be more verbose

2012-11-29 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506611#comment-13506611
 ] 

Harsh J commented on HADOOP-8615:
-

Hi,

I noted that this changes the CompressionCodec interface, which would make it 
an incompatible change for its users (older downstream code would fail to 
compile, as it may now be missing a few method implementations).

Is it absolutely necessary to break compatibility just to add some information 
to this exception?

 EOFException in DecompressorStream.java needs to be more verbose
 

 Key: HADOOP-8615
 URL: https://issues.apache.org/jira/browse/HADOOP-8615
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 0.20.2
Reporter: Jeff Lord
  Labels: patch
 Attachments: HADOOP-8615.patch, HADOOP-8615-release-0.20.2.patch, 
 HADOOP-8615-ver2.patch, HADOOP-8615-ver3.patch


 In ./src/core/org/apache/hadoop/io/compress/DecompressorStream.java, the 
 following exception should at least pass back the file in relation to which 
 the error was encountered:
   protected void getCompressedData() throws IOException {
     checkStream();
     int n = in.read(buffer, 0, buffer.length);
     if (n == -1) {
       throw new EOFException("Unexpected end of input stream");
     }
   }
 This would help greatly in debugging bad/corrupt files.



[jira] [Updated] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system

2012-11-29 Thread Mostafa Elhemali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Elhemali updated HADOOP-9090:
-

Attachment: HADOOP-9090.justEnhanceDefaultImpl.5.patch

Thanks, Ravi, for the feedback. I knew it was a bit controversial to put this 
method in the MetricsSystem interface and require it from other systems, but I 
figured it's the only way for outside customers to really take advantage of 
this, since MetricsSystemImpl is not intended for out-of-Hadoop consumption. 
Having said that, my immediate need can be met without putting it in the 
interface, so I took it out for now (we can always add it in another explicit 
JIRA if needed).

I've also added a new multi-threaded test in the new patch to make sure 
everything is alright there.

 Refactor MetricsSystemImpl to allow for an on-demand publish system
 ---

 Key: HADOOP-9090
 URL: https://issues.apache.org/jira/browse/HADOOP-9090
 Project: Hadoop Common
  Issue Type: New Feature
  Components: metrics
Reporter: Mostafa Elhemali
Priority: Minor
 Attachments: HADOOP-9090.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.4.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.5.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch


 Updated description based on feedback:
 We have a need to publish metrics out of some short-living processes, which 
 is not really well-suited to the current metrics system implementation which 
 periodically publishes metrics asynchronously (a behavior that works great 
 for long-living processes). Of course I could write my own metrics system, 
 but it seems like such a waste to rewrite all the awesome code currently in 
 the MetricsSystemImpl and supporting classes.
 The way this JIRA solves the problem is by adding a new method, 
 publishMetricsNow(), to the MetricsSystemImpl class, which does a synchronous 
 out-of-band push of the metrics from the sources to the sinks. I also add a 
 method to MetricsSinkAdapter (putMetricsImmediate) to support that change.



[jira] [Updated] (HADOOP-8615) EOFException in DecompressorStream.java needs to be more verbose

2012-11-29 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HADOOP-8615:


Target Version/s: 3.0.0  (was: 2.0.0-alpha)

 EOFException in DecompressorStream.java needs to be more verbose
 

 Key: HADOOP-8615
 URL: https://issues.apache.org/jira/browse/HADOOP-8615
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 0.20.2
Reporter: Jeff Lord
  Labels: patch
 Attachments: HADOOP-8615.patch, HADOOP-8615-release-0.20.2.patch, 
 HADOOP-8615-ver2.patch, HADOOP-8615-ver3.patch


 In ./src/core/org/apache/hadoop/io/compress/DecompressorStream.java, the 
 following exception should at least pass back the file in relation to which 
 the error was encountered:
   protected void getCompressedData() throws IOException {
     checkStream();
     int n = in.read(buffer, 0, buffer.length);
     if (n == -1) {
       throw new EOFException("Unexpected end of input stream");
     }
   }
 This would help greatly in debugging bad/corrupt files.



[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system

2012-11-29 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506654#comment-13506654
 ] 

Luke Lu commented on HADOOP-9090:
-

Adding a publishMetricsNow method to the MetricsSystem is reasonable, as the 
interface is considered Evolving and the requirement has universal utility (I 
actually thought about adding it in the beginning, but there was no such 
requirement then).

bq. I figured the way you had it may end up in race conditions if multiple 
threads are calling publishMetricsNow() at the same time.

The _sketch_ was meant to be simple, and the race is considered harmless: it's 
OK to potentially exit before a metrics buffer that arrived at almost the same 
time as the last one is flushed. OTOH, if you want to wait for an individual 
metrics buffer, you can do the following without a new wrapper:
{code}
// in putMetricsImmediate
synchronized(buffer) {
  buffer.wait(oobTimeout);
}

// in consume
synchronized(buffer) {
  buffer.notify();
}
{code}

 Refactor MetricsSystemImpl to allow for an on-demand publish system
 ---

 Key: HADOOP-9090
 URL: https://issues.apache.org/jira/browse/HADOOP-9090
 Project: Hadoop Common
  Issue Type: New Feature
  Components: metrics
Reporter: Mostafa Elhemali
Priority: Minor
 Attachments: HADOOP-9090.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.4.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.5.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch


 Updated description based on feedback:
 We have a need to publish metrics out of some short-living processes, which 
 is not really well-suited to the current metrics system implementation which 
 periodically publishes metrics asynchronously (a behavior that works great 
 for long-living processes). Of course I could write my own metrics system, 
 but it seems like such a waste to rewrite all the awesome code currently in 
 the MetricsSystemImpl and supporting classes.
 The way this JIRA solves the problem is by adding a new method, 
 publishMetricsNow(), to the MetricsSystemImpl class, which does a synchronous 
 out-of-band push of the metrics from the sources to the sinks. I also add a 
 method to MetricsSinkAdapter (putMetricsImmediate) to support that change.



[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system

2012-11-29 Thread Mostafa Elhemali (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506686#comment-13506686
 ] 

Mostafa Elhemali commented on HADOOP-9090:
--

*Point about how to synchronize putMetricsImmediate*
Thanks, Luke. Yeah, I considered waiting on the buffer itself before creating 
the wrapper, but there are a couple of reasons I didn't end up doing that:
1. (Main reason) The sink doesn't own the buffer object, so it doesn't know who 
else is waiting on it or notifying it. It seems wrong to presume to wait on it.
2. Object.wait(timeout) doesn't return the result of the wait, so I wouldn't 
know whether it succeeded or timed out without additional complex logic.

As for the race being harmless: I'm not sure it's that harmless. For all we 
know, the buffers that were just processed from the queue were from ages ago, 
and the values in the new buffer are completely different. I'd much rather play 
it safe and make an honest attempt to publish what I've just been given.

So, for the reasons above, I'd rather go with the wrapper despite the added 
code complexity; a sketch of the idea follows.
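
A minimal sketch of the wrapper idea, with all names invented for illustration (the actual patch may differ):

{code}
// Hypothetical wrapper: pair the buffer with a latch owned by this one
// request, so the consumer thread signals exactly this publish and the
// caller learns whether it completed within the timeout.
// (Uses java.util.concurrent.CountDownLatch and TimeUnit.)
static class WaitableBuffer {
  final MetricsBuffer buffer;
  private final CountDownLatch published = new CountDownLatch(1);

  WaitableBuffer(MetricsBuffer buffer) { this.buffer = buffer; }

  void notifyPublished() { published.countDown(); }  // called from consume()

  boolean waitForPublish(long timeoutMs) throws InterruptedException {
    return published.await(timeoutMs, TimeUnit.MILLISECONDS);  // true iff published
  }
}
{code}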

*Point about putting the method in the interface*
OK, since me & Luke are two votes for putting the method in the interface, and 
Luke made a good point about the interface being Evolving, I'll put the method 
back into the interface in a subsequent patch unless anyone else objects (or 
Ravi presses the point with other reasons). Thanks all.

 Refactor MetricsSystemImpl to allow for an on-demand publish system
 ---

 Key: HADOOP-9090
 URL: https://issues.apache.org/jira/browse/HADOOP-9090
 Project: Hadoop Common
  Issue Type: New Feature
  Components: metrics
Reporter: Mostafa Elhemali
Priority: Minor
 Attachments: HADOOP-9090.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.4.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.5.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch


 Updated description based on feedback:
 We have a need to publish metrics out of some short-living processes, which 
 is not really well-suited to the current metrics system implementation which 
 periodically publishes metrics asynchronously (a behavior that works great 
 for long-living processes). Of course I could write my own metrics system, 
 but it seems like such a waste to rewrite all the awesome code currently in 
 the MetricsSystemImpl and supporting classes.
 The way this JIRA solves the problem is by adding a new method, 
 publishMetricsNow(), to the MetricsSystemImpl class, which does a synchronous 
 out-of-band push of the metrics from the sources to the sinks. I also add a 
 method to MetricsSinkAdapter (putMetricsImmediate) to support that change.



[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane

2012-11-29 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506727#comment-13506727
 ] 

Todd Lipcon commented on HADOOP-9103:
-

bq. Nice. Should we test for rejecting 5-byte and 6-byte sequences, since I 
notice you added some code to do that?

I added a test for an invalid sequence. I didn't think it was necessary to add 
a separate test for a 5-byte sequence, since it would trigger the same 
invalid code path. Got an example hex sequence you think we should test 
against?

bq. I'm also a little scared by the idea that we have differently-encoded 
byte[] running around for the same file name strings. We have to be very 
careful about this. 
bq. ...However, we could do that in a separate JIRA, not here
Agreed. Let's open a separate HDFS JIRA and use this for the Common-side fix. 
This patch alone was enough to successfully restart a NN which had an open file 
with a 4-byte codepoint.

 UTF8 class does not properly decode Unicode characters outside the basic 
 multilingual plane
 ---

 Key: HADOOP-9103
 URL: https://issues.apache.org/jira/browse/HADOOP-9103
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 0.20.1
 Environment: SUSE LINUX
Reporter: yixiaohua
Assignee: Todd Lipcon
 Attachments: FSImage.java, hadoop-9103.txt, ProblemString.txt, 
 TestUTF8AndStringGetBytes.java, TestUTF8AndStringGetBytes.java

   Original Estimate: 12h
  Remaining Estimate: 12h

 This is the log information of the exception from the SecondaryNameNode: 
 2012-03-28 00:48:42,553 ERROR 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: 
 java.io.IOException: Found lease for
  non-existent file 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/@???
 ??tor.qzone.qq.com/keypart-00174
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1211)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:589)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
 at 
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
 at java.lang.Thread.run(Thread.java:619)
 This is the log information about the file from the namenode:
 2012-03-28 00:32:26,528 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss 
 ip=/10.131.16.34 cmd=create  
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 dst=null
 perm=boss:boss:rw-r--r--
 2012-03-28 00:37:42,387 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
 NameSystem.allocateBlock: 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174. 
 blk_2751836614265659170_184668759
 2012-03-28 00:37:42,696 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
 NameSystem.completeFile: file 
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 is closed by 
 DFSClient_attempt_201203271849_0016_r_000174_0
 2012-03-28 00:37:50,315 INFO 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss 
 ip=/10.131.16.34 cmd=rename  
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 
 dst=/user/boss/pgv/fission/task16/split/  @?
 tor.qzone.qq.com/keypart-00174  perm=boss:boss:rw-r--r--
 After checking the code that saves the FSImage, I found a problem that may be 
 a bug in the HDFS code; I paste it below:
 - this is the saveFSImage method in FSImage.java, where I have marked the 
 problem code
 /**
  * Save the contents of the FS image to the file.
  */
 void saveFSImage(File newFile) throws IOException {
   FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
   FSDirectory fsDir = fsNamesys.dir;
   long startTime = FSNamesystem.now();
   //
   // Write out data
   //
   DataOutputStream out = new DataOutputStream(
       new BufferedOutputStream(

[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system

2012-11-29 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506725#comment-13506725
 ] 

Luke Lu commented on HADOOP-9090:
-

Good points, Mostafa. I should have known that MetricsBuffer, though immutable, 
can be shared. I guess it was a kneejerk reaction to Java verbosity :)

Anyway, the new logic looks solid to me. Thanks!

 Refactor MetricsSystemImpl to allow for an on-demand publish system
 ---

 Key: HADOOP-9090
 URL: https://issues.apache.org/jira/browse/HADOOP-9090
 Project: Hadoop Common
  Issue Type: New Feature
  Components: metrics
Reporter: Mostafa Elhemali
Priority: Minor
 Attachments: HADOOP-9090.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.4.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.5.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch


 Updated description based on feedback:
 We have a need to publish metrics out of some short-living processes, which 
 is not really well-suited to the current metrics system implementation which 
 periodically publishes metrics asynchronously (a behavior that works great 
 for long-living processes). Of course I could write my own metrics system, 
 but it seems like such a waste to rewrite all the awesome code currently in 
 the MetricsSystemImpl and supporting classes.
 The way this JIRA solves the problem is by adding a new method, 
 publishMetricsNow(), to the MetricsSystemImpl class, which does a synchronous 
 out-of-band push of the metrics from the sources to the sinks. I also add a 
 method to MetricsSinkAdapter (putMetricsImmediate) to support that change.



[jira] [Created] (HADOOP-9106) Allow configuration of IPC connect timeout

2012-11-29 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-9106:
---

 Summary: Allow configuration of IPC connect timeout
 Key: HADOOP-9106
 URL: https://issues.apache.org/jira/browse/HADOOP-9106
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Affects Versions: 3.0.0
Reporter: Todd Lipcon


Currently the connection timeout in Client.setupConnection() is hard-coded to 
20 seconds. This is unreasonable in some scenarios, such as HA failover, where 
we want a faster failover time. We should allow this to be configured per 
client; a sketch follows.
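
A minimal sketch of what the per-client configuration could look like; the key name and default are assumptions for illustration, not the committed change:

{code}
// Hypothetical: read the connect timeout from the client's Configuration
// instead of hard-coding 20000 ms in Client.setupConnection().
int connectTimeout = conf.getInt("ipc.client.connect.timeout", 20000);
NetUtils.connect(socket, server, connectTimeout);
{code}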



[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system

2012-11-29 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506759#comment-13506759
 ] 

Ravi Prakash commented on HADOOP-9090:
--

Thanks, Luke and Mostafa.
bq. and the requirement has universal utility 
Agreed. But it also places restrictions on how scalable the system can be. 

I'm flexible about where you want to introduce that method. Even so, I would 
like the behavior of that method javadoc'ed, explicitly stating what the 
expectation is if a MetricsSystem cannot provide real-time guarantees.


 Refactor MetricsSystemImpl to allow for an on-demand publish system
 ---

 Key: HADOOP-9090
 URL: https://issues.apache.org/jira/browse/HADOOP-9090
 Project: Hadoop Common
  Issue Type: New Feature
  Components: metrics
Reporter: Mostafa Elhemali
Priority: Minor
 Attachments: HADOOP-9090.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.4.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.5.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch


 Updated description based on feedback:
 We have a need to publish metrics out of some short-living processes, which 
 is not really well-suited to the current metrics system implementation which 
 periodically publishes metrics asynchronously (a behavior that works great 
 for long-living processes). Of course I could write my own metrics system, 
 but it seems like such a waste to rewrite all the awesome code currently in 
 the MetricsSystemImpl and supporting classes.
 The way this JIRA solves the problem is by adding a new method, 
 publishMetricsNow(), to the MetricsSystemImpl class, which does a synchronous 
 out-of-band push of the metrics from the sources to the sinks. I also add a 
 method to MetricsSinkAdapter (putMetricsImmediate) to support that change.



[jira] [Updated] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system

2012-11-29 Thread Mostafa Elhemali (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mostafa Elhemali updated HADOOP-9090:
-

Attachment: HADOOP-9090.justEnhanceDefaultImpl.6.patch

That's a fair request, Ravi. I took a shot at documenting that in the Javadoc 
on the method - does the wording look reasonable?


  /**
   * Requests an immediate publish of all metrics from sources to sinks.
   * 
   * This is a soft request: the expectation is that a best effort will be
   * done to synchronously snapshot the metrics from all the sources and put
   * them in all the sinks (including flushing the sinks) before returning to
   * the caller. If this can't be accomplished in reasonable time it's OK to
   * return to the caller before everything is done. 
   */


 Refactor MetricsSystemImpl to allow for an on-demand publish system
 ---

 Key: HADOOP-9090
 URL: https://issues.apache.org/jira/browse/HADOOP-9090
 Project: Hadoop Common
  Issue Type: New Feature
  Components: metrics
Reporter: Mostafa Elhemali
Priority: Minor
 Attachments: HADOOP-9090.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.4.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.5.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.6.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch


 Updated description based on feedback:
 We have a need to publish metrics out of some short-living processes, which 
 is not really well-suited to the current metrics system implementation which 
 periodically publishes metrics asynchronously (a behavior that works great 
 for long-living processes). Of course I could write my own metrics system, 
 but it seems like such a waste to rewrite all the awesome code currently in 
 the MetricsSystemImpl and supporting classes.
 The way this JIRA solves the problem is by adding a new method, 
 publishMetricsNow(), to the MetricsSystemImpl class, which does a synchronous 
 out-of-band push of the metrics from the sources to the sinks. I also add a 
 method to MetricsSinkAdapter (putMetricsImmediate) to support that change.



[jira] [Created] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented

2012-11-29 Thread Hari Shreedharan (JIRA)
Hari Shreedharan created HADOOP-9107:


 Summary: Hadoop IPC client eats InterruptedException and sets 
interrupt on the thread which is not documented
 Key: HADOOP-9107
 URL: https://issues.apache.org/jira/browse/HADOOP-9107
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 2.0.2-alpha
Reporter: Hari Shreedharan


This code in Client.java looks fishy:

{code}
  public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
  ConnectionId remoteId) throws InterruptedException, IOException {
Call call = new Call(rpcKind, rpcRequest);
Connection connection = getConnection(remoteId, call);
connection.sendParam(call); // send the parameter
boolean interrupted = false;
synchronized (call) {
  while (!call.done) {
try {
  call.wait();   // wait for the result
} catch (InterruptedException ie) {
  // save the fact that we were interrupted
  interrupted = true;
}
  }

  if (interrupted) {
// set the interrupt flag now that we are done waiting
Thread.currentThread().interrupt();
  }

  if (call.error != null) {
if (call.error instanceof RemoteException) {
  call.error.fillInStackTrace();
  throw call.error;
} else { // local exception
  InetSocketAddress address = connection.getRemoteAddress();
  throw NetUtils.wrapException(address.getHostName(),
  address.getPort(),
  NetUtils.getHostname(),
  0,
  call.error);
}
  } else {
return call.getRpcResult();
  }
}
  }
{code}

Blocking calls are expected to throw InterruptedException when the calling 
thread is interrupted. Also, it seems this method keeps waiting on the call 
object even after it has been interrupted. Currently, this method does not 
throw an InterruptedException, nor is it documented that this method sets the 
interrupt flag on the thread calling it. If it is interrupted, this method 
should still throw InterruptedException; it should not matter whether the call 
was successful or not.

This is a major issue for clients which do not call this directly, but call 
HDFS client API methods to write to HDFS. Those calls may be interrupted by 
the client due to timeouts, but no InterruptedException is thrown. Any HDFS 
client call can thus interrupt the thread, and this is not documented 
anywhere. 



[jira] [Updated] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented

2012-11-29 Thread Hari Shreedharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Shreedharan updated HADOOP-9107:
-

Affects Version/s: 1.1.0

 Hadoop IPC client eats InterruptedException and sets interrupt on the thread 
 which is not documented
 

 Key: HADOOP-9107
 URL: https://issues.apache.org/jira/browse/HADOOP-9107
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 1.1.0, 2.0.2-alpha
Reporter: Hari Shreedharan

 This code in Client.java looks fishy:
 {code}
 public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
     ConnectionId remoteId) throws InterruptedException, IOException {
   Call call = new Call(rpcKind, rpcRequest);
   Connection connection = getConnection(remoteId, call);
   connection.sendParam(call); // send the parameter
   boolean interrupted = false;
   synchronized (call) {
     while (!call.done) {
       try {
         call.wait();   // wait for the result
       } catch (InterruptedException ie) {
         // save the fact that we were interrupted
         interrupted = true;
       }
     }
     if (interrupted) {
       // set the interrupt flag now that we are done waiting
       Thread.currentThread().interrupt();
     }
     if (call.error != null) {
       if (call.error instanceof RemoteException) {
         call.error.fillInStackTrace();
         throw call.error;
       } else { // local exception
         InetSocketAddress address = connection.getRemoteAddress();
         throw NetUtils.wrapException(address.getHostName(),
             address.getPort(),
             NetUtils.getHostname(),
             0,
             call.error);
       }
     } else {
       return call.getRpcResult();
     }
   }
 }
 {code}
 Blocking calls are expected to throw InterruptedException when the calling 
 thread is interrupted. Also, it seems this method keeps waiting on the call 
 object even after it has been interrupted. Currently, this method does not 
 throw an InterruptedException, nor is it documented that this method sets 
 the interrupt flag on the thread calling it. If it is interrupted, this 
 method should still throw InterruptedException; it should not matter whether 
 the call was successful or not.
 This is a major issue for clients which do not call this directly, but call 
 HDFS client API methods to write to HDFS. Those calls may be interrupted by 
 the client due to timeouts, but no InterruptedException is thrown. Any HDFS 
 client call can thus interrupt the thread, and this is not documented 
 anywhere. 



[jira] [Commented] (HADOOP-9090) Refactor MetricsSystemImpl to allow for an on-demand publish system

2012-11-29 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506794#comment-13506794
 ] 

Ravi Prakash commented on HADOOP-9090:
--

Thanks Mostafa! That works for me. +1 from my side.

 Refactor MetricsSystemImpl to allow for an on-demand publish system
 ---

 Key: HADOOP-9090
 URL: https://issues.apache.org/jira/browse/HADOOP-9090
 Project: Hadoop Common
  Issue Type: New Feature
  Components: metrics
Reporter: Mostafa Elhemali
Priority: Minor
 Attachments: HADOOP-9090.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.2.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.3.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.4.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.5.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.6.patch, 
 HADOOP-9090.justEnhanceDefaultImpl.patch, HADOOP-9090.patch


 Updated description based on feedback:
 We have a need to publish metrics out of some short-living processes, which 
 is not really well-suited to the current metrics system implementation which 
 periodically publishes metrics asynchronously (a behavior that works great 
 for long-living processes). Of course I could write my own metrics system, 
 but it seems like such a waste to rewrite all the awesome code currently in 
 the MetricsSystemImpl and supporting classes.
 The way this JIRA solves the problem is by adding a new method, 
 publishMetricsNow(), to the MetricsSystemImpl class, which does a synchronous 
 out-of-band push of the metrics from the sources to the sinks. I also add a 
 method to MetricsSinkAdapter (putMetricsImmediate) to support that change.



[jira] [Commented] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented

2012-11-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506853#comment-13506853
 ] 

Karthik Kambatla commented on HADOOP-9107:
--

The things to fix look like:
# document that the method eats up {{InterruptedException}}
# break after setting interrupted to true in the catch block
# throw an appropriate exception in the {{else}} branch of {{if (call.error != 
null)}} (see the sketch below)
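
For illustration, a hedged sketch of what (2) and (3) could look like against 
the loop quoted in the description below. This is a shape sketch, not the 
committed patch; {{InterruptedIOException}} is just one candidate for the 
"appropriate exception", and the surrounding class exists only to make the 
snippet self-contained:
{code}
import java.io.IOException;
import java.io.InterruptedIOException;

class CallWaitSketch {
  private final Object call = new Object();   // stands in for the Call object
  private volatile boolean done;

  Object waitForResult() throws IOException {
    boolean interrupted = false;
    synchronized (call) {
      while (!done) {
        try {
          call.wait();                 // wait for the RPC result
        } catch (InterruptedException ie) {
          interrupted = true;
          break;                       // (2) stop waiting once we are interrupted
        }
      }
    }
    if (interrupted) {
      Thread.currentThread().interrupt();      // preserve the interrupt status
      // (3) surface the interrupt instead of silently eating it
      throw new InterruptedIOException("IPC call interrupted while waiting");
    }
    return null;                       // stands in for call.getRpcResult()
  }
}
{code}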

 Hadoop IPC client eats InterruptedException and sets interrupt on the thread 
 which is not documented
 

 Key: HADOOP-9107
 URL: https://issues.apache.org/jira/browse/HADOOP-9107
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 1.1.0, 2.0.2-alpha
Reporter: Hari Shreedharan

 This code in Client.java looks fishy:
 {code}
   public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
       ConnectionId remoteId) throws InterruptedException, IOException {
     Call call = new Call(rpcKind, rpcRequest);
     Connection connection = getConnection(remoteId, call);
     connection.sendParam(call);      // send the parameter
     boolean interrupted = false;
     synchronized (call) {
       while (!call.done) {
         try {
           call.wait();               // wait for the result
         } catch (InterruptedException ie) {
           // save the fact that we were interrupted
           interrupted = true;
         }
       }
       if (interrupted) {
         // set the interrupt flag now that we are done waiting
         Thread.currentThread().interrupt();
       }
       if (call.error != null) {
         if (call.error instanceof RemoteException) {
           call.error.fillInStackTrace();
           throw call.error;
         } else { // local exception
           InetSocketAddress address = connection.getRemoteAddress();
           throw NetUtils.wrapException(address.getHostName(),
               address.getPort(),
               NetUtils.getHostname(),
               0,
               call.error);
         }
       } else {
         return call.getRpcResult();
       }
     }
   }
 {code}
 Blocking calls are expected to throw InterruptedException when the calling 
 thread is interrupted, yet this method keeps waiting on the call object even 
 after it has been interrupted. Currently it neither throws 
 InterruptedException nor documents that it re-sets the interrupt flag on the 
 calling thread. If the thread is interrupted, the method should throw 
 InterruptedException regardless of whether the call itself succeeded.
 This is a major issue for clients that do not call this method directly but 
 go through the HDFS client API to write to HDFS: their writes may be 
 interrupted (for example due to timeouts), yet no InterruptedException is 
 thrown. Any HDFS client call can interrupt the thread, and this is not 
 documented anywhere.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane

2012-11-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506855#comment-13506855
 ] 

Colin Patrick McCabe commented on HADOOP-9103:
--

bq. Got an example hex sequence you think we should test against?

Here is a 5-byte sequence that used to be valid UTF-8, before the 4-byte max 
rule was put into place:

{{0xF8 0x88 0x80 0x80 0x80}}

Source: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
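
For reference, a hedged sketch of such a test (it assumes the patched {{UTF8}} 
rejects over-long sequences with an {{IOException}}; the two-byte length 
prefix matches {{UTF8}}'s framing, where the string length is written as an 
unsigned short):
{code}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class TestRejectFiveByteSequence {
  public static void main(String[] args) throws Exception {
    // The formerly-valid 5-byte sequence quoted above.
    byte[] fiveByte = {(byte) 0xF8, (byte) 0x88, (byte) 0x80, (byte) 0x80, (byte) 0x80};
    // Frame it the way UTF8 serializes strings: unsigned-short length, then bytes.
    byte[] framed = new byte[fiveByte.length + 2];
    framed[0] = 0;
    framed[1] = (byte) fiveByte.length;
    System.arraycopy(fiveByte, 0, framed, 2, fiveByte.length);
    try {
      org.apache.hadoop.io.UTF8.readString(
          new DataInputStream(new ByteArrayInputStream(framed)));
      System.out.println("FAIL: 5-byte sequence was accepted");
    } catch (IOException expected) {
      System.out.println("OK: rejected: " + expected);
    }
  }
}
{code}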

 UTF8 class does not properly decode Unicode characters outside the basic 
 multilingual plane
 ---

 Key: HADOOP-9103
 URL: https://issues.apache.org/jira/browse/HADOOP-9103
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 0.20.1
 Environment: SUSE LINUX
Reporter: yixiaohua
Assignee: Todd Lipcon
 Attachments: FSImage.java, hadoop-9103.txt, ProblemString.txt, 
 TestUTF8AndStringGetBytes.java, TestUTF8AndStringGetBytes.java

   Original Estimate: 12h
  Remaining Estimate: 12h

 This is the log information for the exception from the SecondaryNameNode:
 2012-03-28 00:48:42,553 ERROR
 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
 java.io.IOException: Found lease for non-existent file
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/@???
 ??tor.qzone.qq.com/keypart-00174
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFilesUnderConstruction(FSImage.java:1211)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:959)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:589)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
 at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
 at java.lang.Thread.run(Thread.java:619)
 This is the log information about the file from the NameNode:
 2012-03-28 00:32:26,528 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss
 ip=/10.131.16.34 cmd=create
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 dst=null
 perm=boss:boss:rw-r--r--
 2012-03-28 00:37:42,387 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
 NameSystem.allocateBlock:
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174.
 blk_2751836614265659170_184668759
 2012-03-28 00:37:42,696 INFO org.apache.hadoop.hdfs.StateChange: DIR*
 NameSystem.completeFile: file
 /user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174 is closed by
 DFSClient_attempt_201203271849_0016_r_000174_0
 2012-03-28 00:37:50,315 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=boss,boss
 ip=/10.131.16.34 cmd=rename
 src=/user/boss/pgv/fission/task16/split/_temporary/_attempt_201203271849_0016_r_000174_0/
   @?tor.qzone.qq.com/keypart-00174
 dst=/user/boss/pgv/fission/task16/split/  @?
 tor.qzone.qq.com/keypart-00174 perm=boss:boss:rw-r--r--
 After checking the code that saves the FSImage, I found a problem that may be
 a bug in the HDFS code; I paste it below.
 This is the saveFSImage method in FSImage.java; I marked the problem code:
   /**
    * Save the contents of the FS image to the file.
    */
   void saveFSImage(File newFile) throws IOException {
     FSNamesystem fsNamesys = FSNamesystem.getFSNamesystem();
     FSDirectory fsDir = fsNamesys.dir;
     long startTime = FSNamesystem.now();
     //
     // Write out data
     //
     DataOutputStream out = new DataOutputStream(
         new BufferedOutputStream(
             new FileOutputStream(newFile)));
     try {
       ......
       // save the rest of the nodes
       saveImage(strbuf, 0, fsDir.rootDir, out);      // <-- problem
       fsNamesys.saveFilesUnderConstruction(out);     // <-- problem, detail is below
       strbuf = null;
     } finally {
       out.close();
     }
     LOG.info("Image file of size " + newFile.length() + " saved in "
         + (FSNamesystem.now() - startTime)/1000 + " seconds.");
   }
   /**
    * Save file tree image starting from the given root.
    * This is a recursive procedure, which first saves all children of
    * a current directory and then moves inside the sub-directories.
    */

[jira] [Updated] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane

2012-11-29 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HADOOP-9103:


Attachment: hadoop-9103.txt

Attached patch includes the test sequence Colin provided above.


[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane

2012-11-29 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506894#comment-13506894
 ] 

Andy Isaacson commented on HADOOP-9103:
---

bq. +   * This is a regression est for HDFS-3307.

test, not est.  Since this jira has moved to HADOOP-9103, update the reference.

{code}
+ * Note that this decodes UTF-8 but actually encodes CESU-8, a variant of
+ * UTF-8: see http://en.wikipedia.org/wiki/CESU-8
{code}
Rather than adding a comment saying this code is buggy, how about we fix the 
bug?  Outputting proper 4-byte UTF8 sequences for a given UTF-16 surrogate pair 
is a much better solution than the current behavior.

So as far as it goes the patch looks good.  I'll look into the surrogate pair 
stuff.


[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane

2012-11-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506912#comment-13506912
 ] 

Colin Patrick McCabe commented on HADOOP-9103:
--

bq. Rather than adding a comment saying this code is buggy, how about we fix 
the bug? Outputting proper 4-byte UTF8 sequences for a given UTF-16 surrogate 
pair is a much better solution than the current behavior.

That would be an incompatible change.  Consider what happens when the server 
hands back 4-byte UTF-8 sequences to existing DFSClients.  Boom, they fall over.


[jira] [Updated] (HADOOP-9056) Build native library on Windows

2012-11-29 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HADOOP-9056:
--

Attachment: HADOOP-9056.1.patch

 Build native library on Windows
 ---

 Key: HADOOP-9056
 URL: https://issues.apache.org/jira/browse/HADOOP-9056
 Project: Hadoop Common
  Issue Type: Improvement
  Components: native
Affects Versions: trunk-win
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
 Fix For: trunk-win

 Attachments: HADOOP-9056.1.patch, HADOOP-9056.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 The native library (hadoop.dll) must be compiled on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8982) TestSocketIOWithTimeout fails on Windows

2012-11-29 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506928#comment-13506928
 ] 

Xuan Gong commented on HADOOP-8982:
---

The failure of this test case is, I think, because partial writes are handled 
differently on Mac and Windows. We write the bytes to channels; the 
Pipe.SinkChannel we use here implements the WritableByteChannel interface, and 
the Javadoc for that interface's write method says: "Some types of channels, 
depending upon their state, may write only some of the bytes or possibly none 
at all." That is one reason why I think the OS difference may cause the 
failure.

Basically, this test opens a pipe channel, and the sink keeps writing 4192 
bytes to the channel at a time. On Mac, when the channel is full, the sink 
does a partial write: if the channel only has room for 3000 bytes, it writes 
those 3000, leaving 1192 bytes in the ByteBuffer. On Windows, on the other 
hand, if the channel cannot take the full ByteBuffer, it does not let us write 
any part of it: trying to write 4192 bytes into a channel with 3000 bytes of 
free space writes nothing, and the ByteBuffer still holds all 4192 bytes. When 
a partial write happens, we check whether buf.capacity() > buf.remaining(); if 
so, we close the stream. That is why the stream ends up closed on Mac but 
still open on Windows, and why the next write on Windows does not hit the 
expected stream-is-closed exception. A standalone sketch of the partial-write 
behavior follows below.
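
To make the partial-write behavior concrete, here is a minimal standalone 
sketch (plain JDK NIO, not Hadoop code) that fills a pipe until the write 
comes back short; whether a short write or a zero write occurs at the boundary 
is exactly the platform difference described above:
{code}
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class PartialWriteDemo {
  public static void main(String[] args) throws Exception {
    Pipe pipe = Pipe.open();
    Pipe.SinkChannel sink = pipe.sink();
    sink.configureBlocking(false);       // non-blocking, so write() may be partial
    ByteBuffer buf = ByteBuffer.allocate(4192);
    long total = 0;
    while (true) {
      buf.clear();
      int written = sink.write(buf);     // may write fewer than 4192 bytes, or none
      total += written;
      if (written < buf.capacity()) {    // pipe buffer is (nearly) full
        System.out.println("short write after " + total + " bytes: wrote " + written);
        break;
      }
    }
    sink.close();
    pipe.source().close();
  }
}
{code}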
So far this is from my observations, and the question is whether Windows and 
Mac really handle partial writes as described above. If they do, one way to 
fix this test failure is to add a function tryToWriteOneByte() to 
SocketOutputStream.java, used only for test purposes:

public void tryToWriteOneByte() {
  try {
    write(1);          // try to push one more byte into the channel
    writer.close();    // it fit, so the earlier failure was a partial write: close the stream
  } catch (IOException e) {
    // do nothing: the channel is completely full, so no partial write happened
  }
}

Calling this function inserts a byte into the channel. If we can do that, the 
channel was not full, so a partial write did happen and we need to close the 
stream. If we cannot, we catch an exception, which means the earlier exception 
was not caused by a partial write but by the channel being full before the 
next 4192-byte write.

Since this test failure happens on Windows, TestSocketIOWithTimeout.java can 
check for Windows before calling this function. After doIO(null, out, 
TIMEOUT), we can do:

if (System.getProperty("os.name").toLowerCase().indexOf("win") >= 0) {
  out.tryToWriteOneByte();
}

 TestSocketIOWithTimeout fails on Windows
 

 Key: HADOOP-8982
 URL: https://issues.apache.org/jira/browse/HADOOP-8982
 Project: Hadoop Common
  Issue Type: Bug
  Components: net
Affects Versions: trunk-win
Reporter: Chris Nauroth
Assignee: Chris Nauroth

 This is a possible race condition or difference in socket handling on Windows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane

2012-11-29 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HADOOP-9103:


Attachment: hadoop-9103.txt

Fixed typo in the test javadoc


[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane

2012-11-29 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506933#comment-13506933
 ] 

Todd Lipcon commented on HADOOP-9103:
-

bq. Rather than adding a comment saying this code is buggy, how about we fix 
the bug? Outputting proper 4-byte UTF8 sequences for a given UTF-16 surrogate 
pair is a much better solution than the current behavior.

It's not buggy, it's just different (reminds me of something my elementary 
school teachers used to say). But on a serious note: yeah, what Colin said 
above -- it could break existing clients of the code who use the old code to 
_decode_ and rely on the fact that we can round-trip non-BMP characters 
through UTF8.java.




[jira] [Commented] (HADOOP-9056) Build native library on Windows

2012-11-29 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506943#comment-13506943
 ] 

Arpit Agarwal commented on HADOOP-9056:
---

Thanks for the feedback, Chuan. I have addressed most of your comments. Most 
of the SecureIOUtils changes don't seem to be applicable in trunk.

There is no equivalent to posix_fadvise in Win32, but we may be able to 
achieve a similar effect by passing flags to CreateFile; we can address that 
in a separate patch.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9099) NetUtils.normalizeHostName fails on domains where UnknownHost resolves to an IP address

2012-11-29 Thread Ivan Mitic (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506950#comment-13506950
 ] 

Ivan Mitic commented on HADOOP-9099:


Thanks Mostafa and Nicholas for the review!

 NetUtils.normalizeHostName fails on domains where UnknownHost resolves to an 
 IP address
 ---

 Key: HADOOP-9099
 URL: https://issues.apache.org/jira/browse/HADOOP-9099
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 1-win
Reporter: Ivan Mitic
Assignee: Ivan Mitic
Priority: Minor
 Fix For: 1.2.0, 1-win

 Attachments: HADOOP-9099.branch-1-win.patch


 I just hit this failure. We should use a more unique string for UnknownHost:
 Testcase: testNormalizeHostName took 0.007 sec
   FAILED
 expected:<[65.53.5.181]> but was:<[UnknownHost]>
 junit.framework.AssertionFailedError: expected:<[65.53.5.181]> but
 was:<[UnknownHost]>
   at
 org.apache.hadoop.net.TestNetUtils.testNormalizeHostName(TestNetUtils.java:347)
 Will post a patch in a bit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9093) Move all the Exception in PathExceptions to o.a.h.fs package

2012-11-29 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506952#comment-13506952
 ] 

Suresh Srinivas commented on HADOOP-9093:
-

Daryn, I posted your comment in HADOOP-9094. Let's move the conversation to 
that jira.

 Move all the Exception in PathExceptions to o.a.h.fs package
 

 Key: HADOOP-9093
 URL: https://issues.apache.org/jira/browse/HADOOP-9093
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-9093.patch


 The exceptions in PathExceptions are useful for non-shell-related 
 functionality as well. Making them available as exceptions under fs will 
 help move some of the HDFS implementation code to throw more specific 
 exceptions than IOException (for example, see HDFS-4209).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9094) Add interface audience and stability annotation to PathExceptions

2012-11-29 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506954#comment-13506954
 ] 

Suresh Srinivas commented on HADOOP-9094:
-

bq. I propose using FileNotFoundException instead of PathNotFoundException as 
it is already extensively used. Similarly use AccessControlException instead of 
PathAccessException. If folks agree, I will make that change in the next patch. 
Alternatively we could at least make these exceptions subclasses of the 
exception that I am proposing replacing them with.

Daryn's comment from HADOOP-9093:
bq. I had considered that when I created these exceptions, but wanted all path 
exceptions to derive from a common class. I suppose PathException could be an 
interface and we copy-n-paste the base code - which is the main factor I chose 
to derive from a base class.

 Add interface audience and stability annotation to PathExceptions
 -

 Key: HADOOP-9094
 URL: https://issues.apache.org/jira/browse/HADOOP-9094
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.0.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas

 HADOOP-9093 moved path-related exceptions to o.a.h.fs. This jira tracks 
 adding interface audience and stability annotations to those exceptions. It 
 also tracks the comment from HADOOP-9093:
 bq. I propose using FileNotFoundException instead of PathNotFoundException as 
 it is already extensively used. Similarly use AccessControlException instead 
 of PathAccessException. If folks agree, I will make that change in the next 
 patch. Alternatively we could at least make these exceptions subclasses of 
 the exception that I am proposing replacing them with.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9094) Add interface audience and stability annotation to PathExceptions

2012-11-29 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506957#comment-13506957
 ] 

Suresh Srinivas commented on HADOOP-9094:
-

bq. I had considered that when I created these exceptions, but wanted all path 
exceptions to derive from a common class. I suppose PathException could be an 
interface and we copy-n-paste the base code - which is the main factor I chose 
to derive from a base class.

Given that the new exceptions format the exception message in a certain way, I 
am making the following changes:
# Move the message formatting to a static method.
# Have PathNotFoundException subclass FileNotFoundException; it formats the 
exception message using the utility.
# Rename PathAccessException to PathAccessControlException and make it a 
subclass of AccessControlException; it also formats the exception message 
using the utility.

The other alternative is to blow away the Path*Exceptions in the above cases 
and use the superclasses I have proposed. The message in the exception can 
still use the utility to format the message.

I am leaning towards the second option. (For comparison, a rough sketch of the 
first option's shape follows below.)
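
A hedged sketch of the first option's shape (the utility and constructor names 
here are illustrative, not the actual patch; the PathAccessControlException 
case would follow the same pattern against 
org.apache.hadoop.security.AccessControlException):
{code}
import java.io.FileNotFoundException;

// Step 1: a static utility holding the shared message format.
final class PathExceptionMessages {
  private PathExceptionMessages() {}
  static String format(String path, String problem) {
    return "`" + path + "': " + problem;
  }
}

// Step 2: subclass the widely used JDK exception but keep the formatting,
// so existing catch (FileNotFoundException) blocks keep working.
class PathNotFoundException extends FileNotFoundException {
  PathNotFoundException(String path) {
    super(PathExceptionMessages.format(path, "No such file or directory"));
  }
}
{code}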


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented

2012-11-29 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506980#comment-13506980
 ] 

Hari Shreedharan commented on HADOOP-9107:
--

(1) is insufficient, since clients often do not call this method directly. I 
believe that if this method gets interrupted, it should:
* clean up the call object - it seems some cleanup is also required in the 
Connection object;
* throw InterruptedException, regardless of whether the call completed 
successfully or not.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented

2012-11-29 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506983#comment-13506983
 ] 

Hari Shreedharan commented on HADOOP-9107:
--

This is to ensure that the real client calling this knows the call was 
interrupted, rather than being forced to check the thread's interrupt flag.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9103) UTF8 class does not properly decode Unicode characters outside the basic multilingual plane

2012-11-29 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507039#comment-13507039
 ] 

Andy Isaacson commented on HADOOP-9103:
---

bq. It's not buggy it's just different 

It's buggy if we ever end up writing a CESU-8 bytestream where someone else 
expects UTF-8.  For example, {{dfs -ls}} writing CESU-8 to stdout wouldn't work 
properly, because other programs such as {{xterm}} or {{putty}} don't implement 
the CESU-8 decoding rules.  (This example doesn't happen currently, because the 
CESU-8 filename is deserialized into a String, where it's interpreted as a 
surrogate pair, which is then written out, and the correct 
surrogate-pair-to-UTF-8 encoding happens on the output side.)  Hopefully we 
haven't overlooked any such existing bugs, and nobody accidentally uses 
UTF8.java in the future.  (At least it's marked @Deprecated.)

Agreed that as long as UTF8.java is the thing that reads the bytestream, we can 
continue to implement CESU-8 and it can remain partially backwards compatible 
with previous versions of UTF8.java.
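
A standalone sketch (plain JDK, independent of UTF8.java) that makes the 
difference concrete for one astral-plane character:
{code}
import java.nio.charset.StandardCharsets;

public class Cesu8VsUtf8 {
  public static void main(String[] args) {
    // U+1D11E MUSICAL SYMBOL G CLEF: surrogate pair D834 DD1E in UTF-16.
    String s = new String(Character.toChars(0x1D11E));
    StringBuilder hex = new StringBuilder();
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      hex.append(String.format("%02X ", b));
    }
    // Real UTF-8 is one 4-byte sequence.
    System.out.println("UTF-8 : " + hex);               // F0 9D 84 9E
    // CESU-8 instead encodes each surrogate as its own 3-byte sequence,
    // six bytes total (ED A0 B4 ED B4 9E), which strict UTF-8 decoders
    // such as xterm or putty reject as invalid.
    System.out.printf("UTF-16: %04X %04X%n",
        (int) s.charAt(0), (int) s.charAt(1));
  }
}
{code}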


[jira] [Created] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases

2012-11-29 Thread Kihwal Lee (JIRA)
Kihwal Lee created HADOOP-9108:
--

 Summary: Add a method to clear terminateCalled to ExitUtil for 
test cases
 Key: HADOOP-9108
 URL: https://issues.apache.org/jira/browse/HADOOP-9108
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 0.23.5, 2.0.2-alpha
Reporter: Kihwal Lee


Currently, once terminateCalled is set, it stays set, since it's a class-static 
variable. This can break test cases when multiple test cases run in one JVM. 
In MiniDFSCluster, it should be cleared during shutdown so the next test case 
can run properly.
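
A hedged sketch of the shape of the requested change (field and method names 
are illustrative, not necessarily what trunk/branch-2 ended up with):
{code}
public final class ExitUtilSketch {
  private static volatile boolean terminateCalled = false;

  private ExitUtilSketch() {}

  public static void terminate(int status) {
    terminateCalled = true;   // sticky: stays set for the life of the JVM
    // ... the real ExitUtil throws or exits here ...
  }

  public static boolean terminateCalled() {
    return terminateCalled;
  }

  // Test-only reset, e.g. called from MiniDFSCluster.shutdown(), so the next
  // test case running in the same JVM starts from a clean state.
  public static void resetTerminateCalled() {
    terminateCalled = false;
  }
}
{code}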

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases

2012-11-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HADOOP-9108.


Resolution: Invalid


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases

2012-11-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reopened HADOOP-9108:


  Assignee: Kihwal Lee

I found out that the necessary changes have already been made in trunk and 
branch-2 by HDFS-3663 and HDFS-3765. But we cannot simply pull these patches 
into branch-0.23, because HDFS-3765 contains more than just the ExitUtil 
change.

I will use this jira to implement something equivalent for branch-0.23. Since 
this is for tests, a slight divergence should be of no concern.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases

2012-11-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HADOOP-9108:
---

 Target Version/s: 0.23.6  (was: 3.0.0, 2.0.3-alpha, 0.23.6)
Affects Version/s: (was: 2.0.2-alpha)

 Add a method to clear terminateCalled to ExitUtil for test cases
 

 Key: HADOOP-9108
 URL: https://issues.apache.org/jira/browse/HADOOP-9108
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 0.23.5
Reporter: Kihwal Lee
Assignee: Kihwal Lee

 Currently, once terminateCalled is set, it stays set since it's a class 
 static variable. This can break tests when multiple test cases run in 
 one JVM. In MiniDFSCluster, it should be cleared during shutdown so that the 
 next test case can run properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases

2012-11-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507068#comment-13507068
 ] 

Hadoop QA commented on HADOOP-9108:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12555457/hadoop-9108.branch-0.23.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/1835//console

This message is automatically generated.

 Add a method to clear terminateCalled to ExitUtil for test cases
 

 Key: HADOOP-9108
 URL: https://issues.apache.org/jira/browse/HADOOP-9108
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 0.23.5
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: hadoop-9108.branch-0.23.patch


 Currently, once terminateCalled is set, it stays set since it's a class 
 static variable. This can break tests when multiple test cases run in 
 one JVM. In MiniDFSCluster, it should be cleared during shutdown so that the 
 next test case can run properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases

2012-11-29 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated HADOOP-9108:
--

   Resolution: Fixed
Fix Version/s: 0.23.6
   Status: Resolved  (was: Patch Available)

This patch only applies to branch-0.23, hence the Jenkins failures. I 
committed it only to branch-0.23, since trunk and branch-2 already have 
similar functionality.

 Add a method to clear terminateCalled to ExitUtil for test cases
 

 Key: HADOOP-9108
 URL: https://issues.apache.org/jira/browse/HADOOP-9108
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 0.23.5
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 0.23.6

 Attachments: hadoop-9108.branch-0.23.patch


 Currently once terminateCalled is set, it will stay set since it's a class 
 static variable. This can break test cases where multiple test cases run in 
 one jvm. In MiniDfsCluster, it should be cleared during shutdown for the next 
 test case to run properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9107) Hadoop IPC client eats InterruptedException and sets interrupt on the thread which is not documented

2012-11-29 Thread Hari Shreedharan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507081#comment-13507081
 ] 

Hari Shreedharan commented on HADOOP-9107:
--

My take on what should really happen in the catch block (a rough sketch 
follows below):
* Call call.setException().
* Remove the call from the calls table.
* In the receiveResponse method, check whether calls.get(callId) returns null 
before proceeding.
* Throw the InterruptedException (or wrap it and then throw it), so client 
code knows that something went wrong and the call failed.
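
A minimal sketch of what the reworked wait loop might look like, assuming a 
{{calls}} table, a {{call.id}} field, and a {{Call.setException()}} along the 
lines of the code quoted below; this only illustrates the points above, it is 
not the actual fix:

{code}
synchronized (call) {
  while (!call.done) {
    try {
      call.wait();   // wait for the result
    } catch (InterruptedException ie) {
      // Hypothetical handling: record the failure on the call, drop it
      // from the pending-calls table so receiveResponse skips it, and
      // propagate the interrupt instead of swallowing it.
      call.setException(new InterruptedIOException("Call interrupted"));
      calls.remove(call.id);
      throw ie;
    }
  }
}
{code}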

 Hadoop IPC client eats InterruptedException and sets interrupt on the thread 
 which is not documented
 

 Key: HADOOP-9107
 URL: https://issues.apache.org/jira/browse/HADOOP-9107
 Project: Hadoop Common
  Issue Type: Bug
  Components: ipc
Affects Versions: 1.1.0, 2.0.2-alpha
Reporter: Hari Shreedharan

 This code in Client.java looks fishy:
 {code}
 public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
     ConnectionId remoteId) throws InterruptedException, IOException {
   Call call = new Call(rpcKind, rpcRequest);
   Connection connection = getConnection(remoteId, call);
   connection.sendParam(call);  // send the parameter
   boolean interrupted = false;
   synchronized (call) {
     while (!call.done) {
       try {
         call.wait();  // wait for the result
       } catch (InterruptedException ie) {
         // save the fact that we were interrupted
         interrupted = true;
       }
     }
     if (interrupted) {
       // set the interrupt flag now that we are done waiting
       Thread.currentThread().interrupt();
     }
     if (call.error != null) {
       if (call.error instanceof RemoteException) {
         call.error.fillInStackTrace();
         throw call.error;
       } else {  // local exception
         InetSocketAddress address = connection.getRemoteAddress();
         throw NetUtils.wrapException(address.getHostName(),
             address.getPort(),
             NetUtils.getHostname(),
             0,
             call.error);
       }
     } else {
       return call.getRpcResult();
     }
   }
 }
 {code}
 Blocking calls are expected to throw InterruptedException when the thread is 
 interrupted, yet this method keeps waiting on the call object even after it 
 has been interrupted. Currently, it neither throws InterruptedException nor 
 documents that it re-sets the interrupt flag on the calling thread. If the 
 thread is interrupted, this method should throw InterruptedException 
 regardless of whether the call itself succeeded.
 This is a major issue for clients that do not call this method directly but 
 write to HDFS through the HDFS client API: those calls may be interrupted 
 due to timeouts, yet no InterruptedException is thrown. Any HDFS client call 
 can leave the thread's interrupt flag set, but this is not documented 
 anywhere.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases

2012-11-29 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507104#comment-13507104
 ] 

Suresh Srinivas commented on HADOOP-9108:
-

+1 for the patch.

 Add a method to clear terminateCalled to ExitUtil for test cases
 

 Key: HADOOP-9108
 URL: https://issues.apache.org/jira/browse/HADOOP-9108
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 0.23.5
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 0.23.6

 Attachments: hadoop-9108.branch-0.23.patch


 Currently, once terminateCalled is set, it stays set since it's a class 
 static variable. This can break tests when multiple test cases run in 
 one JVM. In MiniDFSCluster, it should be cleared during shutdown so that the 
 next test case can run properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-9108) Add a method to clear terminateCalled to ExitUtil for test cases

2012-11-29 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HADOOP-9108:


Issue Type: Bug  (was: Improvement)

 Add a method to clear terminateCalled to ExitUtil for test cases
 

 Key: HADOOP-9108
 URL: https://issues.apache.org/jira/browse/HADOOP-9108
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 0.23.5
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 0.23.6

 Attachments: hadoop-9108.branch-0.23.patch


 Currently, once terminateCalled is set, it stays set since it's a class 
 static variable. This can break tests when multiple test cases run in 
 one JVM. In MiniDFSCluster, it should be cleared during shutdown so that the 
 next test case can run properly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9083) Port HADOOP-9020 Add a SASL PLAIN server to branch 1

2012-11-29 Thread Yu Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507140#comment-13507140
 ] 

Yu Gao commented on HADOOP-9083:


This jira does not intend to introduce SASL PLAIN as a new auth method in 
1.x; it just adds the SASL PLAIN server implementation, so that components 
that want to use the SASL PLAIN mechanism, such as Hive, can take advantage 
of it.

The Hive Thrift server depends on the Thrift library, which provides a 
TSaslTransport implementation, so with this PLAIN server registered, 
TSaslTransport can use it to do PLAIN auth.
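
For illustration, a component like Hive might wire this up roughly as in the 
sketch below; the SaslPlainServer.SecurityProvider class name is an 
assumption carried over from HADOOP-9020, and the protocol and server-name 
arguments are placeholders:

{code}
import java.security.Security;
import javax.security.auth.callback.CallbackHandler;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

public class PlainSaslExample {
  public static SaslServer newPlainServer(CallbackHandler handler)
      throws SaslException {
    // Register the PLAIN server implementation (class name assumed
    // from HADOOP-9020) with the JDK security framework.
    Security.addProvider(
        new org.apache.hadoop.security.SaslPlainServer.SecurityProvider());

    // With the provider registered, the standard SASL factory can hand
    // out a PLAIN server, e.g. for Thrift's TSaslTransport to wrap.
    return Sasl.createSaslServer(
        "PLAIN",       // mechanism
        "thrift",      // protocol (placeholder)
        "localhost",   // server name (placeholder)
        null,          // no extra properties
        handler);      // performs the actual username/password check
  }
}
{code}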

 Port HADOOP-9020 Add a SASL PLAIN server to branch 1
 

 Key: HADOOP-9083
 URL: https://issues.apache.org/jira/browse/HADOOP-9083
 Project: Hadoop Common
  Issue Type: Task
  Components: ipc, security
Affects Versions: 1.0.3
Reporter: Yu Gao
Assignee: Yu Gao
 Attachments: HADOOP-9020-branch-1.patch, test-patch.result, 
 test-TestSaslRPC.result


 It would be good if the patch of HADOOP-9020 for adding SASL PLAIN server 
 implementation could be ported to branch 1 as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9082) Select and document a platform-independent scripting language for use in Hadoop environment

2012-11-29 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507163#comment-13507163
 ] 

Allen Wittenauer commented on HADOOP-9082:
--

(I know this is mostly going to get ignored because a) it's from me, b) it's 
more than 3 lines, and c) we've already proven that we only care about Linux 
despite people wanting support for other platforms, but here we go anyway.)

While I can understand the build-time issues, I'm not sure I understand the 
run-time issues. If you are running on a system that doesn't have libhadoop, 
or you want to launch a task, you're going to hit a fork() and that's going 
to call bash (or potentially sh). Or are we planning on replacing taskjvm.sh 
as well? So the bash requirement doesn't go away.

At run-time, the whole purpose of these scripts is to launch Java. That's it. 
The problem we have is that our current scripts are extremely convoluted, 
wrap into themselves, and fundamentally aren't written very well. Arguing 
that we can make our launcher scripts object-oriented, or debug them with an 
IDE, sounds like we're expecting to raise the complexity to even more 
ludicrous levels.

One thing I'm very curious about is whether we'll lose the ${BASH_SOURCE} 
functionality, something I consider absolutely critical, by moving to Python. 
(It allows one to run without setting *any* environment variables. I think I 
submitted that as a patch years ago, but well...)

Let's say we pick Python.  Which version are we going to target? From a support 
perspective, we could very easily end up asking about not only the Java version 
but the Python version.  Do we really want that?

bq. The alternative would be to maintain two complete suites of scripts, one 
for Linux and one for Windows (and perhaps others in the future).

This is what most projects that have both Windows and UNIX functionality do, 
from what I've seen. This is because locations, delimiters, and so on all 
differ, and if you merge the two you end up with so much "if this, then that; 
if this2, then that2" logic that you essentially have two different suites of 
scripts stored in one anyway.

bq. We want to avoid the need to update dual modules in two different languages 
when functionality changes, especially given that many Linux developers are not 
familiar with powershell or bat, and many Windows developers are not familiar 
with shell or bash.

I think this is the real message: the Linux developers (read: the Java 
developers who work on Hadoop) don't know bash and fundamentally ignore most 
attempts from outside to improve the scripts. Switching to something else 
isn't going to change this problem. Instead, it'll just allow them to 
continue ignoring the community in favor of their own changes.

Perhaps the fundamental problem is this:  Why are so many launcher changes even 
necessary?  Why isn't Hadoop smart enough to figure out some of these things 
after Java is launched?  Have we even seriously attempted a simplification of 
the scripts?  (I suspect just using functions instead of the craziness around 
exported variables would make a world of difference.)  Has there been any 
thought about actually creating real configuration files built by installers so 
we don't have to recompute a half-dozen things at every run time?

Side-note: it would be interesting to see the memory footprint requirement 
differences on something like one of Yahoo!'s gateways.  Sure, individually it 
isn't much.  But at scale...

Anyway, I've given my $0.02.  Do what you want, I won't stop you. But I do 
question the thinking behind it.

 Select and document a platform-independent scripting language for use in 
 Hadoop environment
 ---

 Key: HADOOP-9082
 URL: https://issues.apache.org/jira/browse/HADOOP-9082
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Matt Foley

 This issue is going to be discussed at length on the common-dev@ mailing 
 list, under the topic "[PROPOSAL] introduce Python as build-time and 
 run-time dependency for Hadoop and throughout Hadoop stack".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira