[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050231#comment-13050231
 ] 

Todd Lipcon commented on HDFS-941:
--

Ran the following benchmark to compare 0.22 before vs after the application of 
HDFS-941:
- inserted a 128M file into HDFS
- read it 50 times using "hadoop fs -cat /file > /dev/null" and the unix "time" 
utility
- recompiled with the patch reverted, restarted NN/DN
- ran same test
- recompiled with the patch included, restarted NN/DN
- ran same test
- recompiled with patch reverted
- ran same test
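
For reference, the read loop can also be reproduced programmatically. The sketch 
below is a minimal, hypothetical Java harness against the HDFS FileSystem API 
(class name, file path and buffer size are made up); the runs above used 
"hadoop fs -cat" plus the unix "time" utility rather than this code.

{code}
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical timing harness: 50 sequential reads of one HDFS file,
// discarding the bytes (the shell equivalent of "> /dev/null").
public class ReadBench {
  public static void main(String[] args) throws Exception {
    Path file = new Path("/file");                        // the 128M test file
    FileSystem fs = FileSystem.get(new Configuration());  // picks up core-site.xml/hdfs-site.xml
    byte[] buf = new byte[64 * 1024];
    for (int run = 1; run <= 50; run++) {
      long start = System.nanoTime();
      InputStream in = fs.open(file);
      try {
        while (in.read(buf) != -1) {
          // discard the data
        }
      } finally {
        in.close();
      }
      System.out.printf("run %d: %.3f s%n", run, (System.nanoTime() - start) / 1e9);
    }
    fs.close();
  }
}
{code}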

The above procedure produced 100 samples for each setup, 50 from each run. The 
following is the output of a t-test for the important variables:


> t.test(d.22$wall, d.22.with.941$wall)

Welch Two Sample t-test

data:  d.22$wall and d.22.with.941$wall 
t = -0.4932, df = 174.594, p-value = 0.6225
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.011002972  0.006602972 
sample estimates:
mean of x mean of y 
   1.1937    1.1959 

> t.test(d.22$user, d.22.with.941$user)

Welch Two Sample t-test

data:  d.22$user and d.22.with.941$user 
t = -1.5212, df = 197.463, p-value = 0.1298
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.032378364  0.004178364 
sample estimates:
mean of x mean of y 
   1.3335    1.3476 

That is to say, the t-test failed to reject the null hypothesis. In less 
stat-heavy terms, there's no statistical evidence that this patch makes the test 
any slower.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Fix For: 0.22.0
>
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050219#comment-13050219
 ] 

Kihwal Lee commented on HDFS-941:
-

Perhaps it's confusing because this Jira is seen as Random vs. Sequential read. 
But in fact this jira is really about improving short reads, and the solution is 
to reduce the overhead of connection setup, which is present in both short and 
long reads. It is by no means favoring random or short reads. In fact, if the 
client does typical sequential reads multiple times from the same dn, this 
patch will help them too. The gain will be bigger if the files are smaller. 
Sure, there is a one-time overhead of a cache lookup (size: 16), but this can be 
ignored when the read size is sufficiently big. This cache management overhead 
should show up, in theory, for very small cold (connection-wise) accesses. So far 
I have only seen gains, but there might be some special corner cases where this 
patch actually makes reads slower. Again, I don't believe they are typical use 
cases. Having said that, I think it is reasonable to run tests against the 
latest patch and make sure there is no regression in performance. Uncommitting 
now may do more harm than good. Let's see the numbers first and decide what to 
do. 
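
To make the lookup/reuse cost being discussed concrete, below is a rough, 
hypothetical sketch of that kind of small bounded socket cache (capacity 16, 
keyed by datanode address). It only illustrates the idea; it is not the actual 
HDFS-941 implementation, and the class name, key type, and locking are made up.

{code}
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only -- not the HDFS-941 patch. A tiny bounded cache of open
// sockets keyed by datanode address, so a follow-up read can reuse a
// connection instead of paying connection-setup cost again.
public class SocketCache {
  private static final int CAPACITY = 16;               // matches the "size: 16" above

  private final LinkedHashMap<String, Socket> cache =
      new LinkedHashMap<String, Socket>(CAPACITY, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Socket> eldest) {
          if (size() > CAPACITY) {
            try { eldest.getValue().close(); } catch (Exception ignored) { }
            return true;                                 // evict least-recently-used socket
          }
          return false;
        }
      };

  public synchronized Socket get(String dnAddress) {
    Socket s = cache.remove(dnAddress);                  // the one cache lookup per read
    return (s != null && !s.isClosed()) ? s : null;      // null means: open a new connection
  }

  public synchronized void put(String dnAddress, Socket s) {
    cache.put(dnAddress, s);                             // return the socket after a clean read
  }
}
{code}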

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Fix For: 0.22.0
>
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.

2011-06-15 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050207#comment-13050207
 ] 

ramkrishna.s.vasudevan commented on HDFS-1989:
--

Hi,

Yes. The editsLog.close() call is the problem, as it closes all the editStreams, 
including the diverted editStreams.  



> When checkpointing by backup node occurs parallely when a file is being 
> closed by a client then Exception occurs saying no journal streams. 
> 
>
> Key: HDFS-1989
> URL: https://issues.apache.org/jira/browse/HDFS-1989
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
>
> Backup namenode initiates the checkpointing process. 
> As a part of checkpointing based on the timestamp it tries to download the 
> FSImage or use the existing one.
> Then it tries to save the FSImage.
> During this time it tries to close the editLog streams.
> Parallely when a client tries to close a file just after the checkpointing 
> process closes the editLog Stream then we get an exception saying
> java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File 
> system changes are not persistent. No journal streams.
> Here the saveNameSpace api closes all the editlog streams resulting in this 
> issue.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050197#comment-13050197
 ] 

stack commented on HDFS-941:


@Konstantin

Convention is that the RM says what's in a release and no one else.  See his +1 
above.

bq. ...proof of no-degradation to sequential ios.

What would this test look like?  Perf tests done above showed only minor 
differences ("...well within the standard deviation." as per Todd).

And if this patch can only be committed pending perf evaluation, why single it 
out and not require the same of all commits to hdfs?

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Fix For: 0.22.0
>
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-06-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050182#comment-13050182
 ] 

Hadoop QA commented on HDFS-1692:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12482713/HDFS-1692-v0.23-2.patch
  against trunk revision 1136230.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/790//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/790//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/790//console

This message is automatically generated.

> In secure mode, Datanode process doesn't exit when disks fail.
> --
>
> Key: HDFS-1692
> URL: https://issues.apache.org/jira/browse/HDFS-1692
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.204.0, 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.204.0, 0.23.0
>
> Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch, 
> HDFS-1692-v0.23-2.patch
>
>
> In secure mode, when disks fail more than volumes tolerated, datanode process 
> doesn't exit properly and it just hangs even though shutdown method is 
> called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050181#comment-13050181
 ] 

Konstantin Shvachko commented on HDFS-941:
--

-1 on committing this without the proof of no-degradation to sequential ios.
Should have done it before, but thought my message was clear.
Let me know if you want me to uncommit before benchmarks are provided.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Fix For: 0.22.0
>
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2002) Incorrect computation of needed blocks in getTurnOffTip()

2011-06-15 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050166#comment-13050166
 ] 

Jakob Homan commented on HDFS-2002:
---

Matthias - looks like TestSafeMode is checking the output of the getTip call.  
Can you update the test to expect the new values? Thanks. -jg

> Incorrect computation of needed blocks in getTurnOffTip()
> -
>
> Key: HDFS-2002
> URL: https://issues.apache.org/jira/browse/HDFS-2002
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Konstantin Shvachko
>Assignee: Matthias Eckert
>  Labels: newbie
> Fix For: 0.22.0
>
> Attachments: hdfs-2002.patch
>
>
> {{SafeModeInfo.getTurnOffTip()}} under-reports the number of blocks needed to 
> reach the safemode threshold.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-06-15 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1692:
--

Status: Patch Available  (was: Open)

> In secure mode, Datanode process doesn't exit when disks fail.
> --
>
> Key: HDFS-1692
> URL: https://issues.apache.org/jira/browse/HDFS-1692
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.204.0, 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.204.0, 0.23.0
>
> Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch, 
> HDFS-1692-v0.23-2.patch
>
>
> In secure mode, when disks fail more than volumes tolerated, datanode process 
> doesn't exit properly and it just hangs even though shutdown method is 
> called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-1952) FSEditLog.open() appears to succeed even if all EDITS directories fail

2011-06-15 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley resolved HDFS-1952.
--

   Resolution: Fixed
Fix Version/s: 0.23.0
   0.22.0

Committed to v22.  Thanks, Andrew!

> FSEditLog.open() appears to succeed even if all EDITS directories fail
> --
>
> Key: HDFS-1952
> URL: https://issues.apache.org/jira/browse/HDFS-1952
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Matt Foley
>Assignee: Andrew Wang
>  Labels: newbie
> Fix For: 0.22.0, 0.23.0
>
> Attachments: hdfs-1952-0.22.patch, hdfs-1952.patch, hdfs-1952.patch, 
> hdfs-1952.patch
>
>
> FSEditLog.open() appears to "succeed" even if all of the individual 
> directories failed to allow creation of an EditLogOutputStream.  The problem 
> and solution are essentially similar to that of HDFS-1505.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2018) Move all journal stream management code into one place

2011-06-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050141#comment-13050141
 ] 

Todd Lipcon commented on HDFS-2018:
---

Hi Ivan. I've been thinking about this patch the last couple days and I think I 
agree we need to do this. The difficulty seems to be in how we'll manage the 
current semantics when dealing with upgrading an old storage directory, while 
also keeping the code paths manageable.

Let me look over the diff itself and see how it looks.

> Move all journal stream management code into one place
> --
>
> Key: HDFS-2018
> URL: https://issues.apache.org/jira/browse/HDFS-2018
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in 
> FileJournalManager and the code for input streams is in the inspectors. This 
> change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deals with URIs when referring to edit logs
>   - Recovery of inprogress logs is performed by counting the number of 
> transactions instead of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1998) make refresh-namodenodes.sh refreshing all namenodes

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050135#comment-13050135
 ] 

Hudson commented on HDFS-1998:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> make refresh-namodenodes.sh refreshing all namenodes
> 
>
> Key: HDFS-1998
> URL: https://issues.apache.org/jira/browse/HDFS-1998
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-1998.2.patch, HDFS-1998.3.patch, HDFS-1998.patch
>
>
> refresh-namenodes.sh is used to refresh name nodes in the cluster to check 
> for updates of the include/exclude list.  It is used when decommissioning or 
> adding a data node.  Currently it only refreshes the name node that serves the 
> defaultFs, if a defaultFs is defined.  Fix it by refreshing all the name 
> nodes in the cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-988) saveNamespace race can corrupt the edits log

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050133#comment-13050133
 ] 

Hudson commented on HDFS-988:
-

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> saveNamespace race can corrupt the edits log
> 
>
> Key: HDFS-988
> URL: https://issues.apache.org/jira/browse/HDFS-988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20-append, 0.21.0, 0.22.0
>Reporter: dhruba borthakur
>Assignee: Eli Collins
>Priority: Blocker
> Fix For: 0.20-append, 0.22.0
>
> Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch, 
> hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, 
> hdfs-988-6.patch, hdfs-988-7.patch, hdfs-988-b22-1.patch, hdfs-988.txt, 
> saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the 
> savenamespace command. This can corrupt the edits log. The problem is that  
> when the NN enters safemode, there could still be pending logSyncs occurring 
> from other threads. Now, the saveNamespace command, when executed, would save 
> an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2003) Separate FSEditLog reading logic from editLog memory state building logic

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050134#comment-13050134
 ] 

Hudson commented on HDFS-2003:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Separate FSEditLog reading logic from editLog memory state building logic
> -
>
> Key: HDFS-2003
> URL: https://issues.apache.org/jira/browse/HDFS-2003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: Edit log branch (HDFS-1073), 0.23.0
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073), 0.23.0
>
> Attachments: 2003-delta.txt, HDFS-2003-replicationfix-delta.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, HDFS-2003.diff, 
> hdfs-2003.txt, hdfs-2003.txt, hdfs-2003.txt
>
>
> Currently FSEditLogLoader has code for reading from an InputStream 
> interleaved with code which updates the FSNameSystem and FSDirectory. This 
> makes it difficult to read an edit log without having a whole load of other 
> objects initialised, which is problematic if you want to do things like count 
> how many transactions are in a file etc. 
> This patch separates the reading of the stream and the building of the memory 
> state. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2056) Update fetchdt usage

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050120#comment-13050120
 ] 

Hudson commented on HDFS-2056:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Update fetchdt usage
> 
>
> Key: HDFS-2056
> URL: https://issues.apache.org/jira/browse/HDFS-2056
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, tools
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2056.patch
>
>
> Update the usage of fetchdt.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050131#comment-13050131
 ] 

Hudson commented on HDFS-941:
-

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Fix For: 0.22.0
>
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1875) MiniDFSCluster hard-codes dfs.datanode.address to localhost

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050130#comment-13050130
 ] 

Hudson commented on HDFS-1875:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> MiniDFSCluster hard-codes dfs.datanode.address to localhost
> ---
>
> Key: HDFS-1875
> URL: https://issues.apache.org/jira/browse/HDFS-1875
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 0.23.0
>
> Attachments: HDFS-1875.patch, HDFS-1875.patch
>
>
> When creating RPC addresses that represent the communication sockets for each 
> simulated DataNode, the MiniDFSCluster class hard-codes the address of the 
> dfs.datanode.address port to be "127.0.0.1:0"
> The DataNodeCluster test tool uses the MiniDFSCluster class to create a 
> selected number of simulated datanodes on a single host. In the 
> DataNodeCluster setup, the NameNode is not simulated but is started as a 
> separate daemon.
> The problem is that if the write requests into the simulated datanodes are 
> originated on a host that is not the same host running the simulated 
> datanodes, the connections are refused. This is because the RPC sockets that 
> are started by MiniDFSCluster are for "localhost" (127.0.0.1) and are not 
> accessible from outside that same machine.
> It is proposed that the MiniDFSCluster.setupDatanodeAddress() method be 
> overloaded in order to accommodate an environment where the NameNode is on 
> one host, the client is on another host, and the simulated DataNodes are on 
> yet another host (or even multiple hosts simulating multiple DataNodes each).
> The overloaded API would add a parameter that would be used as the basis for 
> creating the RPC sockets. By default, it would remain 127.0.0.1

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1520) HDFS 20 append: Lightweight NameNode operation to trigger lease recovery

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050128#comment-13050128
 ] 

Hudson commented on HDFS-1520:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> HDFS 20 append: Lightweight NameNode operation to trigger lease recovery
> 
>
> Key: HDFS-1520
> URL: https://issues.apache.org/jira/browse/HDFS-1520
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Affects Versions: 0.20-append
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.20-append
>
> Attachments: recoverLeaseApache20.patch
>
>
> Currently HBase uses append to trigger the close of the HLog during HLog split. 
> Append is a very expensive operation, which involves not only NameNode 
> operations but also creating a write pipeline. If one of the datanodes on the 
> pipeline has a problem, this recovery may take minutes. I'd like to implement a 
> lightweight NameNode operation to trigger lease recovery and make HBase 
> use this instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1954) Improve corrupt files warning message on NameNode web UI

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050132#comment-13050132
 ] 

Hudson commented on HDFS-1954:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Improve corrupt files warning message on NameNode web UI
> 
>
> Key: HDFS-1954
> URL: https://issues.apache.org/jira/browse/HDFS-1954
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: philo vivero
>Assignee: Patrick Hunt
> Fix For: 0.22.0
>
> Attachments: HDFS-1954.patch, HDFS-1954.patch, HDFS-1954.patch, 
> branch-0.22-hdfs-1954.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> On NameNode web interface, you may get this warning:
>   WARNING : There are about 32 missing blocks. Please check the log or run 
> fsck.
> If the cluster was started less than 14 days before, it would be great to 
> add: "Is dfs.data.dir defined?"
> If at the point of that error message, that parameter could be checked, and 
> error made "OMG dfs.data.dir isn't defined!" that'd be even better. As is, 
> troubleshooting undefined parameters is a difficult proposition.
> I suspect this is an easy fix.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050126#comment-13050126
 ] 

Hudson commented on HDFS-2030:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Fix the usability of namenode upgrade command
> -
>
> Key: HDFS-2030
> URL: https://issues.apache.org/jira/browse/HDFS-2030
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch
>
>
> Fixing the Namenode upgrade option along the same lines as the Namenode format 
> option. 
> If clusterid is not given then clusterid will be automatically generated for 
> the upgrade but if clusterid is given then it will be honored.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1619) Remove AC_TYPE* from the libhdfs

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050121#comment-13050121
 ] 

Hudson commented on HDFS-1619:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Remove AC_TYPE* from the libhdfs
> 
>
> Key: HDFS-1619
> URL: https://issues.apache.org/jira/browse/HDFS-1619
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
> Fix For: 0.22.0
>
> Attachments: HDFS-1619-C99.patch.txt, HDFS-1619.patch.txt, 
> hdfs-1619-2.patch
>
>
> Remove AC_TYPE* from the libhdfs build since we get these via stdint.
> Currently configure.ac uses AC_TYPE_INT16_T, AC_TYPE_INT32_T, AC_TYPE_INT64_T 
> and AC_TYPE_UINT16_T and thus requires autoconf 2.61 or higher. 
> This prevents using it on such platforms as CentOS/RHEL 5.4 and 5.5. Given 
> that those are pretty popular and also given that it is really difficult to 
> find a platform
> these days that doesn't natively define  intXX_t types I'm curious as to 
> whether we can simply remove those macros or perhaps fail ONLY if we happen 
> to be on such
> a platform. 
> Here's a link to GNU autoconf docs for your reference:
> http://www.gnu.org/software/hello/manual/autoconf/Particular-Types.html

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1409) The "register" method of the BackupNode class should be "UnsupportedActionException("register")"

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050127#comment-13050127
 ] 

Hudson commented on HDFS-1409:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> The "register" method of the BackupNode class should be 
> "UnsupportedActionException("register")"
> 
>
> Key: HDFS-1409
> URL: https://issues.apache.org/jira/browse/HDFS-1409
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Ching-Shen Chen
>Priority: Trivial
> Fix For: 0.22.0
>
> Attachments: HDFS-1409.patch, HDFS-1409.patch
>
>
> The register method of the BackupNode class should be 
> "UnsupportedActionException("register")" rather than  
> "UnsupportedActionException("journal")".

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1149) Lease reassignment is not persisted to edit log

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050125#comment-13050125
 ] 

Hudson commented on HDFS-1149:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Lease reassignment is not persisted to edit log
> ---
>
> Key: HDFS-1149
> URL: https://issues.apache.org/jira/browse/HDFS-1149
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0, 0.22.0, 0.23.0
>Reporter: Todd Lipcon
>Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: editsStored, hdfs-1149.0.patch, hdfs-1149.1.patch, 
> hdfs-1149.1.patch, hdfs-1149.2.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This 
> is not currently persisted to the edit log, which means that after an NN 
> restart, the original leaseholder could end up allocating more blocks or 
> completing a file that has already started recovery.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2046) Force entropy to come from non-true random for tests

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050129#comment-13050129
 ] 

Hudson commented on HDFS-2046:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Force entropy to come from non-true random for tests
> 
>
> Key: HDFS-2046
> URL: https://issues.apache.org/jira/browse/HDFS-2046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: build, test
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2046.txt
>
>
> Same as HADOOP-7335 but for HDFS

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1586) Add InterfaceAudience annotation to MiniDFSCluster

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050123#comment-13050123
 ] 

Hudson commented on HDFS-1586:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Add InterfaceAudience annotation to MiniDFSCluster
> --
>
> Key: HDFS-1586
> URL: https://issues.apache.org/jira/browse/HDFS-1586
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: HDFS-1586.1.patch, HDFS-1586.patch
>
>
> MiniDFSCluster is used both by hdfs and mapreduce. Annotation needs to be 
> added to this class to reflect this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2039) TestNameNodeMetrics uses a bad test root path

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050122#comment-13050122
 ] 

Hudson commented on HDFS-2039:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> TestNameNodeMetrics uses a bad test root path
> -
>
> Key: HDFS-2039
> URL: https://issues.apache.org/jira/browse/HDFS-2039
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: hdfs-2039.txt
>
>
> I found that running TestNameNodeMetrics within eclipse fails, since 
> TEST_ROOT_DIR_PATH has a default which is a non-absolute path. Since this 
> path is a DFS path, rather than a local FS path, it shouldn't use the test 
> root dir system property anyhow - we can just hardcode it to a path on DFS.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2073) Namenode is missing @Override annotations

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050117#comment-13050117
 ] 

Hudson commented on HDFS-2073:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])
HDFS-2073. Add @Override annotation to NameNode. Contributed by Suresh 
Srinivas.

suresh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1136230
Files : 
* /hadoop/common/trunk/hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java


> Namenode is missing @Override annotations
> -
>
> Key: HDFS-2073
> URL: https://issues.apache.org/jira/browse/HDFS-2073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2073.patch
>
>
> NameNode implements several protocols. The methods that implement the 
> interface do not have @Override. Also @inheritdoc is used, which is not 
> needed with @Override.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050116#comment-13050116
 ] 

Hudson commented on HDFS-2071:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only a few microseconds of extra overhead in the premature 
> connection closure cases.
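
As a side note on the Socket semantics described above, a tiny standalone demo 
(an illustration only, not part of the patch; the class name is made up) makes 
the behavior visible:

{code}
import java.net.ServerSocket;
import java.net.Socket;

// Shows the java.net.Socket semantics described above: once a socket has
// connected, isConnected() remains true even after close(), so a loop guarded
// by isConnected() can never exit when the connection goes away; isClosed()
// is the usable check.
public class IsConnectedDemo {
  public static void main(String[] args) throws Exception {
    ServerSocket server = new ServerSocket(0);             // loopback listener on an ephemeral port
    Socket s = new Socket("127.0.0.1", server.getLocalPort());
    System.out.println(s.isConnected());                   // true
    System.out.println(s.isClosed());                      // false
    s.close();
    System.out.println(s.isConnected());                   // still true after close()
    System.out.println(s.isClosed());                      // true
    server.close();
  }
}
{code}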

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2041) Some mtimes and atimes are lost when edit logs are replayed

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050112#comment-13050112
 ] 

Hudson commented on HDFS-2041:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Some mtimes and atimes are lost when edit logs are replayed
> ---
>
> Key: HDFS-2041
> URL: https://issues.apache.org/jira/browse/HDFS-2041
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2041.txt, hdfs-2041.txt
>
>
> The refactoring in HDFS-2003 allowed findbugs to expose two potential bugs:
> - the atime field logged with OP_MKDIR is unused
> - the timestamp field logged with OP_CONCAT_DELETE is unused
> The concat issue is definitely real. The atime for MKDIR might always be 
> identical to mtime in that case, in which case it could be ignored.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050114#comment-13050114
 ] 

Hudson commented on HDFS-1942:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> If all Block Pool service threads exit then datanode should exit.
> -
>
> Key: HDFS-1942
> URL: https://issues.apache.org/jira/browse/HDFS-1942
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Attachments: HDFS-1942-1.patch, HDFS-1942-2.patch, HDFS-1942-3.patch
>
>
> Currently, if all block pool service threads exit, the Datanode continues to run. 
> This should be fixed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050118#comment-13050118
 ] 

Hudson commented on HDFS-1052:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> HDFS scalability with multiple namenodes
> 
>
> Key: HDFS-1052
> URL: https://issues.apache.org/jira/browse/HDFS-1052
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Fix For: 0.23.0
>
> Attachments: Block pool proposal.pdf, HDFS-1052.3.patch, 
> HDFS-1052.4.patch, HDFS-1052.5.patch, HDFS-1052.6.patch, HDFS-1052.patch, 
> Mulitple Namespaces5.pdf, high-level-design.pdf
>
>
> HDFS currently uses a single namenode that limits scalability of the cluster. 
> This jira proposes an architecture to scale the nameservice horizontally 
> using multiple namenodes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2063) libhdfs test is broken

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050109#comment-13050109
 ] 

Hudson commented on HDFS-2063:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> libhdfs test is broken
> --
>
> Key: HDFS-2063
> URL: https://issues.apache.org/jira/browse/HDFS-2063
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Eli Collins
>Assignee: Eric Yang
> Attachments: HDFS-2063-2.patch, HDFS-2063.patch
>
>
> Looks like the recent bin/script shuffling in HDFS-1963 broke the libhdfs 
> test. This works on 22.
> {noformat}
> $ ant -Dlibhdfs=true compile test
> ...
>  [exec] Hadoop common not found.
>  [exec] /home/eli/src/hdfs3/src/c++/libhdfs/tests/test-libhdfs.sh: line 
> 181: /home/eli/src/hdfs3/bin/hadoop-daemon.sh: No such file or directory
>  [exec] /home/eli/src/hdfs3/src/c++/libhdfs/tests/test-libhdfs.sh: line 
> 182: /home/eli/src/hdfs3/bin/hadoop-daemon.sh: No such file or directory
>  [exec] Wait 30s for the datanode to start up...
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2066) Create a package and individual class files for DataTransferProtocol

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050111#comment-13050111
 ] 

Hudson commented on HDFS-2066:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Create a package and individual class files for DataTransferProtocol
> 
>
> Key: HDFS-2066
> URL: https://issues.apache.org/jira/browse/HDFS-2066
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, hdfs client, name-node
>Affects Versions: 0.23.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0
>
> Attachments: h2066_20110610.patch
>
>
> {{DataTransferProtocol}} contains quite a few classes.  It is better to 
> create a package and put the classes into individual files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2067) Bump DATA_TRANSFER_VERSION in trunk for protobufs

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050113#comment-13050113
 ] 

Hudson commented on HDFS-2067:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Bump DATA_TRANSFER_VERSION in trunk for protobufs
> -
>
> Key: HDFS-2067
> URL: https://issues.apache.org/jira/browse/HDFS-2067
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, hdfs client
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.0
>
> Attachments: h2067_20110611.patch
>
>
> Forgot to bump DATA_TRANSFER_VERSION in HDFS-2058. We need to do this since 
> the protobufs are incompatible with the old writables.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2058) DataTransfer Protocol using protobufs

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050115#comment-13050115
 ] 

Hudson commented on HDFS-2058:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> DataTransfer Protocol using protobufs
> -
>
> Key: HDFS-2058
> URL: https://issues.apache.org/jira/browse/HDFS-2058
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: HDFS-2058.patch, hdfs-2058.txt, hdfs-2058.txt, 
> hdfs-2058.txt, hdfs-2058.txt, hdfs-2058.txt
>
>
> We've been talking about this for a long time... would be nice to use 
> something like protobufs or Thrift for some of our wire protocols.
> I knocked together a prototype of DataTransferProtocol on top of proto bufs 
> that seems to work.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2040) Only build libhdfs if a flag is passed

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050110#comment-13050110
 ] 

Hudson commented on HDFS-2040:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Only build libhdfs if a flag is passed
> --
>
> Key: HDFS-2040
> URL: https://issues.apache.org/jira/browse/HDFS-2040
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: hdfs-2040-1.patch, hdfs-2040-2.patch
>
>
> In HDFS-2022 we made ant binary build libhdfs unconditionally, this is a pain 
> for users who now need to get the native toolchain working to create a 
> tarball to test a change, and inconsistent with common and MR (see 
> MAPREDUCE-2559) which only build native code if a flag is passed. Let's 
> revert to the previous behavior of requiring -Dlibhdfs be passed at build 
> time. We could also create a new ant target that doesn't build the native 
> code, however restoring the old behavior seems simplest.  
>   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2069) Incorrect default trash interval value in the docs

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050107#comment-13050107
 ] 

Hudson commented on HDFS-2069:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Incorrect default trash interval value in the docs
> --
>
> Key: HDFS-2069
> URL: https://issues.apache.org/jira/browse/HDFS-2069
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.20.2
>Reporter: Ravi Phulari
>Assignee: Harsh J
>Priority: Trivial
> Fix For: 0.23.0
>
> Attachments: HDFS-2069.r1.diff
>
>
> Current HDFS architecture information about Trash is incorrectly documented 
> as  - 
> {color:red} 
> The current default policy is to delete files from /trash that are more than 
> 6 hours old. In the future, this policy will be configurable through a well 
> defined interface.
> {color}
> It should be something like - 
> The current default trash interval is set to 0 (files are deleted without being 
> stored in trash). This value is a configurable parameter, fs.trash.interval, 
> set in core-site.xml. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1295) Improve namenode restart times by short-circuiting the first block reports from datanodes

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050105#comment-13050105
 ] 

Hudson commented on HDFS-1295:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Improve namenode restart times by short-circuiting the first block reports 
> from datanodes
> -
>
> Key: HDFS-1295
> URL: https://issues.apache.org/jira/browse/HDFS-1295
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: dhruba borthakur
>Assignee: Matt Foley
> Fix For: Federation Branch, 0.23.0
>
> Attachments: HDFS-1295_delta_for_trunk.patch, 
> HDFS-1295_for_ymerge.patch, HDFS-1295_for_ymerge_v2.patch, 
> IBR_shortcut_v2a.patch, IBR_shortcut_v3atrunk.patch, 
> IBR_shortcut_v4atrunk.patch, IBR_shortcut_v4atrunk.patch, 
> IBR_shortcut_v4atrunk.patch, IBR_shortcut_v6atrunk.patch, 
> IBR_shortcut_v7atrunk.patch, shortCircuitBlockReport_1.txt
>
>
> The namenode restart is dominated by the performance of processing block 
> reports. On a 2000 node cluster with 90 million blocks,  block report 
> processing takes 30 to 40 minutes. The namenode "diffs" the contents of the 
> incoming block report with the contents of the blocks map, and then applies 
> these diffs to the blocksMap, but in reality there is no need to compute the 
> "diff" because this is the first block report from the datanode.
> This code change improves block report processing time by 300%.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1948) Forward port 'hdfs-1520 lightweight namenode operation to trigger lease reccovery'

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050108#comment-13050108
 ] 

Hudson commented on HDFS-1948:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> Forward port 'hdfs-1520 lightweight namenode operation to trigger lease 
> reccovery'
> --
>
> Key: HDFS-1948
> URL: https://issues.apache.org/jira/browse/HDFS-1948
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: stack
>Assignee: stack
> Fix For: 0.22.0
>
> Attachments: 1948-part1.txt, 1948-v2.txt, 1948-v3.txt, 1948-v3.txt, 
> 1948-v4-minus_rpc_version_change.txt, 1948-v4.22.txt, 1948-v4.txt
>
>
> This issue is about forward porting from branch-0.20-append the little 
> namenode api that facilitates stealing of a file's lease.  The forward port 
> would be an adaption of hdfs-1520 and its companion patches, hdfs-1555 and 
> hdfs-1554, to suit the TRUNK.
> Intent is to get this fix into 0.22 time willing; i'll run a vote to get ok 
> on getting it added to branch.  HBase needs this facility.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2061) two minor bugs in BlockManager block report processing

2011-06-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050106#comment-13050106
 ] 

Hudson commented on HDFS-2061:
--

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])


> two minor bugs in BlockManager block report processing
> --
>
> Key: HDFS-2061
> URL: https://issues.apache.org/jira/browse/HDFS-2061
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Matt Foley
>Assignee: Matt Foley
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2061.patch
>
>
> In a recent review of HDFS-1295 patches (speedup for block report 
> processing), I found two very minor bugs in BlockManager, as documented in 
> the following comments.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-2073) Namenode is missing @Override annotations

2011-06-15 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-2073.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

I committed the patch.

> Namenode is missing @Override annotations
> -
>
> Key: HDFS-2073
> URL: https://issues.apache.org/jira/browse/HDFS-2073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2073.patch
>
>
> NameNode implements several protocols. The methods that implement the 
> interface do not have @Override. Also @inheritdoc is used, which is not 
> needed with @Override.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1836) Thousand of CLOSE_WAIT socket

2011-06-15 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-1836:
-

Fix Version/s: 0.20.205.0

Marking it fixed in 205 since 206 isn't there yet.

> Thousand of CLOSE_WAIT socket 
> --
>
> Key: HDFS-1836
> URL: https://issues.apache.org/jira/browse/HDFS-1836
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.20.2
> Environment: Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 
> 2011 x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_23"
> Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
>Reporter: Dennis Cheung
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.3, 0.20.205.0
>
> Attachments: hdfs-1836-0.20.205.txt, hdfs-1836-0.20.txt, 
> hdfs-1836-0.20.txt, patch-draft-1836.patch
>
>
> $ /usr/sbin/lsof -i TCP:50010 | grep -c CLOSE_WAIT
> 4471
> It is fine when everything runs normally. 
> However, from time to time "DataStreamer Exception: 
> java.net.SocketTimeoutException" and "DFSClient.processDatanodeError(2507) | 
> Error Recovery for" messages can be found in the log file, and the number of 
> CLOSE_WAIT sockets just keeps increasing.
> The CLOSE_WAIT handles may remain for hours or days, until "Too many open 
> files" errors eventually appear.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1381) MiniDFSCluster documentation refers to out-of-date configuration parameters

2011-06-15 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1381:
--

Tags: newbie

> MiniDFSCluster documentation refers to out-of-date configuration parameters
> ---
>
> Key: HDFS-1381
> URL: https://issues.apache.org/jira/browse/HDFS-1381
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.20.1
>Reporter: Jakob Homan
>
> The javadoc for MiniDFSCluster makes repeated references to setting 
> dfs.name.dir and dfs.data.dir.  These should be replaced with references to 
> DFSConfigKeys' DFS_NAMENODE_NAME_DIR_KEY and DFS_DATANODE_DATA_DIR_KEY, 
> respectively.  The old values are deprecated in DFSConfigKeys, but we should 
> switch to the new values wherever we can.
> Also, a quick search of the code shows that TestDFSStorageStateRecovery.java and 
> UpgradeUtilities.java should be updated as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050072#comment-13050072
 ] 

Kihwal Lee commented on HDFS-2054:
--

The new patch has been uploaded.

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-2054-1.patch, HDFS-2054.patch, HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2054:
-

Attachment: HDFS-2054-1.patch

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-2054-1.patch, HDFS-2054.patch, HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1788) FsShell ls: Show symlinks properties

2011-06-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050065#comment-13050065
 ] 

Daryn Sharp commented on HDFS-1788:
---

Very nice! First off, this isn't an HDFS change.  Maybe someone with admin 
powers can move the jira to HADOOP, or close this and repost on HADOOP-6424.  

In general, I'd really like to see {{statReal}} revert to {{stat}}.  That will 
greatly reduce the size of the patch.  Since I'm a unix stalwart, I'd "prefer" 
{{statLink}} be called {{lstat}}.  In either case, it should be a method to 
avoid increasing the load on the NN (see more below).

+FsShell+
* Make sure that renaming {{getFS()}} to {{getFC()}} didn't break {{DFSAdmin}}, 
or {{TestMRCLI}}, etc.
* I think the addition of {{FileContext.processDeleteOnExit()}} to {{close()}} 
may cause problems.  Ex.  What happens if I have temp files open before running 
a FsShell command?  Won't this change cause the files to unexpectedly go "poof"?

+Ls+
* The new {{\-L}} flag is really implementing part of {{\-l}}. {{\-L}} is 
supposed to replace the link with the name & attrs of its target.  It would be 
a nice option, but probably not strictly needed unless you are feeling 
ambitious.  Now might be a good time to make {{\-l}} generate output the way ls 
is REALLY supposed to look.  Otherwise altering the {{\-l}} output in the 
future will be deemed incompatible.  You might consider another jira...

+PathData+
* I'd recommend undoing as much as possible since there are reasons why it is the 
way it is, plus it will cause a major merge conflict with my pending PathData 
changes.
* The String ctors need to be restored since Path can mangle the string it is 
given.
* The 3-arg ctor that takes FileStatus must be re-added.  I took great effort 
to reduce the RPC load on the NN, but this patch will undo some of that work 
and generate *2X the stats to the NN*.  
* Always doing the equivalent of stat/lstat on every object is causing *2X the 
stats to the NN*.  Combined with the previous point, this patch is causing *4X 
the stats to the NN* unless there's magic going on deep in the client.
* {{lstat}} is used so infrequently it should probably be an on-demand 
{{item.lstat()}} method (a rough sketch of this idea follows after the review)
* {{refreshStatus}} is expected to throw FNF, but this changes it to ignore it
* If {{FileSystem}} can't be completely eliminated, please remove 
{{fs(Configuration)}} and leave the {{fs}} member intact.  That will also 
greatly reduce the size of the patch, and remove errors that may be caused by 
providing a different config than the one used to originally create the object.

+Tail+
* Changing {{refreshStatus}} to not throw FNF will cause an NPE here.  This is 
one of the reasons {{refreshStatus}} needs to retain original behavior.

+CopyCommands+
* Are you sure {{copyCrcToLocal}} will work now?  The raw fs was needed since 
the actual fs obscures the crc file.  Does {{FileContext}} change that behavior?

+Ln+
* {{Ln}} shouldn't be in {{CopyCommands}}.  Please move this class into its 
own file.  
* Should require {{\-s}} to create symbolic links.  If {{\-s}} isn't given, it 
should throw that hardlinks aren't supported -- that way we leave the door open 
to the possibility of hardlinks someday.
* It can't be a {{CommandWithDestination}} or it forces the target of the 
symlink to exist at creation time.  It's completely legit to create a symlink 
that points to a non-existent path.  Also, it should take 1 or 2 args like the 
unix command.  {{processArguments}} is probably the best place to implement it.
* Might consider splitting this off to accelerate the rest being integrated.  
Up to you.

+LocalFileSystem+
* Why expose {{NAME}}?  It doesn't appear to be used.

+FileContext+
* Why add {{fsUri}}?  It doesn't appear to be used.

Overall, great work!  Don't let the length of the review be discouraging.  It's 
a great improvement to the shell!
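For illustration of the on-demand {{lstat}} suggestion above, a rough sketch under the assumption of a simplified PathData-like holder; the class, field names, and constructor shape are made up and do not reflect the actual PathData code:

{code}
// Hypothetical sketch only: keep the status handed in by the caller (no extra
// RPC) and fetch the link status lazily, only when a command actually needs it.
import java.io.IOException;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

class PathDataSketch {
  private final FileContext fc;
  private final Path path;
  private final FileStatus stat;   // status of the target, provided by the caller
  private FileStatus linkStat;     // link status, fetched on demand

  PathDataSketch(FileContext fc, Path path, FileStatus stat) {
    this.fc = fc;
    this.path = path;
    this.stat = stat;              // reusing this avoids a second stat to the NN
  }

  FileStatus stat() {
    return stat;
  }

  FileStatus lstat() throws IOException {
    if (linkStat == null) {
      // Only commands that display link properties pay for this extra call.
      linkStat = fc.getFileLinkStatus(path);
    }
    return linkStat;
  }
}
{code}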

> FsShell ls: Show symlinks properties
> 
>
> Key: HDFS-1788
> URL: https://issues.apache.org/jira/browse/HDFS-1788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Jonathan Eagles
>Assignee: John George
>Priority: Minor
> Attachments: HDFS-1788.patch
>
>
> ls FsShell command implementation has been consistent with the linux 
> implementations of ls \-l. With the addition of symlinks, I would expect the 
> ability to show file type 'd' for directory, '\-' for file, and 'l' for 
> symlink. In addition, following the linkname entry for symlinks, I would 
> expect the ability to show "\-> ". In linux, the default is to show 
> the properties of the link and not of the link target. In linux, the '-L' 
> option allows for the dereferencing of symlinks to show link target 
> properties, but it is not the default. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050062#comment-13050062
 ] 

Kihwal Lee commented on HDFS-2054:
--

I found the following from Sun's NIO example.  :)

j2se/share/sample/nio/server/RequestHandler.java
{code}
} catch (IOException x) {
    String m = x.getMessage();
    if (!m.equals("Broken pipe") &&
        !m.equals("Connection reset by peer")) {
        System.err.println("RequestHandler: " + x.toString());
    }
}
{code}

I will add the check.
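For illustration, a small self-contained sketch of the kind of check being discussed; the class and method names below are assumptions, not the actual BlockSender change:

{code}
// Hypothetical sketch only: demote client-teardown IOExceptions from ERROR.
import java.io.IOException;

class SendChunkLoggingSketch {
  /** True for IOExceptions that just mean the peer closed the connection early. */
  static boolean isBenignClose(IOException e) {
    String msg = e.getMessage();
    return msg != null
        && (msg.contains("Broken pipe")
            || msg.contains("Connection reset by peer"));
  }

  static void logSendFailure(IOException e) {
    if (isBenignClose(e)) {
      // The client tore down the stream before the end of the block; this is
      // not a datanode fault, so don't alarm operators with an ERROR entry.
      System.out.println("INFO: connection closed by peer during block send: " + e);
    } else {
      System.err.println("ERROR: exception while sending block: " + e);
    }
  }

  public static void main(String[] args) {
    logSendFailure(new IOException("Broken pipe"));
    logSendFailure(new IOException("disk read failed"));
  }
}
{code}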

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-2054.patch, HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2034:
--

Status: Open  (was: Patch Available)

> length in getBlockRange becomes -ve when reading only from currently being 
> written blk
> --
>
> Key: HDFS-2034
> URL: https://issues.apache.org/jira/browse/HDFS-2034
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: John George
>Assignee: John George
>Priority: Minor
> Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
> HDFS-2034-3.patch, HDFS-2034.patch
>
>
> This came up during HDFS-1907. Posting an example that Todd posted in 
> HDFS-1907 that brought out this issue.
> {quote}
> Here's an example sequence to describe what I mean:
> 1. open file, write one and a half blocks
> 2. call hflush
> 3. another reader asks for the first byte of the second block
> {quote}
> In this case since offset is greater than the completed block length, the 
> math in getBlockRange() of DFSInputStreamer.java will set "length" to 
> negative.
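To make the failure mode above concrete, a toy calculation under assumed numbers; the variable names are made up and the real getBlockRange() arithmetic in DFSInputStream differs in detail:

{code}
// Toy illustration only, not the actual DFSInputStream code.
public class NegativeLengthExample {
  public static void main(String[] args) {
    long completedBlocksLength = 64L * 1024 * 1024; // one finalized 64MB block
    long offset = completedBlocksLength + 10;       // a byte inside the in-progress block

    // If the requested range is computed only against the finalized portion of
    // the file, the length underflows; reads that fall entirely inside the
    // block under construction need their own path.
    long lengthInCompletedRange = completedBlocksLength - offset;
    System.out.println("length against completed blocks = " + lengthInCompletedRange); // -10
  }
}
{code}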

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-941:
-

   Resolution: Fixed
Fix Version/s: 0.22.0
   Status: Resolved  (was: Patch Available)

Committed the 0.22 patch. Konstantin: I think if you look at that comment 
again, you'll see that some of the test runs got faster, some got slower, and 
all were well within the standard deviation.

Given that the only extra overhead that might be introduced here is a single lookup 
in the SocketCache, I see no reason to think this would have any negative 
effect. If you have benchmark results that disagree, please post them.
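For context, a deliberately simplified sketch of the cached-connection lookup referred to above; the class below is an assumption for illustration, not the actual SocketCache (which holds multiple sockets per address, expires idle entries, and is shared across streams):

{code}
// Hypothetical, minimal sketch of reusing a datanode connection via a cache lookup.
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.util.HashMap;
import java.util.Map;

class SimpleSocketCache {
  private final Map<SocketAddress, Socket> cache = new HashMap<SocketAddress, Socket>();

  /** Returns a cached, still-open socket for this datanode, or null if none. */
  synchronized Socket get(SocketAddress dnAddr) {
    Socket s = cache.remove(dnAddr);
    if (s != null && !s.isClosed()) {
      return s;          // this lookup is the only extra per-read cost
    }
    return null;         // caller falls back to opening a new connection
  }

  /** Called after a read that left the stream in a well-defined state. */
  synchronized void put(SocketAddress dnAddr, Socket s) {
    Socket evicted = cache.put(dnAddr, s);
    if (evicted != null) {
      try {
        evicted.close(); // keep at most one cached socket per datanode in this toy version
      } catch (IOException ignored) {
      }
    }
  }
}
{code}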

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Fix For: 0.22.0
>
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2071:
--

   Resolution: Fixed
Fix Version/s: (was: 0.20.3)
   0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1, committed. Thanks Kihwal

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.
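For illustration, a standalone sketch of the loop shape after the fix, under assumed helper names (readAndProcessOneOp is not a real DataXceiver method); it only shows why {{isClosed()}} is the right condition once the op handlers close the socket on error:

{code}
// Hypothetical sketch only. Socket.isConnected() stays true forever after the
// first successful connect, so it cannot end this loop; Socket.isClosed()
// flips once an op handler closes the socket (e.g. via IOUtils.closeStream()).
import java.io.EOFException;
import java.io.IOException;
import java.net.Socket;

class KeepaliveLoopSketch {
  void serve(Socket s, long socketKeepaliveTimeout) throws IOException {
    do {
      try {
        readAndProcessOneOp(s);   // assumed helper: read one op and handle it
      } catch (EOFException eof) {
        // Peer closed between ops; treated as a normal end of the session.
        break;
      }
    } while (!s.isClosed() && socketKeepaliveTimeout > 0);
  }

  private void readAndProcessOneOp(Socket s) throws IOException {
    // placeholder for the real op dispatch
  }
}
{code}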

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050045#comment-13050045
 ] 

Todd Lipcon commented on HDFS-2054:
---

Hey Kihwal. Don't we want to check for "Connection reset by peer" also? I've 
seen that as well.

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-2054.patch, HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-06-15 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1692:
-

Attachment: HDFS-1692-v0.23-2.patch

I have cleaned up a little bit, like the logging-related stuff and a few 
comments. Uploading the patch again.

> In secure mode, Datanode process doesn't exit when disks fail.
> --
>
> Key: HDFS-1692
> URL: https://issues.apache.org/jira/browse/HDFS-1692
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.204.0, 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.204.0, 0.23.0
>
> Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch, 
> HDFS-1692-v0.23-2.patch
>
>
> In secure mode, when disks fail more than volumes tolerated, datanode process 
> doesn't exit properly and it just hangs even though shutdown method is 
> called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-1811) Create scripts to decommission datanodes

2011-06-15 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-1811.
---

Resolution: Duplicate

Already fixed as a part of HDFS-1703.

> Create scripts to decommission datanodes
> 
>
> Key: HDFS-1811
> URL: https://issues.apache.org/jira/browse/HDFS-1811
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Owen O'Malley
>Assignee: Erik Steffl
>
> Create scripts to decommission datanodes:
>   - distribute exclude file
> - input is location of exclude file
> - location on namenodes: hdfs getconf -excludeFile
> - list of namenodes: hdfs getconf -namenodes
> - scp excludes files to all namenodes
>   - refresh namenodes
> - list of namenodes: hdfs getconf -namenodes
> - refresh namenodes: hdfs dfsadmin -refreshNodes
> Two scripts are needed because each of them might require different 
> permissions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049995#comment-13049995
 ] 

Kihwal Lee commented on HDFS-2054:
--

Test failures:

* TestHDFSCLI: The quota related test failures are not due to this patch. They 
also failed in build #696.
* TestHDFSTrash: It was failing in other recent pre-commit builds, e.g. 
https://builds.apache.org/job/PreCommit-HDFS-Build/786/

Tests included: No test is included, as justified above.


> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-2054.patch, HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2073) Namenode is missing @Override annotations

2011-06-15 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049992#comment-13049992
 ] 

Jitendra Nath Pandey commented on HDFS-2073:


+1 for the patch.

> Namenode is missing @Override annotations
> -
>
> Key: HDFS-2073
> URL: https://issues.apache.org/jira/browse/HDFS-2073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-2073.patch
>
>
> NameNode implements several protocols. The methods that implement the 
> interface do not have @Override. Also @inheritdoc is used, which is not 
> needed with @Override.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049994#comment-13049994
 ] 

Kihwal Lee commented on HDFS-2071:
--

Test failures:
* TestHDFSCLI: The quota related test failures are not due to this patch. They 
also failed in build #696.
* TestHDFSTrash : It was failing in other recent pre-commit builds: e.g. 
https://builds.apache.org/job/PreCommit-HDFS-Build/786/

Tests included: No test is included as justified above.

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.20.3
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049977#comment-13049977
 ] 

Hadoop QA commented on HDFS-2054:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12482699/HDFS-2054.patch
  against trunk revision 1136132.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/788//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/788//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/788//console

This message is automatically generated.

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-2054.patch, HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049978#comment-13049978
 ] 

Hadoop QA commented on HDFS-2071:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12482696/HDFS-2071.patch
  against trunk revision 1136132.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/787//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/787//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/787//console

This message is automatically generated.

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.20.3
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.

2011-06-15 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049973#comment-13049973
 ] 

Konstantin Shvachko commented on HDFS-1989:
---

Seems like a valid bug.
The client does not directly talk to the BN, but the NN sends the journal transaction 
(close, in this case). And if that kicks in when the BN has closed edits but hasn't 
reopened them yet, the exception can happen.
Todd's right that the transaction should go into the journal spool, but I suspect that 
{{edits.close()}} closes all streams including the spool, and that could be the 
problem.

> When checkpointing by backup node occurs parallely when a file is being 
> closed by a client then Exception occurs saying no journal streams. 
> 
>
> Key: HDFS-1989
> URL: https://issues.apache.org/jira/browse/HDFS-1989
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
>
> Backup namenode initiates the checkpointing process. 
> As a part of checkpointing based on the timestamp it tries to download the 
> FSImage or use the existing one.
> Then it tries to save the FSImage.
> During this time it tries to close the editLog streams.
> Parallely when a client tries to close a file just after the checkpointing 
> process closes the editLog Stream then we get an exception saying
> java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File 
> system changes are not persistent. No journal streams.
> Here the saveNameSpace api closes all the editlog streams resulting in this 
> issue.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049968#comment-13049968
 ] 

Harsh J commented on HDFS-2071:
---

Gah, missed that, and the comments there. I take back my suggestion :-)

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.20.3
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049959#comment-13049959
 ] 

Kihwal Lee commented on HDFS-2071:
--

HDFS-941 is already in trunk. The current discussion is mainly about 0.22.

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.20.3
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049957#comment-13049957
 ] 

Harsh J commented on HDFS-2071:
---

Kihwal - Since HDFS-941 is not yet committed, I think you can comment/review on 
that JIRA regarding this bug. Wouldn't it be worth folding this in as a single commit 
rather than having this follow up on that?

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.20.3
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049952#comment-13049952
 ] 

Kihwal Lee commented on HDFS-2054:
--

No test is added since it only changes the log message. 


> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-2054.patch, HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2054:
-

Attachment: HDFS-2054.patch

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-2054.patch, HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2054) BlockSender.sendChunk() prints ERROR for connection closures encountered during transferToFully()

2011-06-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2054:
-

Fix Version/s: 0.23.0
   0.22.0
   Status: Patch Available  (was: Open)

> BlockSender.sendChunk() prints ERROR for connection closures encountered  
> during transferToFully()
> --
>
> Key: HDFS-2054
> URL: https://issues.apache.org/jira/browse/HDFS-2054
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.22.0, 0.23.0
>
> Attachments: HDFS-2054.patch, HDFS-2054.patch
>
>
> The addition of ERROR was part of HDFS-1527. In environments where clients 
> tear down FSInputStream/connection before reaching the end of stream, this 
> error message often pops up. Since these are not really errors and especially 
> not the fault of data node, the message should be toned down at least. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049946#comment-13049946
 ] 

Kihwal Lee commented on HDFS-2071:
--

The system functions without the patch with almost no noticeable timing difference.
No test is added since the correctness cannot be checked by running tests.

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.20.3
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2071:
-

Attachment: HDFS-2071.patch

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.20.3
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-2071:
-

Fix Version/s: 0.20.3
   Status: Patch Available  (was: Open)

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
> Fix For: 0.20.3
>
> Attachments: HDFS-2071.patch
>
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-2071) Use of isConnected() in DataXceiver is invalid

2011-06-15 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-2071:


Assignee: Kihwal Lee

> Use of isConnected() in DataXceiver is invalid
> --
>
> Key: HDFS-2071
> URL: https://issues.apache.org/jira/browse/HDFS-2071
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Minor
>
> The use of Socket.isConnected() in DataXceiver.run() is not valid. It returns 
> false until the connection is made and then always returns true after that. 
> It will never return false after the initial connection is successfully made. 
> Socket.isClosed() or SocketChannel.isOpen() should be used instead, assuming 
> someone is handling SocketException and does Socket.close() or 
> SocketChannel.close(). It seems the op handlers in DataXceiver are diligently 
> using IOUtils.closeStream(), which will invoke SocketChannel.close().
> {code}
> - } while (s.isConnected() && socketKeepaliveTimeout > 0);
> + } while (!s.isClosed() && socketKeepaliveTimeout > 0);
> {code}
> The effect of this bug is very minor, as the socket is read again right 
> after. If the connection was closed, the readOp() will throw an EOFException, 
> which is caught and dealt with properly.  The system still functions normally 
> with probably only few microseconds of extra overhead in the premature 
> connection closure cases.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-06-15 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-1981:
--

 Priority: Blocker  (was: Major)
Affects Version/s: (was: 0.23.0)
   0.22.0
Fix Version/s: (was: 0.23.0)
   0.22.0

I have seen this bug in 0.22. Making it a blocker. Will review the patch [very] 
soon.

> When namenode goes down while checkpointing and if is started again 
> subsequent Checkpointing is always failing
> --
>
> Key: HDFS-1981
> URL: https://issues.apache.org/jira/browse/HDFS-1981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.22.0
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
>Priority: Blocker
> Fix For: 0.22.0
>
> Attachments: HDFS-1981.patch
>
>
> This scenario is applicable in NN and BNN case.
> When the namenode goes down after creating the edits.new, on subsequent 
> restart the divertFileStreams will not happen to edits.new as the edits.new 
> file is already present and the size is zero.
> so on trying to saveCheckPoint an exception occurs 
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: 
> GetImage failed. java.io.IOException: Namenode has an edit log with timestamp 
> of 2011-05-23 16:38:56 but new checkpoint was created using editlog  with 
> timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> This is a bug or is that the behaviour.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-06-15 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049925#comment-13049925
 ] 

Suresh Srinivas commented on HDFS-1692:
---

Minor comments:
# DataNode.java - in the finally block, LOG.warn is more appropriate
# DataXceiverServer.java - Catching AsynchronousCloseException may not be 
necessary


> In secure mode, Datanode process doesn't exit when disks fail.
> --
>
> Key: HDFS-1692
> URL: https://issues.apache.org/jira/browse/HDFS-1692
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.204.0, 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Fix For: 0.20.204.0, 0.23.0
>
> Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch
>
>
> In secure mode, when disks fail more than volumes tolerated, datanode process 
> doesn't exit properly and it just hangs even though shutdown method is 
> called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-06-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049924#comment-13049924
 ] 

Hadoop QA commented on HDFS-1981:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12482669/HDFS-1981.patch
  against trunk revision 1135329.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 32 javac compiler warnings (more 
than the trunk's current 31 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestHDFSTrash

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/786//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/786//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/786//console

This message is automatically generated.

> When namenode goes down while checkpointing and if is started again 
> subsequent Checkpointing is always failing
> --
>
> Key: HDFS-1981
> URL: https://issues.apache.org/jira/browse/HDFS-1981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
> Attachments: HDFS-1981.patch
>
>
> This scenario is applicable in both the NN and BNN cases.
> When the namenode goes down after creating edits.new, on subsequent restart 
> divertFileStreams will not divert to edits.new because the edits.new file is 
> already present with size zero.
> So on trying to saveCheckPoint an exception occurs:
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: 
> GetImage failed. java.io.IOException: Namenode has an edit log with timestamp 
> of 2011-05-23 16:38:56 but new checkpoint was created using editlog with 
> timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> Is this a bug, or is this the expected behaviour?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.

2011-06-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049914#comment-13049914
 ] 

Todd Lipcon commented on HDFS-1989:
---

My point is that the client cannot issue a close() at this time, because the 
BNN has diverted its logs from "apply" mode to "spool" mode, and clients don't 
talk directly to the BN.

> When checkpointing by backup node occurs parallely when a file is being 
> closed by a client then Exception occurs saying no journal streams. 
> 
>
> Key: HDFS-1989
> URL: https://issues.apache.org/jira/browse/HDFS-1989
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
>
> The backup namenode initiates the checkpointing process. 
> As part of checkpointing, based on the timestamp, it either downloads the 
> FSImage or uses the existing one.
> Then it tries to save the FSImage.
> During this time it closes the editLog streams.
> If, in parallel, a client tries to close a file just after the checkpointing 
> process has closed the editLog stream, then we get an exception saying
> java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File 
> system changes are not persistent. No journal streams.
> Here the saveNamespace api closes all the editlog streams, resulting in this 
> issue.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.

2011-06-15 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-1942:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1 for the patch.

I committed it. Thank you Bharath.
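
A minimal sketch of the behaviour this issue asks for (hypothetical names, not 
the committed patch): the datanode blocks on its block pool service threads 
and, once all of them have exited, proceeds to shut down instead of continuing 
to run.

{noformat}
import java.util.ArrayList;
import java.util.List;

public class BlockPoolJoinSketch {
  private final List<Thread> bpServiceThreads = new ArrayList<Thread>();

  // Wait for every block pool service thread to finish. When this method
  // returns, no block pools are being served, so the caller should continue
  // with datanode shutdown rather than keep the process alive.
  void joinAllBlockPools() throws InterruptedException {
    for (Thread t : bpServiceThreads) {
      t.join();
    }
  }
}
{noformat}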

> If all Block Pool service threads exit then datanode should exit.
> -
>
> Key: HDFS-1942
> URL: https://issues.apache.org/jira/browse/HDFS-1942
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
> Attachments: HDFS-1942-1.patch, HDFS-1942-2.patch, HDFS-1942-3.patch
>
>
> Currently, if all block pool service threads exit, the Datanode continues to 
> run. This should be fixed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049863#comment-13049863
 ] 

stack commented on HDFS-941:


I reran the tests and the same three failed. I backed out my patch and the 
same three failed, so this patch does not seem to be responsible for these 
test failures on my machine.

I'm +1 on commit.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1805) Some Tests in TestDFSShell can not shutdown the MiniDFSCluster on any exception/assertion failure. This will leads to fail other testcases.

2011-06-15 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049800#comment-13049800
 ] 

ramkrishna.s.vasudevan commented on HDFS-1805:
--

Hi Daryn,
Thanks for the comments.

A few points to clarify.

Making cluster and conf final:

@BeforeClass
public static void startCluster() throws Exception {
  conf = new HdfsConfiguration();
  cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
}

'conf' can be made final by moving it out of this @BeforeClass method. But if 
we try to move 'cluster' outside the @BeforeClass, the build() api throws an 
exception, so it would have to be moved out like:

final static MiniDFSCluster cluster = new 
MiniDFSCluster.Builder(conf).numDataNodes(2).build();

FileUtil.fullyDelete(fs, new Path("/")) is deprecated, and 
FileUtil.fullyDeleteContents(File dir) accepts a specific file or directory 
name. So can we proceed with the deleteFromDFS() api?

I will prepare a patch based on the comments. Thanks.
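
For reference, a minimal sketch (assuming a JUnit 4 class-level fixture; this 
is not the attached patch) of the setup/teardown shape being discussed, with 
the cluster guaranteed to be shut down so later tests are unaffected:

{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class TestDFSShellSketch {
  private static Configuration conf;
  private static MiniDFSCluster cluster;

  @BeforeClass
  public static void startCluster() throws Exception {
    conf = new HdfsConfiguration();
    cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
  }

  // Runs even when a test method fails, so the cluster never leaks into
  // subsequent test classes.
  @AfterClass
  public static void shutdownCluster() {
    if (cluster != null) {
      cluster.shutdown();
    }
  }
}
{noformat}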

> Some Tests in TestDFSShell can not shutdown the MiniDFSCluster on any 
> exception/assertion failure. This will leads to fail other testcases.
> ---
>
> Key: HDFS-1805
> URL: https://issues.apache.org/jira/browse/HDFS-1805
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.23.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-1805-1.patch, HDFS-1805-2.patch, HDFS-1805-3.patch, 
> HDFS-1805.patch
>
>
> Some test cases in TestDFSShell are not shutting down the MiniDFSCluster in a 
> finally block.
> Any test assertion failure or exception can result in the cluster not being 
> shut down. Because of this, other test cases will fail, making it difficult 
> to find the actual test failures.
> So it is better to shut down the cluster in finally.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-06-15 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HDFS-1981:
-

Attachment: HDFS-1981.patch

> When namenode goes down while checkpointing and if is started again 
> subsequent Checkpointing is always failing
> --
>
> Key: HDFS-1981
> URL: https://issues.apache.org/jira/browse/HDFS-1981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
> Attachments: HDFS-1981.patch
>
>
> This scenario is applicable in both the NN and BNN cases.
> When the namenode goes down after creating edits.new, on subsequent restart 
> divertFileStreams will not divert to edits.new because the edits.new file is 
> already present with size zero.
> So on trying to saveCheckPoint an exception occurs:
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: 
> GetImage failed. java.io.IOException: Namenode has an edit log with timestamp 
> of 2011-05-23 16:38:56 but new checkpoint was created using editlog with 
> timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> Is this a bug, or is this the expected behaviour?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1981) When namenode goes down while checkpointing and if is started again subsequent Checkpointing is always failing

2011-06-15 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HDFS-1981:
-

Status: Patch Available  (was: Open)

> When namenode goes down while checkpointing and if is started again 
> subsequent Checkpointing is always failing
> --
>
> Key: HDFS-1981
> URL: https://issues.apache.org/jira/browse/HDFS-1981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
> Environment: Linux
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
> Attachments: HDFS-1981.patch
>
>
> This scenario is applicable in both the NN and BNN cases.
> When the namenode goes down after creating edits.new, on subsequent restart 
> divertFileStreams will not divert to edits.new because the edits.new file is 
> already present with size zero.
> So on trying to saveCheckPoint an exception occurs:
> 2011-05-23 16:38:57,476 WARN org.mortbay.log: /getimage: java.io.IOException: 
> GetImage failed. java.io.IOException: Namenode has an edit log with timestamp 
> of 2011-05-23 16:38:56 but new checkpoint was created using editlog with 
> timestamp 2011-05-23 16:37:30. Checkpoint Aborted.
> Is this a bug, or is this the expected behaviour?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2076) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1

2011-06-15 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049767#comment-13049767
 ] 

Harsh J commented on HDFS-2076:
---

With the details given, this appears to be more of a usage question than a 
fully qualified issue.

Please post such questions first to hdfs-u...@hadoop.apache.org, since JIRA is 
only for tracking confirmed issues (which, if in doubt, can be confirmed 
through discussion on the user or dev mailing lists).

Please post your DN log to a pastebin and link it from your message, since that 
helps determine what is going wrong.

P.S. The error messages have been worked on and will be more descriptive in 
the future.

> ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1
> -
>
> Key: HDFS-2076
> URL: https://issues.apache.org/jira/browse/HDFS-2076
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.20.2
> Environment: hadoop -hdfs
>Reporter: chakali ranga swamy
>

[jira] [Created] (HDFS-2076) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1

2011-06-15 Thread chakali ranga swamy (JIRA)
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(1
-

 Key: HDFS-2076
 URL: https://issues.apache.org/jira/browse/HDFS-2076
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.2
 Environment: hadoop -hdfs
Reporter: chakali ranga swamy


The datanode log shows socket and data stream problems, and I am unable to 
upload a text file to DFS. I deleted the tmp folders for dfs and mapred, 
formatted again with "hadoop namenode -format", and ran start-all.sh. After 
that:
dfs folder contains: datanode, namenode, secondarynamenode
mapred folder: empty
Disk space:
linux-8ysi:/etc/hadoop/hadoop-0.20.2 # df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda5 25G 16G 7.4G 69% /
udev 987M 212K 986M 1% /dev
/dev/sda7 42G 5.5G 34G 14% /home
---
http://localhost:50070/dfshealth.jsp--

NameNode 'localhost:54310'
Started: Wed Jun 15 04:13:14 IST 2011
Version: 0.20.2, r911707
Compiled: Fri Feb 19 08:07:34 UTC 2010 by chrisdo
Upgrades: There are no upgrades in progress.

Browse the filesystem
Namenode Logs
Cluster Summary
10 files and directories, 0 blocks = 10 total. Heap Size is 15.5 MB / 966.69 MB 
(1%)
Configured Capacity : 24.61 GB
DFS Used : 24 KB
Non DFS Used : 17.23 GB
DFS Remaining : 7.38 GB
DFS Used% : 0 %
DFS Remaining% : 29.99 %
Live Nodes : 1
Dead Nodes : 0

NameNode Storage:
Storage Directory Type State
/tmp/Testinghadoop/dfs/name IMAGE_AND_EDITS Active

Hadoop, 2011.

core-site.xml
-
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/Testinghadoop/</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

hdfs-site.xml
--
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.permissions</name>
    <value>true</value>
    <description>
    If "true", enable permission checking in HDFS.
    If "false", permission checking is turned off,
    but all other behavior is unchanged.
    Switching from one parameter value to the other does not change the mode,
    owner, or group of files or directories.
    </description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
</configuration>

mapred-site.xml
--
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.</description>
  </property>
</configuration>
--

Please give suggestions about this error:
--
linux-8ysi:/etc/hadoop/hadoop-0.20.2/conf # hadoop fsck /
RUN_JAVA
/usr/java/jre1.6.0_25/bin/java
.Status: HEALTHY
Total size: 0 B
Total dirs: 7
Total files: 1 (Files currently being written: 1)
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 1
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 1
Number of racks: 1


The filesystem under path '/' is HEALTHY

linux-8ysi:/etc/hadoop/hadoop-0.20.2/conf # hadoop dfsadmin -report
RUN_JAVA
/usr/java/jre1.6.0_25/bin/java
Configured Capacity: 26425618432 (24.61 GB)
Present Capacity: 7923564544 (7.38 GB)
DFS Remaining: 7923539968 (7.38 GB)
DFS Used: 24576 (24 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-
Datanodes available: 1 (1 total, 0 dead)

Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 26425618432 (24.61 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 18502053888 (17.23 GB)
DFS Remaining: 7923539968(7.38 GB)
DFS Used%: 0%
DFS Remaining%: 29.98%
Last contact: Wed Jun 15 05:54:00 IST 2011

I got this error:


linux-8ysi:/etc/hadoop/hadoop-0.20.2 # hadoop dfs -put spo.txt In
RUN_JAVA
/usr/java/jre1.6.0_25/bin/java
11/06/15 04:50:18 WARN hdfs.DFSClient: DataStreamer Exception: 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
/user/root/In/spo.txt could only be replicated to 0 nodes, instead of 1
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.re

[jira] [Updated] (HDFS-1788) FsShell ls: Show symlinks properties

2011-06-15 Thread Bochun Bai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bochun Bai updated HDFS-1788:
-

Attachment: HDFS-1788.patch

A field fs.shell.PathData.statLink is added, and PathData.stat is renamed to 
statReal.

FsShell ls uses statLink throughout; other commands such as -cat and -cp will 
use statReal.
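
To illustrate the distinction (a sketch using the public FileContext API, since 
PathData is an FsShell-internal class; the class and method names below are 
only for illustration): the link's own status versus the dereferenced target's 
status.

{noformat}
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public class SymlinkListingSketch {
  // Print an ls-like line for a path: symlinks are reported as the link
  // itself (with "-> target"), while the dereferenced status is what
  // commands such as -cat would operate on.
  static void printEntry(FileContext fc, Path p) throws Exception {
    FileStatus linkStatus = fc.getFileLinkStatus(p); // does not follow links
    if (linkStatus.isSymlink()) {
      FileStatus targetStatus = fc.getFileStatus(p); // follows the link
      System.out.println("l " + p + " -> " + linkStatus.getSymlink()
          + " (target length " + targetStatus.getLen() + ")");
    } else {
      System.out.println((linkStatus.isDirectory() ? "d " : "- ") + p);
    }
  }
}
{noformat}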

> FsShell ls: Show symlinks properties
> 
>
> Key: HDFS-1788
> URL: https://issues.apache.org/jira/browse/HDFS-1788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Jonathan Eagles
>Assignee: John George
>Priority: Minor
> Attachments: HDFS-1788.patch
>
>
> The FsShell ls command implementation has been consistent with the Linux 
> implementation of ls -l. With the addition of symlinks, I would expect the 
> ability to show file type 'd' for directory, '-' for file, and 'l' for 
> symlink. In addition, following the link name entry for symlinks, I would 
> expect the ability to show "-> <link target>". In Linux, the default is to 
> show the properties of the link and not of the link target. In Linux, the 
> '-L' option allows dereferencing of symlinks to show link target properties, 
> but it is not the default.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1989) When checkpointing by backup node occurs parallely when a file is being closed by a client then Exception occurs saying no journal streams.

2011-06-15 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049710#comment-13049710
 ] 

ramkrishna.s.vasudevan commented on HDFS-1989:
--

Hi Todd,
On the backup namenode side, during checkpointing:
{noformat}
bnImage.loadCheckpoint(sig);
sig.validateStorageInfo(bnImage);
bnImage.saveCheckpoint();
{noformat}

{noformat}
void saveCheckpoint() throws IOException {
  saveNamespace(false);
}
{noformat}

In saveNamespace:
{noformat}
void saveNamespace(boolean renewCheckpointTime) throws IOException {

  // try to restore all failed edit logs here
  assert editLog != null : "editLog must be initialized";
  storage.attemptRestoreRemovedStorage();

  editLog.close();
{noformat}

So the edit logs are closed in the checkpoint flow.

This is where the problem arises: the client tries to close a file after 
editLog.close() has executed.



> When checkpointing by backup node occurs parallely when a file is being 
> closed by a client then Exception occurs saying no journal streams. 
> 
>
> Key: HDFS-1989
> URL: https://issues.apache.org/jira/browse/HDFS-1989
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: ramkrishna.s.vasudevan
> Fix For: 0.23.0
>
>
> The backup namenode initiates the checkpointing process. 
> As part of checkpointing, based on the timestamp, it either downloads the 
> FSImage or uses the existing one.
> Then it tries to save the FSImage.
> During this time it closes the editLog streams.
> If, in parallel, a client tries to close a file just after the checkpointing 
> process has closed the editLog stream, then we get an exception saying
> java.io.IOException: java.lang.IllegalStateException: !!! WARNING !!! File 
> system changes are not persistent. No journal streams.
> Here the saveNamespace api closes all the editlog streams, resulting in this 
> issue.
>  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira