[jira] [Created] (HDFS-11421) Make WebHDFS' ACLs RegEx configurable
Harsh J created HDFS-11421:
-------------------------------

             Summary: Make WebHDFS' ACLs RegEx configurable
                 Key: HDFS-11421
                 URL: https://issues.apache.org/jira/browse/HDFS-11421
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: webhdfs
            Reporter: Harsh J
            Assignee: Harsh J

Part of HDFS-5608 added support for GET/SET ACLs over WebHDFS. This currently identifies the passed arguments via a hard-coded regex that mandates certain group and user naming styles.

A similar limitation existed earlier for CHOWN and other user/group-setting operations of WebHDFS, where it was made configurable via HDFS-11391 + HDFS-4983. Such configurability should be allowed for the ACL operations too.
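As a sketch of the configurability being proposed here: the user-name pattern key below is the one HDFS-4983 added, while the ACL key and the fallback regex are illustrative assumptions only, not committed names:

{code}
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;

public class AclPatternSketch {
  // Existing key added by HDFS-4983 for user/group names.
  static final String USER_PATTERN_KEY = "dfs.webhdfs.user.provider.user.pattern";
  // Hypothetical key for the ACL spec pattern, named only for illustration.
  static final String ACL_PATTERN_KEY = "dfs.webhdfs.acl.provider.permission.pattern";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Read the ACL spec pattern from config rather than hard-coding it; the
    // fallback below is a simplified stand-in covering one aclspec entry,
    // not the exact built-in regex.
    Pattern aclPattern = Pattern.compile(conf.get(ACL_PATTERN_KEY,
        "^(default:)?(user|group|mask|other):[A-Za-z0-9._-]*:([rwx-]{3})?$"));
    System.out.println(aclPattern.matcher("user:hadoop:rwx").matches()); // true
  }
}
{code}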
[jira] [Resolved] (HDFS-2569) DN decommissioning quirks
[ https://issues.apache.org/jira/browse/HDFS-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-2569.
---------------------------
    Resolution: Cannot Reproduce
      Assignee:   (was: Harsh J)

Cannot quite reproduce this on current versions.

> DN decommissioning quirks
> --------------------------
>
>                 Key: HDFS-2569
>                 URL: https://issues.apache.org/jira/browse/HDFS-2569
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 0.23.0
>            Reporter: Harsh J
>
> Decommissioning a node works slightly oddly in 0.23+. The steps I did:
> - Start HDFS via {{hdfs namenode}} and {{hdfs datanode}}. 1-node cluster.
> - Zero files/blocks, so I go ahead and exclude-add my DN and do {{hdfs dfsadmin -refreshNodes}}
> - I see the following log in NN tails, which is fine:
> {code}
> 11/11/20 09:28:10 INFO util.HostsFileReader: Setting the includes file to
> 11/11/20 09:28:10 INFO util.HostsFileReader: Setting the excludes file to build/test/excludes
> 11/11/20 09:28:10 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> 11/11/20 09:28:10 INFO util.HostsFileReader: Adding 192.168.1.23 to the list of hosts from build/test/excludes
> {code}
> - However, the DN log tail gets no new messages. The DN still runs.
> - The dfshealth.jsp page shows this table, which makes no sense -- why is there 1 live and 1 dead?
> |Live Nodes|1 (Decommissioned: 1)|
> |Dead Nodes|1 (Decommissioned: 0)|
> |Decommissioning Nodes|0|
> - The live nodes page shows this, meaning the DN is still up and heartbeating but is decommissioned:
> |Node|Last Contact|Admin State|
> |192.168.1.23|0|Decommissioned|
> - The dead nodes page shows this, and the link to the DN is broken because the port is linked as -1. Also, showing 'false' for decommissioned makes no sense when the live node page shows that it is already decommissioned:
> |Node|Decommissioned|
> |192.168.1.23|false|
>
> Investigating if this is a quirk only observed when the DN had 0 blocks on it in total.
[jira] [Created] (HDFS-11012) Unnecessary INFO logging on DFSClients for InvalidToken
Harsh J created HDFS-11012:
-------------------------------

             Summary: Unnecessary INFO logging on DFSClients for InvalidToken
                 Key: HDFS-11012
                 URL: https://issues.apache.org/jira/browse/HDFS-11012
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: fs
    Affects Versions: 2.5.0
            Reporter: Harsh J
            Assignee: Harsh J
            Priority: Minor

In situations where a DFSClient would receive an InvalidToken exception (as described at [1]), a single retry is automatically made (as observed at [2]). However, we still print an INFO message into the DFSClient's logger even though the message is expected in some scenarios. This should ideally be a DEBUG level message to avoid confusion.

If the retry or the retried attempt fails, the final clause handles it anyway and prints out a proper WARN (as seen at [3]), so the INFO is unnecessary.

[1] - https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1330-L1356
[2] - https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L649-L651 and https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1163-L1170
[3] - https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L652-L658 and https://github.com/apache/hadoop/blob/release-2.7.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1171-L1177
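Until such a change lands, an affected client application can raise the logger threshold itself; a minimal sketch against the log4j 1.x API bundled with Hadoop 2.x, assuming the line is emitted under the org.apache.hadoop.hdfs.DFSClient logger:

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietDfsClientRetry {
  public static void main(String[] args) {
    // Suppress the expected InvalidToken retry INFO line; WARN and above
    // (including the final-failure message) still get through.
    Logger.getLogger("org.apache.hadoop.hdfs.DFSClient").setLevel(Level.WARN);
  }
}
{code}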
[jira] [Resolved] (HDFS-6542) WebHDFSFileSystem doesn't transmit desired checksum type
[ https://issues.apache.org/jira/browse/HDFS-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-6542.
---------------------------
    Resolution: Duplicate

I missed this JIRA when searching before I filed HDFS-10237, but have now noticed it via its association with HADOOP-8240. Since I've already posted a patch on HDFS-10237 and there's no ongoing work/assignee here, I am marking this as a duplicate of HDFS-10237. Sorry for the extra noise!

> WebHDFSFileSystem doesn't transmit desired checksum type
> ----------------------------------------------------------
>
>                 Key: HDFS-6542
>                 URL: https://issues.apache.org/jira/browse/HDFS-6542
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Andrey Stepachev
>            Priority: Minor
>
> Currently DFSClient has the possibility to specify the desired checksum type. This behaviour is controlled by the dfs.checksum.type parameter, settable by the client.
> It works with the hdfs:// filesystem, but doesn't work with webhdfs. It fails to work because webhdfs will use the default checksum type initialised by the server's instance of DFSClient.
> As an example, https://issues.apache.org/jira/browse/HADOOP-8240 doesn't work with webhdfs.
[jira] [Created] (HDFS-9949) Testcase for catching DN UUID regeneration regression
Harsh J created HDFS-9949:
------------------------------

             Summary: Testcase for catching DN UUID regeneration regression
                 Key: HDFS-9949
                 URL: https://issues.apache.org/jira/browse/HDFS-9949
             Project: Hadoop HDFS
          Issue Type: Test
    Affects Versions: 2.6.0
            Reporter: Harsh J
            Assignee: Harsh J
            Priority: Minor

In the following scenario, in releases without HDFS-8211, the DN may regenerate its UUIDs unintentionally.

0. Consider a DN with two disks {{/data1/dfs/dn,/data2/dfs/dn}}
1. Stop the DN
2. Unmount the second disk, {{/data2/dfs/dn}}
3. Create (in the scenario, this was an accident) /data2/dfs/dn on the root path
4. Start the DN
5. The DN now considers /data2/dfs/dn empty, so it formats it, but during the format it uses {{datanode.getDatanodeUuid()}}, which is null until register() is called.
6. As a result, after the directory loading, {{datanode.checkDatanodeUuid()}} gets called with a successful condition, and it causes a new generation of the UUID, which is written to both disks: {{/data1/dfs/dn/current/VERSION}} and {{/data2/dfs/dn/current/VERSION}}.
7. Stop the DN (in the scenario, this was when the mistake of the unmounted disk was realised)
8. Mount the second disk back again at {{/data2/dfs/dn}}, causing the {{VERSION}} file to be the original one again on it (mounting masks the root path that we last generated upon).
9. The DN fails to start up because it finds mismatched UUIDs between the two disks.

The DN should not generate a new UUID if one of the storage disks already has the older one. HDFS-8211 unintentionally fixes this by changing the {{datanode.getDatanodeUuid()}} function to rely on the {{DataStorage}} representation of the UUID vs. the {{DatanodeID}} object, which only becomes available (non-null) _after_ the registration.

It'd still be good to add a direct test case for the above scenario that passes on trunk and branch-2 but fails on branch-2.7 and lower, so we can catch a regression around this in the future.
[jira] [Resolved] (HDFS-8475) Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available
[ https://issues.apache.org/jira/browse/HDFS-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-8475.
---------------------------
    Resolution: Not A Bug

I don't see a bug reported here - the report says the write was done with a single replica, and that the single replica was manually corrupted. Please post to u...@hadoop.apache.org for problems observed in usage.

If you plan to reopen this, please post precise steps for how the bug may be reproduced. I'd recommend looking at your NN and DN logs to trace further what's happening.

> Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8475
>                 URL: https://issues.apache.org/jira/browse/HDFS-8475
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Vinod Valecha
>            Priority: Blocker
>
> Scenario:
> =========
> - write a file
> - corrupt block manually
>
> Exception stack trace:
> 2015-05-24 02:31:55.291 INFO [T-33716795] [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Exception in createBlockOutputStream
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer createBlockOutputStream Exception in createBlockOutputStream
> java.io.EOFException: Premature EOF: no length prefix available
>         at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 2015-05-24 02:31:55.291 INFO [T-33716795] [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Abandoning BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579
> [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream Abandoning BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579
> 2015-05-24 02:31:55.299 INFO [T-33716795] [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Excluding datanode 10.108.106.59:50010
> [5/24/15 2:31:55:299 UTC] 02027a3b DFSClient I org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream Excluding datanode 10.108.106.59:50010
> 2015-05-24 02:31:55.300 WARNING [T-33716795] [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
> [5/24/15 2:31:55:300 UTC] 02027a3b DFSClient W org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer run DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
>         at
[jira] [Resolved] (HDFS-8298) HA: NameNode should not shut down completely without quorum, doesn't recover from temporary network outages
[ https://issues.apache.org/jira/browse/HDFS-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-8298.
---------------------------
    Resolution: Invalid

Closing out - for specific identified improvements (such as log improvements, or ideas about improving unclear root-causing), please log a more direct JIRA.

> HA: NameNode should not shut down completely without quorum, doesn't recover from temporary network outages
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8298
>                 URL: https://issues.apache.org/jira/browse/HDFS-8298
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: ha, HDFS, namenode, qjm
>    Affects Versions: 2.6.0
>            Reporter: Hari Sekhon
>
> In an HDFS HA setup, if there is a temporary problem with contacting journal nodes (e.g. a network interruption), the NameNode shuts down entirely, when it should instead go into a standby mode so that it can stay online and retry to achieve quorum later.
> If both NameNodes shut themselves off like this, then even after the temporary network outage is resolved the entire cluster remains offline indefinitely until operator intervention, whereas it could have self-repaired after re-contacting the journalnodes and re-achieving quorum.
> {code}
> 2015-04-15 15:59:26,900 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [:8485, :8485, :8485], stream=QuorumOutputStream starting at txid 54270281))
> java.io.IOException: Interrupted waiting 2ms for a quorum of nodes to respond.
>         at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:134)
>         at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
>         at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
>         at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
>         at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:639)
>         at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:388)
>         at java.lang.Thread.run(Thread.java:745)
> 2015-04-15 15:59:26,901 WARN client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 54270281
> 2015-04-15 15:59:26,904 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
> 2015-04-15 15:59:27,001 INFO namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at /
> ************************************************************/
> {code}
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
[jira] [Resolved] (HDFS-6674) UserGroupInformation.loginUserFromKeytab will hang forever if keytab file length is less than 6 bytes.
[ https://issues.apache.org/jira/browse/HDFS-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-6674.
---------------------------
    Resolution: Invalid

The hang, if still valid, seems to be an outcome of the underlying Java libraries being at fault. There isn't anything HDFS can control about this; the bug instead needs to be reported to the Oracle/OpenJDK communities with a test case.

> UserGroupInformation.loginUserFromKeytab will hang forever if keytab file length is less than 6 bytes.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6674
>                 URL: https://issues.apache.org/jira/browse/HDFS-6674
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 2.0.1-alpha
>            Reporter: liuyang
>            Priority: Minor
>
> The jstack is as follows:
>    java.lang.Thread.State: RUNNABLE
>         at java.io.FileInputStream.available(Native Method)
>         at java.io.BufferedInputStream.available(BufferedInputStream.java:399)
>         - locked <0x000745585330> (a sun.security.krb5.internal.ktab.KeyTabInputStream)
>         at sun.security.krb5.internal.ktab.KeyTab.load(KeyTab.java:257)
>         at sun.security.krb5.internal.ktab.KeyTab.<init>(KeyTab.java:97)
>         at sun.security.krb5.internal.ktab.KeyTab.getInstance0(KeyTab.java:124)
>         - locked <0x000745586560> (a java.lang.Class for sun.security.krb5.internal.ktab.KeyTab)
>         at sun.security.krb5.internal.ktab.KeyTab.getInstance(KeyTab.java:157)
>         at javax.security.auth.kerberos.KeyTab.takeSnapshot(KeyTab.java:119)
>         at javax.security.auth.kerberos.KeyTab.getEncryptionKeys(KeyTab.java:192)
>         at javax.security.auth.kerberos.JavaxSecurityAuthKerberosAccessImpl.keyTabGetEncryptionKeys(JavaxSecurityAuthKerberosAccessImpl.java:36)
>         at sun.security.jgss.krb5.Krb5Util.keysFromJavaxKeyTab(Krb5Util.java:381)
>         at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:701)
>         at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at javax.security.auth.login.LoginContext.invoke(LoginContext.java:784)
>         at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
>         at javax.security.auth.login.LoginContext$5.run(LoginContext.java:721)
>         at javax.security.auth.login.LoginContext$5.run(LoginContext.java:719)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:718)
>         at javax.security.auth.login.LoginContext.login(LoginContext.java:590)
>         at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:679)
[jira] [Resolved] (HDFS-4224) The dncp_block_verification log can be compressed
[ https://issues.apache.org/jira/browse/HDFS-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-4224.
---------------------------
    Resolution: Invalid

Invalid after HDFS-7430.

> The dncp_block_verification log can be compressed
> ---------------------------------------------------
>
>                 Key: HDFS-4224
>                 URL: https://issues.apache.org/jira/browse/HDFS-4224
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Priority: Minor
>
> On some systems, I noticed that when the scanner runs, the dncp_block_verification.log.curr file under the block pool gets quite large (several GBs). Although this is rolled away, we could also configure compression upon it (a codec that may work without natives would be a good default) and save on I/O and space.
[jira] [Resolved] (HDFS-237) Better handling of dfsadmin command when namenode is slow
[ https://issues.apache.org/jira/browse/HDFS-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-237.
--------------------------
    Resolution: Later

This older JIRA is a bit stale given the multiple changes that have gone into the RPC side. Follow HADOOP-9640 and related JIRAs instead for more recent work.

bq. a separate rpc queue

This is supported today via the servicerpc-address configs (typically set to port 8022, and strongly recommended for HA modes).

> Better handling of dfsadmin command when namenode is slow
> -----------------------------------------------------------
>
>                 Key: HDFS-237
>                 URL: https://issues.apache.org/jira/browse/HDFS-237
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Koji Noguchi
>
> Probably when hitting HADOOP-3810, the Namenode became unresponsive. Large time spent in GC.
> All dfs/dfsadmin commands were timing out. The WebUI was coming up only after waiting for a long time.
> Maybe setting a long timeout would have made the dfsadmin command go through, but it would be nice to have a separate queue/handler which doesn't compete with regular rpc calls.
> All I wanted to do was dfsadmin -safemode enter, dfsadmin -finalizeUpgrade ...
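To illustrate the service RPC split mentioned above, a minimal sketch; the hostname and port are placeholders, with 8022 being the conventional (not mandatory) choice:

{code}
import org.apache.hadoop.conf.Configuration;

public class ServiceRpcSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Dedicated NN endpoint for DN heartbeats and service/admin calls, so
    // they do not queue behind regular client RPCs on the main port.
    conf.set("dfs.namenode.servicerpc-address", "nn.example.com:8022");
  }
}
{code}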
[jira] [Created] (HDFS-8516) The 'hdfs crypto -listZones' should not print an extra newline at end of output
Harsh J created HDFS-8516:
------------------------------

             Summary: The 'hdfs crypto -listZones' should not print an extra newline at end of output
                 Key: HDFS-8516
                 URL: https://issues.apache.org/jira/browse/HDFS-8516
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: tools
            Reporter: Harsh J
            Assignee: Harsh J
            Priority: Minor

It currently prints an extra newline (TableListing already adds a newline to the end of the table string).
[jira] [Resolved] (HDFS-7306) can't decommission w/under construction blocks
[ https://issues.apache.org/jira/browse/HDFS-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-7306.
---------------------------
    Resolution: Duplicate

This should be resolved via HDFS-5579.

> can't decommission w/under construction blocks
> ------------------------------------------------
>
>                 Key: HDFS-7306
>                 URL: https://issues.apache.org/jira/browse/HDFS-7306
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Allen Wittenauer
>
> We need a way to decommission a node with open blocks. Now that HDFS supports append, this should be do-able.
[jira] [Resolved] (HDFS-3349) DFSAdmin fetchImage command should initialize security credentials
[ https://issues.apache.org/jira/browse/HDFS-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-3349.
---------------------------
          Resolution: Cannot Reproduce
    Target Version/s:   (was: 2.0.0-alpha)

Trying with a lack of credentials throws the proper response back (No tgt). I think this is stale given Aaron's comment as well; marking as resolved.

> DFSAdmin fetchImage command should initialize security credentials
> --------------------------------------------------------------------
>
>                 Key: HDFS-3349
>                 URL: https://issues.apache.org/jira/browse/HDFS-3349
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.0.0-alpha
>            Reporter: Aaron T. Myers
>            Priority: Minor
>
> The `hdfs dfsadmin -fetchImage' command should fetch the fsimage using the appropriate credentials if security is enabled.
[jira] [Resolved] (HDFS-2360) Ugly stacktrace when quota exceeds
[ https://issues.apache.org/jira/browse/HDFS-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-2360.
---------------------------
    Resolution: Not a Problem

The last line of the command (excluding the log and its stack trace via the WARN) does today print the base message reason, which should catch the eye clearly:

{code}
put: The DiskSpace quota of /testDir is exceeded: quota = 1024 B = 1 KB but diskspace consumed = 402653184 B = 384 MB
{code}

Resolving this, as it should be clear enough. To get rid of the WARN, the client logger can be nullified, but the catch layer is rather generic today, so I think it cannot be specifically turned off without causing other impact (for other use-cases and troubles). As always though, feel free to reopen with any counter-point.

> Ugly stacktrace when quota exceeds
> ------------------------------------
>
>                 Key: HDFS-2360
>                 URL: https://issues.apache.org/jira/browse/HDFS-2360
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 0.23.0
>            Reporter: Rajit Saha
>            Priority: Minor
>
> Would it be better to catch the exception and throw a small, reasonable message to the user when they exceed quota?
> $hdfs dfs -mkdir testDir
> $hdfs dfsadmin -setSpaceQuota 191M testDir
> $hdfs dfs -count -q testDir
> none inf 200278016 200278016 1 0 0 hdfs://NN hostname:port/user/hdfsqa/testDir
> $hdfs dfs -put /etc/passwd /user/hadoopqa/testDir
> 11/09/19 08:08:15 WARN hdfs.DFSClient: DataStreamer Exception
> org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/hdfsqa/testDir is exceeded: quota=191.0m diskspace consumed=768.0m
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:159)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1609)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1383)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:370)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allocateBlock(FSNamesystem.java:1681)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1476)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:389)
>         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:365)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
>         at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1100)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:972)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:454)
> Caused by: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/hdfsqa/testDir is exceeded: quota=191.0m diskspace consumed=768.0m
>         at org.apache.hadoop.hdfs.server.namenode.INodeDirectoryWithQuota.verifyQuota(INodeDirectoryWithQuota.java:159)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:1609)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:1383)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:370)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.allocateBlock(FSNamesystem.java:1681)
>         at
[jira] [Resolved] (HDFS-5740) getmerge file system shell command needs error message for user error
[ https://issues.apache.org/jira/browse/HDFS-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-5740.
---------------------------
    Resolution: Not a Problem

This is no longer an issue on branch-2 and trunk today. The command accepts a collection of files now, and prepares the output accordingly.

> getmerge file system shell command needs error message for user error
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5740
>                 URL: https://issues.apache.org/jira/browse/HDFS-5740
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 1.1.2
>         Environment: {noformat}
> [jpfuntner@h58 tmp]$ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 6.0 (Santiago)
> [jpfuntner@h58 tmp]$ hadoop version
> Hadoop 1.1.2.21
> Subversion  -r
> Compiled by jenkins on Thu Jan 10 03:38:39 PST 2013
> From source with checksum ce0aa0de785f572347f1afee69c73861
> {noformat}
>            Reporter: John Pfuntner
>            Priority: Minor
>
> I naively tried a {{getmerge}} operation but it didn't seem to do anything and there was no error message:
> {noformat}
> [jpfuntner@h58 tmp]$ hadoop fs -mkdir /user/jpfuntner/tmp
> [jpfuntner@h58 tmp]$ num=0; while [ $num -lt 5 ]; do echo file$num | hadoop fs -put - /user/jpfuntner/tmp/file$num; let num=num+1; done
> [jpfuntner@h58 tmp]$ ls -A
> [jpfuntner@h58 tmp]$ hadoop fs -getmerge /user/jpfuntner/tmp/file* files.txt
> [jpfuntner@h58 tmp]$ ls -A
> [jpfuntner@h58 tmp]$ hadoop fs -ls /user/jpfuntner/tmp
> Found 5 items
> -rw---   3 jpfuntner hdfs          6 2014-01-08 17:37 /user/jpfuntner/tmp/file0
> -rw---   3 jpfuntner hdfs          6 2014-01-08 17:37 /user/jpfuntner/tmp/file1
> -rw---   3 jpfuntner hdfs          6 2014-01-08 17:37 /user/jpfuntner/tmp/file2
> -rw---   3 jpfuntner hdfs          6 2014-01-08 17:37 /user/jpfuntner/tmp/file3
> -rw---   3 jpfuntner hdfs          6 2014-01-08 17:37 /user/jpfuntner/tmp/file4
> [jpfuntner@h58 tmp]$
> {noformat}
> It was pointed out to me that I made a mistake and my source should have been a directory, not a set of regular files. It works if I use the directory:
> {noformat}
> [jpfuntner@h58 tmp]$ hadoop fs -getmerge /user/jpfuntner/tmp/ files.txt
> [jpfuntner@h58 tmp]$ ls -A
> files.txt  .files.txt.crc
> [jpfuntner@h58 tmp]$ cat files.txt
> file0
> file1
> file2
> file3
> file4
> [jpfuntner@h58 tmp]$
> {noformat}
> I think the {{getmerge}} command should issue an error message to let the user know they made a mistake.
[jira] [Resolved] (HDFS-4494) Confusing exception for unresolvable hdfs host with security enabled
[ https://issues.apache.org/jira/browse/HDFS-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-4494.
---------------------------
          Resolution: Done
    Target Version/s: 2.1.0-beta, 3.0.0  (was: 3.0.0, 2.1.0-beta)

This seems resolved now (as of 2.6.0):

{code}
[root@host ~]# hdfs getconf -confKey hadoop.security.authentication
kerberos
[root@host ~]# hadoop fs -ls hdfs://asdfsdfsdf/
-ls: java.net.UnknownHostException: asdfsdfsdf
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]
{code}

Marking as Done.

> Confusing exception for unresolvable hdfs host with security enabled
> ----------------------------------------------------------------------
>
>                 Key: HDFS-4494
>                 URL: https://issues.apache.org/jira/browse/HDFS-4494
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Priority: Minor
>
> {noformat}
> $ hadoop fs -ls hdfs://unresolvable-host
> ls: Can't replace _HOST pattern since client address is null
> {noformat}
> It's misleading because it's not even related to the client's address. It'd be a bit more informative to see something like {{UnknownHostException: unresolvable-host}}.
[jira] [Resolved] (HDFS-4290) Expose an event listener interface in DFSOutputStreams for block write pipeline status changes
[ https://issues.apache.org/jira/browse/HDFS-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-4290.
---------------------------
    Resolution: Later

Specific problems/use-cases driving this need haven't been brought up in the past years. Resolving as Later for now.

> Expose an event listener interface in DFSOutputStreams for block write pipeline status changes
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4290
>                 URL: https://issues.apache.org/jira/browse/HDFS-4290
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs-client
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Priority: Minor
>
> I've noticed HBase periodically polls the current status of block replicas for its HLog files via the API presented by HDFS-826. It would perhaps be better for such clients if they could register a listener instead. The listener(s) can be sent an event in case things change in the last open block (such as a DN falling with no replacement found, etc.). This would avoid having a periodic, parallel looped check in such clients and be more efficient. Just a thought :)
[jira] [Resolved] (HDFS-3621) Add a main method to HdfsConfiguration, for debug purposes
[ https://issues.apache.org/jira/browse/HDFS-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-3621.
---------------------------
    Resolution: Won't Fix

Thanks for the work Plamen!

> Add a main method to HdfsConfiguration, for debug purposes
> ------------------------------------------------------------
>
>                 Key: HDFS-3621
>                 URL: https://issues.apache.org/jira/browse/HDFS-3621
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Assignee: Plamen Jeliazkov
>            Priority: Trivial
>              Labels: newbie
>         Attachments: HDFS-3621.patch
>
> Just like Configuration has a main() func that dumps XML out for debug purposes, we should have a similar function under the HdfsConfiguration class that does the same. This is useful in testing out app classpath setups at times.
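For the record, the proposed change was small; a sketch of what such a main() could have looked like, mirroring the debug main() that Configuration itself carries (the same dump-to-stdout behaviour is assumed, since the patch was not committed):

{code}
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class HdfsConfDump {
  // Load the HDFS-specific resources (hdfs-default.xml, hdfs-site.xml)
  // on top of the core ones, then dump the merged config as XML to stdout
  // for classpath/setup debugging.
  public static void main(String[] args) throws Exception {
    new HdfsConfiguration().writeXml(System.out);
  }
}
{code}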
[jira] [Created] (HDFS-7899) Improve EOF error message
Harsh J created HDFS-7899:
------------------------------

             Summary: Improve EOF error message
                 Key: HDFS-7899
                 URL: https://issues.apache.org/jira/browse/HDFS-7899
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.6.0
            Reporter: Harsh J
            Priority: Minor

Currently, a DN disconnection for reasons other than connection timeout or refused messages, such as an EOF message as a result of rejection or some other network fault, reports in this manner:

{code}
WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /x.x.x.x: for block, add to deadNodes and continue. java.io.EOFException: Premature EOF: no length prefix available
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
        at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1103)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:602)
{code}

This is not very clear to a user (it WARNs at the hdfs-client). It could likely be improved with a more diagnosable message, or at least state the direct reason rather than an EOF.
[jira] [Resolved] (HDFS-5688) Wire-encryption in QJM
[ https://issues.apache.org/jira/browse/HDFS-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-5688.
---------------------------
    Resolution: Cannot Reproduce

> Wire-encryption in QJM
> ------------------------
>
>                 Key: HDFS-5688
>                 URL: https://issues.apache.org/jira/browse/HDFS-5688
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, journal-node, security
>    Affects Versions: 2.2.0
>            Reporter: Juan Carlos Fernandez
>              Labels: security
>         Attachments: core-site.xml, hdfs-site.xml, jaas.conf, journal.xml, namenode.xml, ssl-client.xml, ssl-server.xml
>
> When HA is implemented with QJM and using kerberos, it's not possible to get wire-encrypted data. If the property hadoop.rpc.protection is set to something other than authentication, it doesn't work properly, getting the error:
> ERROR security.UserGroupInformation: PriviledgedActionException as:principal@REALM (auth:KERBEROS) cause:javax.security.sasl.SaslException: No common protection layer between client and server
> With NFS as shared storage everything works like a charm.
[jira] [Resolved] (HDFS-7752) Improve description for dfs.namenode.num.extra.edits.retained and dfs.namenode.num.checkpoints.retained properties on hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-7752.
---------------------------
          Resolution: Fixed
       Fix Version/s: 2.7.0
    Target Version/s:   (was: 2.7.0)

Thanks Wellington! I've committed this to branch-2 and trunk.

> Improve description for dfs.namenode.num.extra.edits.retained and dfs.namenode.num.checkpoints.retained properties on hdfs-default.xml
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7752
>                 URL: https://issues.apache.org/jira/browse/HDFS-7752
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 2.6.0
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>             Fix For: 2.7.0
>         Attachments: HDFS-7752.patch, HDFS-7752.patch
>
> The current description for the dfs.namenode.num.extra.edits.retained and dfs.namenode.num.checkpoints.retained properties on hdfs-default.xml is not clear on how many, and which, files will be kept in the namenode's metadata directory.
> For dfs.namenode.num.checkpoints.retained, it's not clear that it applies to the number of fsimage_* files.
> For dfs.namenode.num.extra.edits.retained, it's not clear that the value set indirectly applies to edits_* files, nor how the configured value translates into the number of edit files to be kept.
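For quick reference, a sketch showing the two keys in question being set to what I recall as their hdfs-default.xml defaults around 2.6-2.7; verify the numbers against your release before relying on them:

{code}
import org.apache.hadoop.conf.Configuration;

public class RetentionDefaultsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Number of fsimage_* checkpoint files kept in the NN metadata dirs.
    conf.setInt("dfs.namenode.num.checkpoints.retained", 2);
    // Extra transactions retained beyond those needed to recover from the
    // oldest kept checkpoint; this indirectly governs how many edits_*
    // segment files survive a purge.
    conf.setLong("dfs.namenode.num.extra.edits.retained", 1000000L);
  }
}
{code}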
[jira] [Resolved] (HDFS-7580) NN - JN communication should use reusable authentication methods
[ https://issues.apache.org/jira/browse/HDFS-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-7580.
---------------------------
    Resolution: Invalid

Looking at the JDK sources, there's no way to programmatically configure the KDC timeouts, so I am resolving this as Invalid, as there's nothing we can really do at our end. I'll just make a krb5.conf change.

> NN - JN communication should use reusable authentication methods
> -------------------------------------------------------------------
>
>                 Key: HDFS-7580
>                 URL: https://issues.apache.org/jira/browse/HDFS-7580
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: journal-node, namenode
>    Affects Versions: 2.5.0
>            Reporter: Harsh J
>
> It appears that NNs talk to JNs via general SaslRPC in secure mode, causing all requests to be carried out with a kerberos authentication. This can cause delays and occasionally NN failures if the KDC used does not respond within its default timeout period (30s, whereas the QJM writes come with a default of 20s).
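The krb5.conf change alluded to above would presumably be along these lines; kdc_timeout and max_retries are read by the JDK's Kerberos client from [libdefaults], but the values here are illustrative only (the JDK treats kdc_timeout as milliseconds), so verify the semantics against your JDK's documentation:

{code}
[libdefaults]
  # Give up on an unresponsive KDC well before the 20s QJM write timeout,
  # instead of the 30s default mentioned in the issue.
  kdc_timeout = 5000
  max_retries = 2
{code}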
[jira] [Resolved] (HDFS-7532) dncp_block_verification.log.prev too large
[ https://issues.apache.org/jira/browse/HDFS-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-7532.
---------------------------
    Resolution: Duplicate

This should be eventually fixed via HDFS-7430. Yes, you may shut down the affected DN temporarily, delete these files and start it back up.

> dncp_block_verification.log.prev too large
> ---------------------------------------------
>
>                 Key: HDFS-7532
>                 URL: https://issues.apache.org/jira/browse/HDFS-7532
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Arti Wadhwani
>            Priority: Blocker
>
> Hi,
> Using hadoop version: Hadoop 2.0.0-cdh4.7.0
> On one datanode, I can see that dncp_block_verification.log.prev is too large. Is it safe to delete this file?
> {noformat}
> -rw-r--r-- 1 hdfs hdfs 1166438426181 Oct 31 09:34 dncp_block_verification.log.prev
> -rw-r--r-- 1 hdfs hdfs     138576163 Dec 15 22:16 dncp_block_verification.log.curr
> {noformat}
> This is similar to HDFS-6114, but that one is about the dncp_block_verification.log.curr file.
> Thanks,
> Arti Wadhwani
[jira] [Created] (HDFS-7580) NN - JN communication should use reusable authentication methods
Harsh J created HDFS-7580:
------------------------------

             Summary: NN - JN communication should use reusable authentication methods
                 Key: HDFS-7580
                 URL: https://issues.apache.org/jira/browse/HDFS-7580
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: journal-node, namenode
    Affects Versions: 2.5.0
            Reporter: Harsh J

It appears that NNs talk to JNs via general SaslRPC in secure mode, causing all requests to be carried out with a kerberos authentication. This can cause delays and occasionally NN failures if the KDC used does not respond within its default timeout period (30s, whereas the QJM writes come with a default of 20s).
[jira] [Created] (HDFS-7546) Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
Harsh J created HDFS-7546:
------------------------------

             Summary: Document, and set an accepting default for dfs.namenode.kerberos.principal.pattern
                 Key: HDFS-7546
                 URL: https://issues.apache.org/jira/browse/HDFS-7546
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: security
            Reporter: Harsh J
            Priority: Minor

This config is used in the SaslRpcClient, and the lack of a default breaks cross-realm trust principals being used at clients.

Current location: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SaslRpcClient.java#L309

The config should be documented, and the default should be set to * to preserve the prior-to-introduction behaviour.
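A minimal sketch of the accepting default being proposed; the key name comes from the issue itself, and * mirrors the prior-to-introduction behaviour of accepting any server principal:

{code}
import org.apache.hadoop.conf.Configuration;

public class PrincipalPatternSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Accept any NN server principal, as the SaslRpcClient did before the
    // pattern check was introduced; needed for cross-realm trust setups.
    conf.set("dfs.namenode.kerberos.principal.pattern", "*");
  }
}
{code}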
[jira] [Created] (HDFS-7501) TransactionsSinceLastCheckpoint can be negative on SBNs
Harsh J created HDFS-7501:
------------------------------

             Summary: TransactionsSinceLastCheckpoint can be negative on SBNs
                 Key: HDFS-7501
                 URL: https://issues.apache.org/jira/browse/HDFS-7501
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.5.0
            Reporter: Harsh J
            Priority: Trivial

The metric TransactionsSinceLastCheckpoint is derived as FSEditLog.txid minus NNStorage.mostRecentCheckpointTxId. In Standby mode, the former does not increment beyond the loaded or last-when-active value, but the latter does change due to the checkpoints done regularly in this mode. Thereby, the SBN will eventually end up showing negative values for TransactionsSinceLastCheckpoint.

This is not an issue as the metric only makes sense to be monitored on the Active NameNode, but we should perhaps just show the value 0 by detecting if the NN is in SBN form, as allowing a negative number is confusing to view within a chart that tracks it.
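To make the arithmetic concrete, a small sketch of the derivation and the proposed clamp; the stand-in values mimic an SBN whose checkpoints have advanced past the loaded txid (the real values come from FSEditLog and NNStorage as described above):

{code}
public class CheckpointMetricSketch {
  public static void main(String[] args) {
    // Stand-ins for FSEditLog's last txid and NNStorage's
    // mostRecentCheckpointTxId on a standby NN, where regular
    // checkpointing advances the latter while the former stands still.
    long lastWrittenTxId = 54000;
    long mostRecentCheckpointTxId = 54100;
    boolean inStandbyState = true;

    long raw = lastWrittenTxId - mostRecentCheckpointTxId; // -100 today
    // Proposed behaviour: clamp to 0 while in standby, since a negative
    // value only confuses charts that track the metric.
    long reported = inStandbyState ? Math.max(0, raw) : raw;
    System.out.println("TransactionsSinceLastCheckpoint = " + reported);
  }
}
{code}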
[jira] [Created] (HDFS-7290) Add HTTP response code to the HttpPutFailedException message
Harsh J created HDFS-7290:
------------------------------

             Summary: Add HTTP response code to the HttpPutFailedException message
                 Key: HDFS-7290
                 URL: https://issues.apache.org/jira/browse/HDFS-7290
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: namenode
    Affects Versions: 2.5.0
            Reporter: Harsh J
            Assignee: Harsh J
            Priority: Minor

If the TransferFsImage#uploadImageFromStorage(…) call fails for some reason, we try to print back the reason for the connection failure.

We currently only grab connection.getResponseMessage(…) and use that as our exception's lone string, but this can often be empty if there was no real response message from the connection end. However, the failures always have a code, so we should also ensure we print the error code returned, for at least a partial hint.
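A sketch of the shape of such a fix against the plain HttpURLConnection API; the exception type and exact message format actually used by TransferFsImage may differ:

{code}
import java.io.IOException;
import java.net.HttpURLConnection;

public class PutFailureMessageSketch {
  static void checkResponse(HttpURLConnection connection) throws IOException {
    int code = connection.getResponseCode();
    if (code != HttpURLConnection.HTTP_OK) {
      // Include the numeric status alongside the (possibly empty)
      // response message, so failures always carry at least a partial hint.
      String msg = connection.getResponseMessage();
      throw new IOException(
          "Image transfer failed with status " + code + ": " + msg);
    }
  }
}
{code}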
[jira] [Resolved] (HDFS-3534) LeaseExpiredException on NameNode if file is moved while being created.
[ https://issues.apache.org/jira/browse/HDFS-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-3534.
---------------------------
    Resolution: Not A Problem

As explained in the above comments, this is expected behaviour. Resolving.

> LeaseExpiredException on NameNode if file is moved while being created.
> --------------------------------------------------------------------------
>
>                 Key: HDFS-3534
>                 URL: https://issues.apache.org/jira/browse/HDFS-3534
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.2, 0.20.205.0
>            Reporter: Mitesh Singh Jat
>
> If a file (big_file.txt, size=512MB) is being created (or uploaded) on hdfs, and a rename (fs -mv) of that file is done, then the following exception occurs:
> {noformat}
> 12/06/13 08:56:42 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/mitesh/temp/big_file.txt File does not exist. [Lease.  Holder: DFSClient_-2105467303, pendingcreates: 1]
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1604)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1595)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1511)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:685)
>         at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1082)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1066)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
>         at $Proxy6.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy6.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3324)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3188)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2300(DFSClient.java:2406)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2646)
> 12/06/13 08:56:42 WARN hdfs.DFSClient: Error Recovery for block blk_-5525713112321593595_679317395 bad datanode[0] nodes == null
> 12/06/13 08:56:42 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/mitesh/temp/big_file.txt" - Aborting...
> ...
> {noformat}
> Whereas this issue is not seen on *Hadoop 0.23*.
> I have used the following shell script to simulate the issue.
> {code:title=run_parallely.sh}
> #!/bin/bash
> hadoop=hadoop
> filename=big_file.txt
> dest=/user/mitesh/temp/$filename
> dest2=/user/mitesh/xyz/$filename
>
> ## Clean up
> hadoop fs -rm -skipTrash $dest
> hadoop fs -rm -skipTrash $dest2
>
> ## Copy big_file.txt onto hdfs
> hadoop fs -put $filename $dest > cmd1.log 2>&1 &
>
> ## sleep until entry is created, hoping copying is not finished
> until $(hadoop fs -test -e $dest)
> do
>   sleep 1
> done
>
> ## Now move
> hadoop fs -mv $dest $dest2 > cmd2.log 2>&1
> {code}
[jira] [Resolved] (HDFS-160) namenode fails to run on ppc
[ https://issues.apache.org/jira/browse/HDFS-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-160.
--------------------------
    Resolution: Cannot Reproduce

This has likely gone stale now, and no similar reports have been received (PPC may be the reason?) - closing this one out as Cannot Reproduce.

> namenode fails to run on ppc
> ------------------------------
>
>                 Key: HDFS-160
>                 URL: https://issues.apache.org/jira/browse/HDFS-160
>             Project: Hadoop HDFS
>          Issue Type: Bug
>         Environment: PowerPC using Fedora 9 (all updates) and gcj-1.5.0.0
>            Reporter: Fabian Deutsch
>            Priority: Minor
>         Attachments: build.log, hadoop-env.sh, hadoop-site.xml, java.hprof.txt, jdb-namenode-QUIT.log, netstat.log
>
> Hadoop starts, but eats 100% CPU. Data- and Secondarynamenodes cannot connect. No jobs were run; I was just trying to start the daemon using bin/start-dfs.sh. Using the same simple configuration on an x86 arch - also using Fedora 9 and gcj-1.5.0.0 - works perfectly.
[jira] [Resolved] (HDFS-114) Remove code related to OP_READ_METADATA from DataNode
[ https://issues.apache.org/jira/browse/HDFS-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-114.
--------------------------
    Resolution: Duplicate

> Remove code related to OP_READ_METADATA from DataNode
> --------------------------------------------------------
>
>                 Key: HDFS-114
>                 URL: https://issues.apache.org/jira/browse/HDFS-114
>             Project: Hadoop HDFS
>          Issue Type: Bug
>         Environment: All
>            Reporter: Lohit Vijayarenu
>            Priority: Minor
>
> HADOOP-2797 removed OP_READ_METADATA, but there is still code in DataNode for this. We could remove it, along with the corresponding datanode metrics associated with it.
[jira] [Resolved] (HDFS-156) namenode doesn't start if group id cannot be resolved to name
[ https://issues.apache.org/jira/browse/HDFS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-156.
--------------------------
    Resolution: Duplicate

Fixed indirectly via HADOOP-4656's change set. The 'id' command is now used instead of 'groups' when looking up user memberships.

> namenode doesn't start if group id cannot be resolved to name
> ----------------------------------------------------------------
>
>                 Key: HDFS-156
>                 URL: https://issues.apache.org/jira/browse/HDFS-156
>             Project: Hadoop HDFS
>          Issue Type: Bug
>         Environment: Linux n510 2.6.22-3-686 #1 SMP Mon Nov 12 08:32:57 UTC 2007 i686 GNU/Linux
>                      Java: java version "1.5.0_14"
>                      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_14-b03)
>                      Java HotSpot(TM) Client VM (build 1.5.0_14-b03, mixed mode, sharing)
>                      PAM: ldap
>            Reporter: Andrew Gudkov
>            Assignee: Patrick Winters
>            Priority: Minor
>         Attachments: groupname.patch
>
> The namenode fails to start because the unix group name for my user can't be resolved. First, the system threw a rather obscure message:
> {quote}
> ERROR dfs.NameNode (NameNode.java:main(856)) - java.lang.NullPointerException
>         at org.apache.hadoop.dfs.FSNamesystem.close(FSNamesystem.java:428)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:237)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
> {quote}
> I traversed through the stack trace entries, and found (FSNamesystem:237) this code:
> {quote}
> 233   FSNamesystem(NameNode nn, Configuration conf) throws IOException {
> 234     try {
> 235       initialize(nn, conf);
> 236     } catch(IOException e) {
> 237       close();
> 238       throw e;
> 239     }
> 240   }
> {quote}
> Inserting e.printStackTrace() gave me the following:
> {quote}
> dfs.NameNodeMetrics (NameNodeMetrics.java:init(76)) - Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> java.io.IOException: javax.security.auth.login.LoginException: Login failed: id: cannot find name for group ID 1040
>         at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
>         at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:268)
>         at org.apache.hadoop.dfs.FSNamesystem.setConfigurationParameters(FSNamesystem.java:330)
>         at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:249)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
>         at org.apache.hadoop.dfs.FSNamesystem.setConfigurationParameters(FSNamesystem.java:332)
>         at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:249)
>         at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
>         at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
>         at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
>         at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
>         at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)
> {quote}
> And this is true - the groups command returns the same: id: cannot find name for group ID 1040.
[jira] [Resolved] (HDFS-184) SecondaryNameNode doCheckpoint() renames current directory before asking NameNode to rollEditLog()
[ https://issues.apache.org/jira/browse/HDFS-184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-184.
--------------------------
    Resolution: Not A Problem

This doesn't appear to be a problem today, esp. after the new edits and fsimage retention style, as we do rollEditLog as the first thing before any other local operation. Likely gone stale. Closing out for now as 'Not A Problem' (anymore).

> SecondaryNameNode doCheckpoint() renames current directory before asking NameNode to rollEditLog()
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-184
>                 URL: https://issues.apache.org/jira/browse/HDFS-184
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Lohit Vijayarenu
>            Priority: Minor
>
> In SecondaryNameNode, the doCheckPoint() function invokes _startCheckpoint()_ before calling _namenode.rollEditLog()_.
> _startCheckpoint()_ internally invokes _CheckpointStorage::startCheckpoint()_, which renames current to lastcheckpoint.tmp. If the call to the namenode failed, would we then redo the above step, renaming an empty current directory in the next iteration? Should we remove it only after we know the namenode has successfully rolled the edits?
[jira] [Resolved] (HDFS-174) GnuWin32 coreutils df output causes DF to throw
[ https://issues.apache.org/jira/browse/HDFS-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-174.
--------------------------
    Resolution: Not A Problem

HDFS currently has proper Windows environment support without relying on local unix-like tools to be available. Resolving as 'Not a Problem' (anymore).

> GnuWin32 coreutils df output causes DF to throw
> --------------------------------------------------
>
>                 Key: HDFS-174
>                 URL: https://issues.apache.org/jira/browse/HDFS-174
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Albert Strasheim
>            Priority: Minor
>
> The output from GnuWin32 coreutils's df looks like this:
> C:\Program Files\GnuWin32\bin>df -k C:\hadoop-0.13.0
> Filesystem           1K-blocks      Used Available Use% Mounted on
> df: `NTFS': No such file or directory
> -             96124924  86288848   9836076  90% C:\
> This causes DF's parsing to fail with the following exception:
> Exception in thread "main" java.io.IOException: df: `NTFS': No such file or directory
>         at org.apache.hadoop.fs.DF.doDF(DF.java:65)
>         at org.apache.hadoop.fs.DF.<init>(DF.java:54)
>         at org.apache.hadoop.fs.DF.main(DF.java:168)
> Fixing this would be useful since it might allow Hadoop to be used without installing Cygwin.
[jira] [Resolved] (HDFS-285) limit concurrent connections(data serving thread) in one datanode
[ https://issues.apache.org/jira/browse/HDFS-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved HDFS-285.
--------------------------
    Resolution: Not A Problem

This has likely gone stale (probably addressed at a higher level via Raghu's earliest comments). Having seen some pretty large HBase region sets on several clusters, and never having faced the described stack-limit OOME (but having faced the transceiver limits), I think this is likely no longer an issue. Closing out as 'Not a Problem' (anymore).

> limit concurrent connections(data serving thread) in one datanode
> -------------------------------------------------------------------
>
>                 Key: HDFS-285
>                 URL: https://issues.apache.org/jira/browse/HDFS-285
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Luo Ning
>            Priority: Minor
>
> I'm here after HADOOP-2341 and HADOOP-2346. In my hbase env, many open mapfiles cause a datanode OOME (stack memory), because of 2000+ data serving threads in the datanode process.
> Although HADOOP-2346 has implemented timeouts, there will be situations where many connections are created before the read timeout (default 6min) is reached - like hbase does: it opens all files on regionserver startup.
> Limiting concurrent connections (data serving threads) will make the datanode more stable, and I think it could be done in SocketIOWithTimeout$SelectorPool#select:
> 1. In SelectorPool#select, record all waiting SelectorInfo instances in a List at the beginning, and remove each after its 'Selector#select' is done.
> 2. Before the real 'select', do a limitation check; if the limit is reached, close the first selectorInfo.
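A generic sketch of the admission-control idea proposed in point 2 above, using a counting semaphore as the limitation check before a connection is served; the class and the limit of 256 are illustrative only, not the actual SelectorPool code:

{code}
import java.util.concurrent.Semaphore;

public class BoundedServingSketch {
  // Cap concurrent data-serving threads so per-thread stacks do not add up
  // to an OOME at a few thousand connections.
  private final Semaphore slots = new Semaphore(256);

  boolean tryServe(Runnable transfer) {
    if (!slots.tryAcquire()) {
      // At the limit: reject (or, as proposed, close an existing waiter).
      return false;
    }
    try {
      transfer.run();
    } finally {
      slots.release();
    }
    return true;
  }
}
{code}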
[jira] [Resolved] (HDFS-345) DataNode to send block reports to multiple namenodes?
[ https://issues.apache.org/jira/browse/HDFS-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-345. -- Resolution: Implemented This is pretty close to the HDFS HA mechanism available in current versions. Resolving as 'Implemented'. DataNode to send block reports to multiple namenodes? - Key: HDFS-345 URL: https://issues.apache.org/jira/browse/HDFS-345 Project: Hadoop HDFS Issue Type: Improvement Reporter: Marco Nicosia Priority: Minor I have this theory that I could test the memory footprint of a new version of the Hadoop namenode, without interrupting a running instance. We could shut down the secondary namenode process, and run a new version of the namenode code on the image file found on the secondary namenode server. But just running on the image file wouldn't be enough. It'd be great if I could get a real feel by having all the block reports also make their way to my fake namenode. Would it be possible for datanodes to report to two different namenodes, even if only one is the active, live namenode? (I understand that this wouldn't work if the format of the block report, or worse, the rpc layer, were incompatible.) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-372) DataNode should reuse delBlockFromDisk
[ https://issues.apache.org/jira/browse/HDFS-372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-372. -- Resolution: Not A Problem The invalidation of blocks was deliberately moved onto the async disk deletion services to facilitate large delete operations without blocking other operations. The remainder of the deletes (except for unfinalizing a block) appear to be special cases (missing block files while the meta file continues to exist) under the FSDataSet implementation, and delBlockFromDisk wouldn't apply to them. Likely gone stale. Closing out as 'Not a Problem'. DataNode should reuse delBlockFromDisk -- Key: HDFS-372 URL: https://issues.apache.org/jira/browse/HDFS-372 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Priority: Minor FSDataSet should reuse delBlockFromDisk where it should/can be used, like in invalidateBlock. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-386) NameNode webUI should show the config it is running with.
[ https://issues.apache.org/jira/browse/HDFS-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-386. -- Resolution: Duplicate NameNode webUI should show the config it is running with. - Key: HDFS-386 URL: https://issues.apache.org/jira/browse/HDFS-386 Project: Hadoop HDFS Issue Type: Improvement Reporter: Lohit Vijayarenu Priority: Minor It would be good if Namenode webUI also showed the config it is running with. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-431) port fuse-dfs existing autoconf to hadoop project's autoconf infrastructure
[ https://issues.apache.org/jira/browse/HDFS-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-431. -- Resolution: Invalid This is now Invalid since we've moved on to using CMake as the build framework instead. port fuse-dfs existing autoconf to hadoop project's autoconf infrastructure --- Key: HDFS-431 URL: https://issues.apache.org/jira/browse/HDFS-431 Project: Hadoop HDFS Issue Type: Bug Components: fuse-dfs Reporter: Pete Wyckoff Priority: Minor Although fuse-dfs has its own autoconf macros and such, better to use one set of macros and in some places the macros could be improved. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-803) eclipse-files target needs to depend on 'ivy-retrieve-test'
[ https://issues.apache.org/jira/browse/HDFS-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-803. -- Resolution: Not A Problem Resolving as 'Not a Problem' (anymore) as we've long since moved on to using Maven instead of ant on trunk and on the 2.x stable releases. eclipse-files target needs to depend on 'ivy-retrieve-test' --- Key: HDFS-803 URL: https://issues.apache.org/jira/browse/HDFS-803 Project: Hadoop HDFS Issue Type: Bug Components: build Reporter: Konstantin Boudnik Priority: Minor Attachments: hdfs-803.patch When {{ant eclipse-files}} is executed, only common jars are guaranteed to be pulled in. To pull test jars, one needs to manually run {{ant ivy-retrieve-test}} first. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-2892) Some of property descriptions are not given(hdfs-default.xml)
[ https://issues.apache.org/jira/browse/HDFS-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-2892. --- Resolution: Invalid Target Version/s: (was: 2.0.0-alpha, 3.0.0) Resolving as Invalid as these were user questions. Some of property descriptions are not given(hdfs-default.xml) -- Key: HDFS-2892 URL: https://issues.apache.org/jira/browse/HDFS-2892 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Brahma Reddy Battula Priority: Trivial Hi. I took the 0.23.0 release from http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available and went through all the properties provided in hdfs-default.xml. Some of the property descriptions are not mentioned. It's better to give a description of each property and its usage (how to configure it); also, only MapReduce related jars are provided. Please check the following two configurations. *No Description*
{noformat}
<property>
  <name>dfs.datanode.https.address</name>
  <value>0.0.0.0:50475</value>
</property>
<property>
  <name>dfs.namenode.https-address</name>
  <value>0.0.0.0:50470</value>
</property>
{noformat}
Better to mention example usage (what to configure, in what format/syntax) in the description. Here I did not get what 'default' means: whether it is the name of a network interface or something else.
{noformat}
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>default</value>
  <description>The name of the Network Interface from which a data node should report its IP address.</description>
</property>
{noformat}
The following property is commented out; if it is not supported, it is better to remove it.
{noformat}
<property>
  <name>dfs.cluster.administrators</name>
  <value>ACL for the admins</value>
  <description>This configuration is used to control who can access the default servlets in the namenode, etc.</description>
</property>
{noformat}
Small clarification for the following property: if some value is configured for it, will the NN be in safe mode for up to this much time? May I know the usage of this property?
{noformat}
<property>
  <name>dfs.blockreport.initialDelay</name>
  <value>0</value>
  <description>Delay for first block report in seconds.</description>
</property>
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-198) org.apache.hadoop.dfs.LeaseExpiredException during dfs write
[ https://issues.apache.org/jira/browse/HDFS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-198. -- Resolution: Not A Problem This one has gone very stale, and we have not seen any genuinely confirmed reports of lease renewals going amiss during long-waiting tasks recently. Marking as 'Not a Problem' (anymore). If there's a proper new report of this behaviour, please file a new JIRA with the newer data. [~bugcy013] - Your problem is pretty different from what the OP appears to have reported in an older version. Your problem arises out of MR tasks not utilising an attempt-ID-based directory (which Hive appears to do sometimes), in which case two different running attempts (out of speculative exec. or otherwise) can cause one of them to run into this error as a result of the file overwrite. Best to investigate further on a mailing list rather than here. org.apache.hadoop.dfs.LeaseExpiredException during dfs write Key: HDFS-198 URL: https://issues.apache.org/jira/browse/HDFS-198 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Reporter: Runping Qi Many long-running, CPU-intensive map tasks failed due to org.apache.hadoop.dfs.LeaseExpiredException. See [a comment below|https://issues.apache.org/jira/browse/HDFS-198?focusedCommentId=12910298&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12910298] for the exceptions from the log: -- This message was sent by Atlassian JIRA (v6.1.5#6160)
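The overwrite race described above can be made concrete with a toy illustration (hypothetical paths; fs is an org.apache.hadoop.fs.FileSystem handle):
{code}
Path out = new Path("/warehouse/table/part-00000");  // shared path, no attempt-ID subdir
FSDataOutputStream attempt1 = fs.create(out, true);  // attempt 1 creates the file, holds the lease
FSDataOutputStream attempt2 = fs.create(out, true);  // attempt 2's overwrite removes the old file;
                                                     // attempt 1's next write or close then fails
                                                     // with a LeaseExpiredException
{code}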
[jira] [Created] (HDFS-5802) NameNode does not check for inode type before traversing down a path
Harsh J created HDFS-5802: - Summary: NameNode does not check for inode type before traversing down a path Key: HDFS-5802 URL: https://issues.apache.org/jira/browse/HDFS-5802 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Trivial This came up during the discussion on a forum at http://community.cloudera.com/t5/Batch-Processing-and-Workflow/Permission-denied-access-EXECUTE-on-getting-the-status-of-a-file/m-p/5049#M162 surrounding an fs.exists(…) check running on a path /foo/bar, where /foo is a file and not a directory. In such a case, the NameNode yields a user-confusing message of {{Permission denied: user=foo, access=EXECUTE, inode=/foo:foo:foo:-rw-r--r--}} instead of clearly saying (and realising) that /foo is not a directory, or that /foo is a file, before it tries to traverse further down to locate the requested path. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
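A minimal reproduction sketch using the plain FileSystem API (assuming an fs handle and a regular, non-super user as the caller):
{code}
fs.create(new Path("/foo")).close();  // /foo is now a file, mode -rw-r--r--
fs.exists(new Path("/foo/bar"));      // traversal checks EXECUTE on /foo as if it were a
                                      // directory, producing the confusing
                                      // "Permission denied: ... access=EXECUTE" message
{code}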
[jira] [Created] (HDFS-5189) Rename the CorruptBlocks metric to CorruptReplicas
Harsh J created HDFS-5189: - Summary: Rename the CorruptBlocks metric to CorruptReplicas Key: HDFS-5189 URL: https://issues.apache.org/jira/browse/HDFS-5189 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Minor The NameNode increments a CorruptBlocks metric even if only one of the block's replicas is reported corrupt (a genuine checksum fail, or even just a replica with a bad genstamp). In cases where this is incremented, fsck still reports a healthy state. This is confusing to users and causes false alarms, as they feel this is the metric to be monitored (instead of MissingBlocks). The metric is truly trying to report only corrupt replicas, not whole blocks, and ought to be renamed. FWIW, dfsadmin -report prints a proper string of "Blocks with corrupt replicas:" when printing this count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-5046) Hang when add/remove a datanode into/from a 2 datanode cluster
[ https://issues.apache.org/jira/browse/HDFS-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-5046. --- Resolution: Not A Problem bq. a). decommission progress hangs and the status always stays 'Waiting DataNode status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the decommission continues and will be completed finally. Step (a) points to both your problem and its solution. You have files being created with repl=3 on a 2-DN cluster, which will prevent decommission. This is not a bug. Hang when add/remove a datanode into/from a 2 datanode cluster -- Key: HDFS-5046 URL: https://issues.apache.org/jira/browse/HDFS-5046 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 1.1.1 Environment: Red Hat Enterprise Linux Server release 5.3, 64 bit Reporter: sam liu 1. Install a Hadoop 1.1.1 cluster, with 2 datanodes: dn1 and dn2. And, in hdfs-site.xml, set 'dfs.replication' to 2. 2. Add node dn3 into the cluster as a new datanode, without changing the 'dfs.replication' value in hdfs-site.xml, keeping it as 2 (note: step 2 passed). 3. Decommission dn3 from the cluster. Expected result: dn3 could be decommissioned successfully. Actual result: a). decommission progress hangs and the status always stays 'Waiting DataNode status: Decommissioned'. But, if I execute 'hadoop dfs -setrep -R 2 /', the decommission continues and will be completed finally. b). However, if the initial cluster includes >= 3 datanodes, this issue won't be encountered when adding/removing another datanode. For example, if I set up a cluster with 3 datanodes, I can successfully add a 4th datanode into it, and then also successfully remove that 4th datanode from the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4991) HDFSSeek API fails to seek to position when file is opened in write mode.
[ https://issues.apache.org/jira/browse/HDFS-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-4991. --- Resolution: Invalid First off, please do not use the JIRA as a QA medium. The HDFS project provides and maintains a development list at hdfs-dev@hadoop.apache.org that you should mail with such questions. Please only file valid issues on the JIRA. On to your question: HDFS has no random-write feature; it does not support one yet, hence there exists no API. If you plan to add such a feature, a design document for your implementation idea and discussion on the hdfs-dev@ lists is very welcome. Merely adding an API will not solve this - you will first need to understand why it's a limitation at the architecture level currently. Resolving as invalid. Please use the lists for general QA. HDFSSeek API fails to seek to position when file is opened in write mode. - Key: HDFS-4991 URL: https://issues.apache.org/jira/browse/HDFS-4991 Project: Hadoop HDFS Issue Type: Bug Components: libhdfs Affects Versions: 0.20.1 Environment: Redhat Linux Reporter: Dayakar Reddy Hi, The hdfsSeek API fails to seek to a position when the file is opened in write mode. I read in the documentation that hdfsSeek is only supported when a file is opened in read mode. We have a requirement of replacing a file residing in the hadoop environment. Is there any possibility of hdfsSeek being supported when a file is opened in write mode? Regards, Dayakar -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4983) Numeric usernames do not work with WebHDFS FS
Harsh J created HDFS-4983: - Summary: Numeric usernames do not work with WebHDFS FS Key: HDFS-4983 URL: https://issues.apache.org/jira/browse/HDFS-4983 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Affects Versions: 2.0.0-alpha Reporter: Harsh J Per the file hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/resources/UserParam.java, the DOMAIN pattern is set to: {{^[A-Za-z_][A-Za-z0-9._-]*[$]?$}}. Given this, using a username such as 123 seems to fail for some reason (tried on an insecure setup):
{code}
[123@host-1 ~]$ whoami
123
[123@host-1 ~]$ hadoop fs -fs webhdfs://host-2.domain.com -ls /
-ls: Invalid value: 123 does not belong to the domain ^[A-Za-z_][A-Za-z0-9._-]*[$]?$
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
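The rejection can be verified against that pattern directly; a quick standalone check:
{code}
import java.util.regex.Pattern;

Pattern domain = Pattern.compile("^[A-Za-z_][A-Za-z0-9._-]*[$]?$");
System.out.println(domain.matcher("123").matches());   // false: a leading digit is rejected
System.out.println(domain.matcher("user1").matches()); // true
{code}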
[jira] [Created] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE
Harsh J created HDFS-4936: - Summary: Handle overflow condition for txid going over Long.MAX_VALUE Key: HDFS-4936 URL: https://issues.apache.org/jira/browse/HDFS-4936 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor Hat tip to [~fengdon...@gmail.com] for the question that led to this (on the mailing lists). I hacked up my local NN's txids manually to go very large (close to the max) and decided to try out whether this causes any harm. I basically bumped up the freshly formatted files' starting txid to 9223372036854775805 (and ensured the image references the same by hex-editing it):
{code}
➜ current ls
VERSION fsimage_9223372036854775805.md5 fsimage_9223372036854775805 seen_txid
➜ current cat seen_txid
9223372036854775805
{code}
NameNode started up as expected.
{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 9223372036854775805 from /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 9223372036854775806
{code}
I could create a bunch of files and do regular ops, with the txid counting on well past the Long.MAX_VALUE boundary. I created over 10 files, just to make it go well over Long.MAX_VALUE. Quitting the NameNode and restarting fails though, with the following error:
{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized segments in /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806 -> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 9223372036854775806 but unable to find any edit logs containing txid -9223372036854775808
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
{code}
Looks like we also lose some edits when we restart, as noted by the finalized edits filename:
{code}
VERSION
edits_9223372036854775806-9223372036854775807
fsimage_9223372036854775805
fsimage_9223372036854775805.md5
seen_txid
{code}
It seems like we won't be able to handle the case where the txid overflows. It's a very, very large number, so that's not an immediate concern, but it seemed worthy of a report. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
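The negative txid in that error follows directly from two's-complement wraparound; a two-line check:
{code}
long txid = Long.MAX_VALUE;   // 9223372036854775807
System.out.println(txid + 1); // -9223372036854775808: the exact txid the
                              // "unable to find any edit logs containing" error cites
{code}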
[jira] [Resolved] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE
[ https://issues.apache.org/jira/browse/HDFS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-4936. --- Resolution: Not A Problem Handle overflow condition for txid going over Long.MAX_VALUE Key: HDFS-4936 URL: https://issues.apache.org/jira/browse/HDFS-4936 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor Hat tip to [~fengdon...@gmail.com] for the question that led to this (on the mailing lists). I hacked up my local NN's txids manually to go very large (close to the max) and decided to try out whether this causes any harm. I basically bumped up the freshly formatted files' starting txid to 9223372036854775805 (and ensured the image references the same by hex-editing it):
{code}
➜ current ls
VERSION fsimage_9223372036854775805.md5 fsimage_9223372036854775805 seen_txid
➜ current cat seen_txid
9223372036854775805
{code}
NameNode started up as expected.
{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 9223372036854775805 from /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 9223372036854775806
{code}
I could create a bunch of files and do regular ops, with the txid counting on well past the Long.MAX_VALUE boundary. I created over 10 files, just to make it go well over Long.MAX_VALUE. Quitting the NameNode and restarting fails though, with the following error:
{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized segments in /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806 -> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 9223372036854775806 but unable to find any edit logs containing txid -9223372036854775808
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
{code}
Looks like we also lose some edits when we restart, as noted by the finalized edits filename:
{code}
VERSION
edits_9223372036854775806-9223372036854775807
fsimage_9223372036854775805
fsimage_9223372036854775805.md5
seen_txid
{code}
It seems like we won't be able to handle the case where the txid overflows. It's a very, very large number, so that's not an immediate concern, but it seemed worthy of a report. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2316) [umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS over HTTP
[ https://issues.apache.org/jira/browse/HDFS-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-2316. --- Resolution: Fixed Target Version/s: (was: 0.22.1) [umbrella] WebHDFS: a complete FileSystem implementation for accessing HDFS over HTTP - Key: HDFS-2316 URL: https://issues.apache.org/jira/browse/HDFS-2316 Project: Hadoop HDFS Issue Type: New Feature Components: webhdfs Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Labels: critical-0.22.0 Fix For: 1.0.0, 0.23.1 Attachments: test-webhdfs, test-webhdfs-0.20s, WebHdfsAPI20111020.pdf, WebHdfsAPI2003.pdf, WebHdfsAPI2011.pdf We currently have hftp for accessing HDFS over HTTP. However, hftp is a read-only FileSystem and does not provide write access. In HDFS-2284, we propose to have WebHDFS provide a complete FileSystem implementation for accessing HDFS over HTTP. This is the umbrella JIRA for the tasks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4630) Datanode is going OOM due to small files in hdfs
[ https://issues.apache.org/jira/browse/HDFS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-4630. --- Resolution: Invalid Closing again per Suresh's comment, as this is by design and you're merely required to raise your heap to accommodate more files (and thereby, blocks). Please also see HDFS-4465 and HDFS-4461 on optimizations of this. Datanode is going OOM due to small files in hdfs Key: HDFS-4630 URL: https://issues.apache.org/jira/browse/HDFS-4630 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.0.0-alpha Environment: Ubuntu, Java 1.6 Reporter: Ankush Bhatiya Priority: Blocker Hi, We have very small files (sizes ranging 10KB-1MB) in our HDFS, and the number of files is in the tens of millions. Due to this, the namenode and datanode are both going out of memory very frequently. When we analysed the heap dump of the datanode, most of the memory was used by the ReplicaMap. Can we use EhCache or similar so as not to store all the data in memory? Thanks Ankush -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4624) eclipse plugin for hadoop 2.0.0-alpha
[ https://issues.apache.org/jira/browse/HDFS-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-4624. --- Resolution: Invalid eclipse plugin for hadoop 2.0.0-alpha - Key: HDFS-4624 URL: https://issues.apache.org/jira/browse/HDFS-4624 Project: Hadoop HDFS Issue Type: Wish Components: federation Environment: ubuntu 12.04, java 1.7, Reporter: Sreevatson Is there an eclipse plug in available for hadoop 2.0.0-alpha? i am currently working on a project to device a solution for small files problem and i am using hdfs federation. I want to integrate our web server with hdfs. So I need eclipse plugin for this version. Please help me out. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4509) Provide a way to ask Balancer to exclude certain DataNodes in its computation and/or work.
[ https://issues.apache.org/jira/browse/HDFS-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-4509. --- Resolution: Duplicate Dupe of HDFS-4420. Provide a way to ask Balancer to exclude certain DataNodes in its computation and/or work. -- Key: HDFS-4509 URL: https://issues.apache.org/jira/browse/HDFS-4509 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Harsh J Priority: Minor This is particularly useful in clusters that have a split between DNs used for regular purposes and DNs used specifically for HBase RSes. By asking the balancer to exclude the DNs that RSes run on, it's possible to avoid impacting HBase's local-read performance, and the balancing of these nodes can be deferred to a later time. An alternate, and perhaps simpler, approach would be to make the Balancer file-aware and ask it to skip a specific directory's files' blocks (i.e. those of /hbase, for example). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4508) Two minor improvements to the QJM Deployment docs
Harsh J created HDFS-4508: - Summary: Two minor improvements to the QJM Deployment docs Key: HDFS-4508 URL: https://issues.apache.org/jira/browse/HDFS-4508 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.0.3-alpha Reporter: Harsh J Priority: Minor Suggested by ML user Azurry, the docs at http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html#Deployment_details can be improved for two specific lines: {quote} * If you have already formatted the NameNode, or are converting a non-HA-enabled cluster to be HA-enabled, you should now copy over the contents of your NameNode metadata directories to the other, unformatted NameNode by running the command hdfs namenode -bootstrapStandby on the unformatted NameNode. Running this command will also ensure that the JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain sufficient edits transactions to be able to start both NameNodes. * If you are converting a non-HA NameNode to be HA, you should run the command hdfs -initializeSharedEdits, which will initialize the JournalNodes with the edits data from the local NameNode edits directories. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4509) Provide a way to ask Balancer to exclude certain DataNodes in its computation and/or work.
Harsh J created HDFS-4509: - Summary: Provide a way to ask Balancer to exclude certain DataNodes in its computation and/or work. Key: HDFS-4509 URL: https://issues.apache.org/jira/browse/HDFS-4509 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Reporter: Harsh J Priority: Minor This is particularly useful in clusters that have a split between DNs used for regular purposes and DNs used specifically for HBase RSes. By asking the balancer to exclude the DNs that RSes run on, it's possible to avoid impacting HBase's local-read performance, and the balancing of these nodes can be deferred to a later time. An alternate, and perhaps simpler, approach would be to make the Balancer file-aware and ask it to skip a specific directory's files' blocks (i.e. those of /hbase, for example). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-976) Hot Standby for NameNode
[ https://issues.apache.org/jira/browse/HDFS-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-976. -- Resolution: Duplicate A working HDFS HA mode has been implemented via HDFS-1623. Closing this one out as a 'dupe'. Hot Standby for NameNode Key: HDFS-976 URL: https://issues.apache.org/jira/browse/HDFS-976 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Reporter: dhruba borthakur Assignee: Dmytro Molkov Attachments: 0001-0.20.3_rc2-AvatarNode.patch, AvatarNode.20.patch, AvatarNodeDescription.txt, AvatarNode.patch, AvatarPatch.2.patch This is a place holder to share our code and experiences about implementing a Hot Standby for the HDFS NameNode for hadoop 0.20. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4449) When a decommission is awaiting closure of live blocks, show the block IDs on the NameNode's UI report
Harsh J created HDFS-4449: - Summary: When a decommission is awaiting closure of live blocks, show the block IDs on the NameNode's UI report Key: HDFS-4449 URL: https://issues.apache.org/jira/browse/HDFS-4449 Project: Hadoop HDFS Issue Type: Improvement Reporter: Harsh J Assignee: Harsh J It is rather common for people to complain about 'DN decommission' hangs because of live blocks waiting to get completed by some app (especially as certain HBase specifics cause a file to be open for a longer time, compared with MR/etc.). While they can see a count of the blocks that are live, we should add some more details to that view. Particularly, add the list of live blocks waiting to be closed, so that a user may better understand why it's hung, and also be able to trace the blocks back to files manually if needed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-3801) Provide a way to disable browsing of files from the web UI
[ https://issues.apache.org/jira/browse/HDFS-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-3801. --- Resolution: Won't Fix Hi Suresh and others, Yes I agree, we can close this. It is better to go with a filter. Provide a way to disable browsing of files from the web UI -- Key: HDFS-3801 URL: https://issues.apache.org/jira/browse/HDFS-3801 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Harsh J Assignee: Harsh J Priority: Minor Attachments: HDFS-3801.patch A few times we've had requests from users who wish to disable browsing of the filesystem in the web UI completely, while keeping other servlet functionality enabled (such as fsck, etc.). Right now, the cheap way to do this is by blocking out the DN web port (50075) from access by clients, but that also hampers HFTP transfers. We should instead provide a toggle config for the JSPs to use and disallow browsing if the toggle's enabled. The config can be true by default, to not change the behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4425) NameNode low on available disk space
[ https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-4425. --- Resolution: Invalid The Apache JIRA is not for user help but only for confirmed bug reports. Please send usage help requests such as your questions to u...@hadoop.apache.org. I'm resolving this as Invalid; let's carry forward on your email instead. Many have already answered you there. The key to tweak the default is dfs.namenode.resource.du.reserved. NameNode low on available disk space Key: HDFS-4425 URL: https://issues.apache.org/jira/browse/HDFS-4425 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.0-alpha Reporter: project Priority: Critical Hi, The Namenode switches into safemode when it has low disk space on the root fs /, and I have to manually run a command to leave it. Below are the log messages for low space on the root / fs. Is there any parameter so that I can reduce the reserved amount?
2013-01-21 01:22:52,217 WARN org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available on volume '/dev/mapper/vg_lv_root' is 10653696, which is below the configured reserved amount 104857600
2013-01-21 01:22:52,218 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available disk space. Entering safe mode.
2013-01-21 01:22:52,218 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode is ON.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
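For reference, a small sketch of reading the effective reserve programmatically (the 100 MB default matches the 104857600 in the WARN above); lowering the value in hdfs-site.xml relaxes the safemode trigger:
{code}
// Assumes org.apache.hadoop.conf.Configuration and org.apache.hadoop.hdfs.HdfsConfiguration.
Configuration conf = new HdfsConfiguration();
long reservedBytes = conf.getLong("dfs.namenode.resource.du.reserved",
    100L * 1024 * 1024); // default: 104857600 bytes, as seen in the log
{code}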
[jira] [Created] (HDFS-4255) Useless stacktrace shown in DN when there's an error writing a block
Harsh J created HDFS-4255: - Summary: Useless stacktrace shown in DN when there's an error writing a block Key: HDFS-4255 URL: https://issues.apache.org/jira/browse/HDFS-4255 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.0.2-alpha Reporter: Harsh J Priority: Minor The DN sometimes carries these, especially when it's asked to shut down and there's ongoing write activity. The stacktrace is absolutely useless and may be improved, and the message it comes as part of is an INFO, which should not be the case when a stacktrace needs to be printed (indicative of trouble).
{code}
2012-12-01 19:10:23,167 INFO datanode.DataNode (BlockReceiver.java:run(955)) - PacketResponder: BP-1493454111-192.168.2.1-1354369220726:blk_-8775461920430955284_1002, type=HAS_DOWNSTREAM_IN_PIPELINE
java.io.EOFException: Premature EOF: no length prefix available
 at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
 at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:905)
 at java.lang.Thread.run(Thread.java:680)
{code}
Full scenario log in comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4259) Improve pipeline DN replacement failure message
Harsh J created HDFS-4259: - Summary: Improve pipeline DN replacement failure message Key: HDFS-4259 URL: https://issues.apache.org/jira/browse/HDFS-4259 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 2.0.2-alpha Reporter: Harsh J Priority: Minor The current message shown is something such as below: bq. Failed to add a datanode. User may turn off this feature by setting X.policy in configuration, where the current policy is Y. (Nodes: current=[foo], original=[bar]) This reads off like failing is a feature (but the intention and the reason we hit this isn't indicated strongly), and can be bettered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4069) File mode bits of some scripts in rpm package are incorrect
[ https://issues.apache.org/jira/browse/HDFS-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-4069. --- Resolution: Won't Fix File mode bits of some scripts in rpm package are incorrect --- Key: HDFS-4069 URL: https://issues.apache.org/jira/browse/HDFS-4069 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 1.0.3, 1.1.0 Environment: Fedora 17 3.3.4-5.fc17.x86_64, OpenJDK Runtime Environment 1.7.0_06-icedtea, Rackspace Cloud Reporter: Haoquan Wang Priority: Minor Labels: patch Original Estimate: 1h Remaining Estimate: 1h These scripts should have execute permission (755). It only happens with the rpm package; the deb package does not have this problem.
{noformat}
-rw-r--r--. 1 root root 2143 Oct 4 22:12 /usr/sbin/slaves.sh
-rw-r--r--. 1 root root 1166 Oct 4 22:12 /usr/sbin/start-all.sh
-rw-r--r--. 1 root root 1065 Oct 4 22:12 /usr/sbin/start-balancer.sh
-rw-r--r--. 1 root root 1745 Oct 4 22:12 /usr/sbin/start-dfs.sh
-rw-r--r--. 1 root root 1145 Oct 4 22:12 /usr/sbin/start-jobhistoryserver.sh
-rw-r--r--. 1 root root 1259 Oct 4 22:12 /usr/sbin/start-mapred.sh
-rw-r--r--. 1 root root 1119 Oct 4 22:12 /usr/sbin/stop-all.sh
-rw-r--r--. 1 root root 1116 Oct 4 22:12 /usr/sbin/stop-balancer.sh
-rw-r--r--. 1 root root 1246 Oct 4 22:12 /usr/sbin/stop-dfs.sh
-rw-r--r--. 1 root root 1131 Oct 4 22:12 /usr/sbin/stop-jobhistoryserver.sh
-rw-r--r--. 1 root root 1168 Oct 4 22:12 /usr/sbin/stop-mapred.sh
-rw-r--r--. 1 root root 4210 Oct 4 22:12 /usr/sbin/update-hadoop-env.sh
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4002) Tool-ize OfflineImageViewer and make sure it returns proper return codes upon exit
Harsh J created HDFS-4002: - Summary: Tool-ize OfflineImageViewer and make sure it returns proper return codes upon exit Key: HDFS-4002 URL: https://issues.apache.org/jira/browse/HDFS-4002 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor We should make OfflineImageViewer structured (code-wise) in the same way as OfflineEditsViewer is. Particularly, OIV must implement the Tool interface, and must return proper exit codes upon success/failure conditions. Right now, it returns 0 in both successful parse and unsuccessful ones. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
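A minimal sketch of the requested structure, using the standard Tool/ToolRunner pattern so exit codes propagate to the shell (the class name here is hypothetical):
{code}
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class OfflineImageViewerTool extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    if (args.length < 1) {
      System.err.println("Usage: OfflineImageViewerTool <fsimage file> ...");
      return 1;  // non-zero exit code on bad usage
    }
    // ... parse the image here; return a non-zero code on parse failure too ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new OfflineImageViewerTool(), args));
  }
}
{code}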
[jira] [Created] (HDFS-3968) TestPersistBlocks seems to fail intermittently
Harsh J created HDFS-3968: - Summary: TestPersistBlocks seems to fail intermittently Key: HDFS-3968 URL: https://issues.apache.org/jira/browse/HDFS-3968 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Harsh J Received on HADOOP-8158:
{code}
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestPersistBlocks
{code}
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1503//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1503//console But test seems to pass on my local build. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2530) Add testcases for -n option of FSshell cat
[ https://issues.apache.org/jira/browse/HDFS-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-2530. --- Resolution: Invalid Hi Xie, Please post the patch along with the parent JIRA itself, to keep the commits single for this new feature. Add testcases for -n option of FSshell cat -- Key: HDFS-2530 URL: https://issues.apache.org/jira/browse/HDFS-2530 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 0.24.0 Reporter: XieXianshan Priority: Trivial Attachments: HDFS-2530.patch Add test cases for HADOOP-7795. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3886) Shutdown requests can possibly check for checkpoint issues (corrupted edits) and save a good namespace copy before closing down?
Harsh J created HDFS-3886: - Summary: Shutdown requests can possibly check for checkpoint issues (corrupted edits) and save a good namespace copy before closing down? Key: HDFS-3886 URL: https://issues.apache.org/jira/browse/HDFS-3886 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor HDFS-3878 sorta gives me this idea. Aside from having a method to download it to a different location, we can also lock up the namesystem (or deactivate the client rpc server) and save the namesystem before we complete the shutdown. The init.d/shutdown scripts would have to work with this somehow though, to not kill -9 it when in-process. Also, the new image may be stored in a shutdown.chkpt directory, to not interfere with the regular dirs, but still allow easier recovery. Obviously this will still not work if all directories are broken. So maybe we could have some configs to tackle that as well? I haven't thought this through, so let me know what part is wrong to do :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3801) Provide a way to disable browsing of the files from the web UI
Harsh J created HDFS-3801: - Summary: Provide a way to disable browsing of the files from the web UI Key: HDFS-3801 URL: https://issues.apache.org/jira/browse/HDFS-3801 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor A few times we've had requests from users who wish to disable browsing of the filesystem in the web UI completely, while keeping other servlet functionality enabled (such as fsck, etc.). Right now, the cheap way to do this is by blocking out the DN web port (50075) from access by clients, but that also hampers HFTP transfers. We should instead provide a toggle config for the JSPs to use and disallow browsing if the toggle's enabled. The config can be true by default, to not change the behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-3647) Backport HDFS-2868 (Add number of active transfer threads to the DataNode status) to branch-1
[ https://issues.apache.org/jira/browse/HDFS-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-3647. --- Resolution: Fixed Fix Version/s: 1.2.0 Target Version/s: (was: 1.2.0) Hadoop Flags: Reviewed Thanks Todd, I've committed this backport to branch-1. Backport HDFS-2868 (Add number of active transfer threads to the DataNode status) to branch-1 - Key: HDFS-3647 URL: https://issues.apache.org/jira/browse/HDFS-3647 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, performance Affects Versions: 0.20.2 Reporter: Steve Hoffman Assignee: Harsh J Fix For: 1.2.0 Attachments: HDFS-3647.patch, Screen Shot 2012-07-14 at 12.41.07 AM.png Not sure if this is in a newer version of Hadoop, but in CDH3u3 it isn't there. There is a lot of mystery surrounding how large to set dfs.datanode.max.xcievers. Most people say to just up it to 4096, but given that exceeding this will cause an HBase RegionServer shutdown (see Lars' blog post here: http://www.larsgeorge.com/2012/03/hadoop-hbase-and-xceivers.html), it would be nice if we could expose the current count via the built-in metrics framework (most likely under dfs). In this way we could watch it to see if we have it set too high, too low, time to bump it up, etc. Thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3628) The dfsadmin -setBalancerBandwidth command does not check for superuser privileges
Harsh J created HDFS-3628: - Summary: The dfsadmin -setBalancerBandwidth command does not check for superuser privileges Key: HDFS-3628 URL: https://issues.apache.org/jira/browse/HDFS-3628 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, name-node Affects Versions: 0.23.0, 0.20.205.0 Reporter: Harsh J Assignee: Harsh J Priority: Blocker The changes from HDFS-2202 failed to add in a checkSuperuserPrivilege() call, and hence any user (not admins alone) can reset the balancer bandwidth across the cluster if they wish to. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
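The fix amounts to a one-line guard at the top of the RPC handler; a sketch (the surrounding method shape is approximate, not the actual NameNode code):
{code}
public void setBalancerBandwidth(long bandwidth) throws IOException {
  checkSuperuserPrivilege();                  // the guard the HDFS-2202 change omitted
  namesystem.setBalancerBandwidth(bandwidth); // only superusers reach this point
}
{code}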
[jira] [Created] (HDFS-3621) Add a main method to HdfsConfiguration, for debug purposes
Harsh J created HDFS-3621: - Summary: Add a main method to HdfsConfiguration, for debug purposes Key: HDFS-3621 URL: https://issues.apache.org/jira/browse/HDFS-3621 Project: Hadoop HDFS Issue Type: Improvement Reporter: Harsh J Priority: Trivial Just like Configuration has a main() func that dumps XML out for debug purposes, we should have a similar function under the HdfsConfiguration class that does the same. This is useful in testing out app classpath setups at times. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
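A sketch of what such a main() could look like, mirroring Configuration's XML dump (assuming, as in 2.x, that HdfsConfiguration.init() registers the deprecated HDFS key mappings):
{code}
// Hypothetical addition to HdfsConfiguration:
public static void main(String[] args) throws Exception {
  init();                                       // register the deprecated hdfs-* key mappings
  new HdfsConfiguration().writeXml(System.out); // dump the effective configuration as XML
}
{code}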
[jira] [Created] (HDFS-3611) NameNode prints unnecessary WARNs about edit log normally skipping a few bytes
Harsh J created HDFS-3611: - Summary: NameNode prints unnecessary WARNs about edit log normally skipping a few bytes Key: HDFS-3611 URL: https://issues.apache.org/jira/browse/HDFS-3611 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Trivial The NameNode currently prints these kinds of WARN lines at every startup, even if there's no trouble really. For instance, the below is from an NN startup that was only just freshly formatted.
{code}
12/07/08 20:00:22 WARN namenode.EditLogInputStream: skipping 1048563 bytes at the end of edit log '/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/data/current/edits_003-003': reached txid 3 out of 3
{code}
If this skipping is not really a cause for warning, we should not log it at a WARN level but at an INFO or even DEBUG one. This avoids users getting unnecessarily concerned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3612) Single namenode image directory config warning can be improved
Harsh J created HDFS-3612: - Summary: Single namenode image directory config warning can be improved Key: HDFS-3612 URL: https://issues.apache.org/jira/browse/HDFS-3612 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Trivial Currently, if you configure the NameNode to run with just one dfs.namenode.name.dir directory, it prints:
{code}
12/07/08 20:00:22 WARN namenode.FSNamesystem: Only one dfs.namenode.name.dir directory configured , beware data loss!
{code}
We can improve this in a few ways, as it is slightly ambiguous: # Fix punctuation spacing: there's always a space after a punctuation mark but never before one. # Perhaps the message is better printed with a reason for why it may cause a scare of data loss. For instance, we can print "Detected a single storage directory in the dfs.namenode.name.dir configuration. Beware of data loss due to lack of redundant storage directories." or so. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3613) GSet prints some INFO level values, which aren't really very useful to all
Harsh J created HDFS-3613: - Summary: GSet prints some INFO level values, which aren't really very useful to all Key: HDFS-3613 URL: https://issues.apache.org/jira/browse/HDFS-3613 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Harsh J Priority: Trivial The following has long been seen in the NameNode, but I have never seen it being valued by anyone other than an HDFS developer:
{code}
12/07/08 20:00:22 INFO util.GSet: VM type = 64-bit
12/07/08 20:00:22 INFO util.GSet: 2% max memory = 19.75 MB
12/07/08 20:00:22 INFO util.GSet: capacity = 2^21 = 2097152 entries
12/07/08 20:00:22 INFO util.GSet: recommended=2097152, actual=2097152
{code}
Let's switch it down to DEBUG. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-1125) Removing a datanode (failed or decommissioned) should not require a namenode restart
[ https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-1125. --- Resolution: Duplicate Resolved via HDFS-1773. It was in the version after the one Allen tried above, I think; that's why he may not have seen it. Please reopen if not. Removing a datanode (failed or decommissioned) should not require a namenode restart Key: HDFS-1125 URL: https://issues.apache.org/jira/browse/HDFS-1125 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.2 Reporter: Alex Loddengaard Priority: Blocker I've heard of several Hadoop users using dfsadmin -report to monitor the number of dead nodes, and alert if that number is not 0. This mechanism tends to work pretty well, except when a node is decommissioned or fails, because then the namenode requires a restart for said node to be entirely removed from HDFS. More details here: http://markmail.org/search/?q=decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode#query:decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode+page:1+mid:7gwqwdkobgfuszb4+state:results Removal from the exclude file and a refresh should get rid of the dead node. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3567) Provide a way to enforce clearing of trash data immediately
Harsh J created HDFS-3567: - Summary: Provide a way to enforce clearing of trash data immediately Key: HDFS-3567 URL: https://issues.apache.org/jira/browse/HDFS-3567 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 3.0.0 Reporter: Harsh J Priority: Minor As discussed at http://search-hadoop.com/m/r1lMa13eN7O, it would be good to have a dfsadmin sub-command (or similar) that admins can use to enforce a trash-emptier run from the NameNode, instead of waiting for the trash clearance interval to pass. Can come in handy when attempting to quickly delete away data in a filling-up cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3570) Balancer shouldn't rely on DFS Space Used % as that ignores non-DFS used space
Harsh J created HDFS-3570: - Summary: Balancer shouldn't rely on DFS Space Used % as that ignores non-DFS used space Key: HDFS-3570 URL: https://issues.apache.org/jira/browse/HDFS-3570 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.0.0-alpha Reporter: Harsh J Priority: Minor Report from a user here: https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ, post archived at http://pastebin.com/eVFkk0A0 This user had a specific DN that had a large non-DFS usage among dfs.data.dirs, and very little DFS usage (which is computed against total possible capacity). The Balancer apparently only looks at that usage, and fails to consider that non-DFS usage may also be high on a DN/cluster. Hence, it thinks that if a DFS Usage report from a DN is only 8%, it's got a lot of free space to write more blocks, when that isn't true, as shown by the case of this user. It went on scheduling writes to the DN to balance it out, but the DN simply can't accept any more blocks as a result of its disks' state. I think it would be better if we _computed_ the actual utilization based on {{(capacity - remaining) / capacity}}, as opposed to the current {{(dfs used) / capacity}}. Thoughts? This isn't very critical, however, because it is very rare to see DN space being used for non-DN data, but it does expose a valid bug. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
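Toy numbers (hypothetical) make the gap concrete: a DN with 100 GB capacity, 8 GB of DFS data, 80 GB of non-DFS data and 12 GB free looks 8% full to the current formula, but is actually 88% full:
{code}
long capacity = 100, dfsUsed = 8, remaining = 12;             // in GB; 80 GB is non-DFS use
double reported = (double) dfsUsed / capacity;                // 0.08 -> looks mostly empty
double actual   = (double) (capacity - remaining) / capacity; // 0.88 -> nearly full
{code}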
[jira] [Created] (HDFS-3560) Enable automatic NameNode name-directory restore by default
Harsh J created HDFS-3560: - Summary: Enable automatic NameNode name-directory restore by default Key: HDFS-3560 URL: https://issues.apache.org/jira/browse/HDFS-3560 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 3.0.0 Reporter: Harsh J Priority: Minor HADOOP-4885 and several of its friends added this feature in 0.21 (versions of these are also in 1.x). However, the feature is currently disabled by default. It has been available for a long time now and has been working well for several users without any issues, so we should enable it by default (with any side-changes if necessary); it is a helpful feature, and I do not see why it should remain turned off by default anymore. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
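A minimal sketch of how the flip looks in hdfs-site.xml; the key name {{dfs.namenode.name.dir.restore}} is the 2.x-era name and is assumed here. Today an admin must set this explicitly; the proposal is to make {{true}} the shipped default:
{code}
<!-- Sketch: enable automatic restore of failed name directories on the
     next checkpoint. Key name assumed from the 2.x configuration. -->
<property>
  <name>dfs.namenode.name.dir.restore</name>
  <value>true</value>
</property>
{code}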
[jira] [Reopened] (HDFS-3522) If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations
[ https://issues.apache.org/jira/browse/HDFS-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HDFS-3522: --- Reopening the issue until the comments and the test are both addressed, just so it gets noticed :) If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations - Key: HDFS-3522 URL: https://issues.apache.org/jira/browse/HDFS-3522 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.0.1-alpha Attachments: HDFS-3522.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-3522) If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations
[ https://issues.apache.org/jira/browse/HDFS-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-3522. --- Resolution: Fixed I am good with this change. Resolving. Thanks Nicholas for catching my blunder :) If NN is in safemode, it should throw SafeModeException when getBlockLocations has zero locations - Key: HDFS-3522 URL: https://issues.apache.org/jira/browse/HDFS-3522 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.0.1-alpha Attachments: HDFS-3522.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3475) Make the replication monitor multipliers configurable
Harsh J created HDFS-3475: - Summary: Make the replication monitor multipliers configurable Key: HDFS-3475 URL: https://issues.apache.org/jira/browse/HDFS-3475 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Trivial BlockManager currently hardcodes the following two constants:
{code}
private static final int INVALIDATE_WORK_PCT_PER_ITERATION = 32;
private static final int REPLICATION_WORK_MULTIPLIER_PER_ITERATION = 2;
{code}
These are used to throttle/limit the amount of deletion and replication-to-other-DN work done per heartbeat interval of a live DN. Not many have had reason to want these changed so far, but I've faced a few such requests over the past year from a variety of clusters I've helped maintain. I think that, with the improvements in disks and networks that have already started rolling out in production environments, changing these may start making sense to some. Let's at least make them advanced-configurable, with proper docs that warn adequately, and with the defaults kept as they are today. With hardcoded values, it comes down to a recompile for admins, which is not something they may like. Please let me know your thoughts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
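A sketch of what the configurable version could look like; the key names are hypothetical (none had been settled on in this proposal), and the defaults stay at today's hardcoded values:
{code}
import org.apache.hadoop.conf.Configuration;

class ReplicationThrottles {
  static final String INVALIDATE_KEY =
      "dfs.namenode.invalidate.work.pct.per.iteration"; // assumed name
  static final String REPLICATION_KEY =
      "dfs.namenode.replication.work.multiplier.per.iteration"; // assumed name

  final int invalidateWorkPct;
  final int replicationWorkMultiplier;

  ReplicationThrottles(Configuration conf) {
    // Defaults match the constants BlockManager hardcodes today.
    this.invalidateWorkPct = conf.getInt(INVALIDATE_KEY, 32);
    this.replicationWorkMultiplier = conf.getInt(REPLICATION_KEY, 2);
  }
}
{code}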
[jira] [Created] (HDFS-3476) Correct the default used in TestDFSClientRetries.busyTest() after HDFS-3462
Harsh J created HDFS-3476: - Summary: Correct the default used in TestDFSClientRetries.busyTest() after HDFS-3462 Key: HDFS-3476 URL: https://issues.apache.org/jira/browse/HDFS-3476 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0 Reporter: Harsh J Priority: Minor Per Konstantin on HDFS-3462, the default value specified in the changes made there is 0; it should instead be the proper default transceiver count. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3358) Specify explicitly that the NN UI status total is talking of persistent objects on heap.
Harsh J created HDFS-3358: - Summary: Specify explicitly that the NN UI status total is talking of persistent objects on heap. Key: HDFS-3358 URL: https://issues.apache.org/jira/browse/HDFS-3358 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0 Reporter: Harsh J Assignee: Harsh J Priority: Trivial The NN shows, on its web UI, something like: {{223 files and directories, 138 blocks = 361 total.}} This is followed by heap stats. We should clarify that this line is talking about objects and is related to the heap summaries. Perhaps just being explicit in Java terms would be nicer: {{223 files and directories, 138 blocks = 361 total objects.}} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3366) Some stacktraces are now too lengthy and sometimes no good
Harsh J created HDFS-3366: - Summary: Some stacktraces are now too lengthy and sometimes no good Key: HDFS-3366 URL: https://issues.apache.org/jira/browse/HDFS-3366 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Harsh J Priority: Minor This is a high-on-nitpick ticket for the benefit of troubleshooting. This is partially related to all the PB changes we've had, and also partially related to Java/JVMs. Take the case of an AccessControlException, which is pretty common in the HDFS permissions layer. We now get, due to several more calls added at the RPC layer for PB (or maybe something else, if I am mistaken):
{code}
Caused by: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode=/:hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:186)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4204)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4175)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2565)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2529)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:640)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42618)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1204)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:205)
    at $Proxy10.mkdirs(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
    at $Proxy10.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:430)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1717)
    ... 9 more
{code}
The {{... 9 more}} is what I was looking for, to identify the caller to debug on and find the exact directory. However, it now gets eaten away, because the mkdir-to-exception trace itself has grown quite a bit.
Comparing this to 0.20, we have far fewer calls, and that helps us see at least the real caller of mkdirs. I'm actually not sure what causes Java to print {{... X more}} in these kinds of exception prints, but if that's controllable, I am all in favor of increasing its amount for HDFS (using new default java opts?), so that when an exception does occur, we don't get a nearly-unusable stacktrace. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
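For the curious, and anticipating the resolution below: the {{... X more}} trimming is not data loss. The JVM trims cause frames that are common with the enclosing trace, so they remain readable in the trace printed above the {{Caused by:}} line. A minimal non-HDFS demonstration:
{code}
public class CommonFramesDemo {
  static void inner() throws Exception {
    throw new IllegalStateException("cause");
  }

  static void outer() throws Exception {
    try {
      inner();
    } catch (IllegalStateException e) {
      throw new RuntimeException("wrapper", e);
    }
  }

  public static void main(String[] args) {
    try {
      outer();
    } catch (Exception e) {
      // The cause's trace ends in "... N more": those N frames (here, the
      // main frame) are shared with the wrapper's trace printed above it.
      e.printStackTrace();
    }
  }
}
{code}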
[jira] [Resolved] (HDFS-3366) Some stacktraces are now too lengthy and sometimes no good
[ https://issues.apache.org/jira/browse/HDFS-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-3366. --- Resolution: Invalid Actually, never mind. The http://stackoverflow.com/questions/1043378/print-full-call-stack-on-printstacktrace posts explain it all. And indeed, the docs are right. This is an invalid ticket. Please excuse the noise. There's no trouble :) Some stacktraces are now too lengthy and sometimes no good -- Key: HDFS-3366 URL: https://issues.apache.org/jira/browse/HDFS-3366 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Harsh J Priority: Minor This is a high-on-nitpick ticket for the benefit of troubleshooting. This is partially related to all the PB changes we've had, and also partially related to Java/JVMs. Take the case of an AccessControlException, which is pretty common in the HDFS permissions layer. We now get, due to several more calls added at the RPC layer for PB (or maybe something else, if I am mistaken):
{code}
Caused by: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode=/:hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:186)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4204)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4175)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:2565)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:2529)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:640)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42618)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1204)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:205)
    at $Proxy10.mkdirs(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:84)
    at $Proxy10.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:430)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1717)
    ... 9 more
{code}
The {{... 9 more}} is what I was looking for, to identify the caller to debug on and find the exact directory. However, it now gets eaten away, because the mkdir-to-exception trace itself has grown quite a bit. Comparing this to 0.20, we have far fewer calls, and that helps us see at least the real caller of mkdirs. I'm actually not sure what causes Java to print {{... X more}} in these kinds of exception prints, but if that's controllable, I am all in favor of increasing its amount for HDFS (using new default java opts?), so that when an exception does occur, we don't get a nearly-unusable stacktrace. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2349) DN should log a WARN, not an INFO when it detects a corruption during block transfer
DN should log a WARN, not an INFO when it detects a corruption during block transfer Key: HDFS-2349 URL: https://issues.apache.org/jira/browse/HDFS-2349 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.20.204.0 Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 0.24.0 Currently, in DataNode.java, we have:
{code}
LOG.info("Can't replicate block " + block
    + " because on-disk length " + onDiskLength
    + " is shorter than NameNode recorded length " + block.getNumBytes());
{code}
This log is better off as a WARN, as it indicates (and also reports) a corruption. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
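The proposed fix is just the severity flip on the same statement; a sketch of the suggested change within DataNode.java:
{code}
LOG.warn("Can't replicate block " + block
    + " because on-disk length " + onDiskLength
    + " is shorter than NameNode recorded length " + block.getNumBytes());
{code}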
[jira] [Created] (HDFS-2311) With just 1 block on the HDFS cluster, NN exits safemode on startup immediately.
With just 1 block on the HDFS cluster, NN exits safemode on startup immediately. Key: HDFS-2311 URL: https://issues.apache.org/jira/browse/HDFS-2311 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.203.0 Reporter: Harsh J Priority: Minor This is because {{(int) ((1 block) * (0.999f default threshold pct)) == 0}}, in which case SafeModeInfo's mode checks make a simple, direct exit from safemode. The faulty code is possibly in FSNamesystem#setBlockTotal. This is a non-major issue, since with 2 blocks it would work fine, and it also works fine with a 1.0f threshold pct. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
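A minimal standalone demonstration of the integer truncation at fault, mirroring the expression above in plain Java:
{code}
public class ThresholdDemo {
  public static void main(String[] args) {
    int blockTotal = 1;
    float threshold = 0.999f;
    // (int) (1 * 0.999f) truncates to 0: zero blocks "needed" to leave
    // safemode, so the NN exits immediately.
    System.out.println((int) (blockTotal * threshold)); // prints 0
    // With 2 blocks, (int) 1.998f truncates to 1, so reports are awaited.
    System.out.println((int) (2 * threshold));          // prints 1
  }
}
{code}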
[jira] [Resolved] (HDFS-112) ClusterTestDFS fails
[ https://issues.apache.org/jira/browse/HDFS-112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-112. -- Resolution: Not A Problem This JIRA has grown stale over the years and needs to be closed. The test framework has changed considerably since '06. With the current mini clusters, it is possible to pass a hosts array so daemons get different hostnames, and judging by all the tests that exercise it, it does appear to work all right if you want to use it for such purposes. ClusterTestDFS fails Key: HDFS-112 URL: https://issues.apache.org/jira/browse/HDFS-112 Project: Hadoop HDFS Issue Type: Bug Environment: local workstation (windows) Reporter: alan wootton Assignee: Sameer Paranjpye Attachments: ClusterTestFixes.patch, fix_clustertestdfs.patch The dfs unit tests, from the ant target 'cluster' have been failing. (ClusterTestDFSNamespaceLogging, ClusterTestDFS). I don't know if anyone but me cares about these tests, but I do. I would like to write better tests for dns. I think we all need that. They have been partially broken since test.dfs.same.host.targets.allowed went away and replication ceased for these tests. They got really broken when NameNode stopped automatically formatting itself. Since they seem to be ignored, I took the liberty of changing how they work. The main thing is, you must put this into your hosts file: 127.0.0.1 localhost0 127.0.0.1 localhost1 127.0.0.1 localhost2 127.0.0.1 localhost3 127.0.0.1 localhost4 127.0.0.1 localhost5 127.0.0.1 localhost6 127.0.0.1 localhost7 127.0.0.1 localhost8 127.0.0.1 localhost9 127.0.0.1 localhost10 127.0.0.1 localhost11 127.0.0.1 localhost12 127.0.0.1 localhost13 127.0.0.1 localhost14 127.0.0.1 localhost15 This way you can start DataNodes, and TaskTrackers (up to 16 of them) with unique hostnames. Also, I changed all the places that used to call InetAddress.getLocalHost().getHostName() to get it from a new method in Configuration (this issue is the same as http://issues.apache.org/jira/browse/HADOOP-197 ). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
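A sketch of the mini-cluster usage referred to in the resolution, assuming the current {{MiniDFSCluster.Builder}} API; note the chosen hostnames must still resolve locally (e.g. via /etc/hosts entries):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MultiHostMiniCluster {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Start two DNs under distinct hostnames, replacing the old hosts-file hack.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(2)
        .hosts(new String[] { "localhost0", "localhost1" })
        .build();
    try {
      System.out.println("DNs up: " + cluster.getDataNodes().size());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}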
[jira] [Resolved] (HDFS-70) Data node should shutdown when a critical error is returned by the name node
[ https://issues.apache.org/jira/browse/HDFS-70?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-70. - Resolution: Won't Fix HADOOP-266 was resolved as a Won't Fix, and the DN currently works OK with the way it analyzes the exception classnames and determines whether it has to shut down. Marking this one as Won't Fix as well, following HADOOP-266 :) Data node should shutdown when a critical error is returned by the name node -- Key: HDFS-70 URL: https://issues.apache.org/jira/browse/HDFS-70 Project: Hadoop HDFS Issue Type: Bug Reporter: Konstantin Shvachko Assignee: Sameer Paranjpye Priority: Minor Currently data node does not distinguish between critical and non critical exceptions. Any exception is treated as a signal to sleep and then try again. See org.apache.hadoop.dfs.DataNode.run() This is happening because RPC always throws the same RemoteException. In some cases (like UnregisteredDatanodeException, IncorrectVersionException) the data node should shutdown rather than retry. This logic naturally belongs to the org.apache.hadoop.dfs.DataNode.offerService() but can be reasonably implemented (without examining the RemoteException.className field) after HADOOP-266 (2) is fixed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
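A simplified sketch of the kind of classname analysis the resolution refers to; the exception names are taken from this report, and the real DN code may differ:
{code}
import org.apache.hadoop.ipc.RemoteException;

class FatalErrorCheck {
  // Decide from the wrapped classname whether the error is fatal enough
  // to warrant a DN shutdown instead of a sleep-and-retry.
  static boolean isFatal(RemoteException re) {
    String cls = re.getClassName();
    return cls.endsWith("UnregisteredDatanodeException")
        || cls.endsWith("IncorrectVersionException");
  }
}
{code}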
[jira] [Resolved] (HDFS-339) Periodically move blocks from full nodes to those with space
[ https://issues.apache.org/jira/browse/HDFS-339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-339. -- Resolution: Not A Problem The introduction of the DFS Balancer (post '06, apparently) provides the feature of moving blocks around and balancing the DFS DNs. I personally do not think it's a good idea to add a monitor to the NN for auto-triggering the Balancer, since it would use up bandwidth without the user/admin ever knowing about it. One could surely write an external tool to do this monitoring and run the Balancer separately, though. Resolving as Not-A-Problem, but do reopen if you feel strongly that the NN would really benefit from such an additional service. Periodically move blocks from full nodes to those with space - Key: HDFS-339 URL: https://issues.apache.org/jira/browse/HDFS-339 Project: Hadoop HDFS Issue Type: Improvement Reporter: Bryan Pendleton Assignee: Sameer Paranjpye Continuance of Hadoop-386. The patch to that issue makes it possible to redistribute blocks (change replication up, wait for replication to succeed, then lower replication again). However, this requires a lot more space, is not automatic, and doesn't respect a reasonable I/O limit. I have actually had MapReduce jobs fail from block missing execptions after having recently changed the replication level (from 3 to 4, with no underreplications to start with) because the datanodes were too slow responding to requests while performing the necessary replications. A good fix to this problem would be a low-priority thread on the NameNode that schedules low-priority replications of blocks on over-full machines, followed by the removal of the extra replications. It might be worth having a specific prototocol for asking for these low-priority copies to happen in the datanodes, so that they continue to service (and be available to service) normal block requests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
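For example, such an external tool could be as simple as a cron entry invoking the stock Balancer CLI; the schedule, threshold, and log path here are illustrative only:
{code}
# Sketch: run the Balancer nightly instead of adding an NN-side monitor
# (-threshold is the utilization band in percent).
0 2 * * * hadoop balancer -threshold 10 >> /var/log/hadoop-balancer.log 2>&1
{code}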
[jira] [Resolved] (HDFS-251) Automatically increase replication of often used files/blocks
[ https://issues.apache.org/jira/browse/HDFS-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-251. -- Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/HDFS-782 (closing this one, since the other has more discussion in it regarding dynamic replication) Automatically increase replication of often used files/blocks - Key: HDFS-251 URL: https://issues.apache.org/jira/browse/HDFS-251 Project: Hadoop HDFS Issue Type: New Feature Reporter: Johan Oskarsson Assignee: Sameer Paranjpye It would be interesting to see a patch that makes the namenode save the number of times a certain file (or block, if possible) is used, and then increases the replication of these files to improve performance. Any ideas on how to implement? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-128) downloading a file from dfs using the WI, using firefox, creates local files that start with a '-'
[ https://issues.apache.org/jira/browse/HDFS-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-128. -- Resolution: Not A Problem This is more of a browser issue. For instance, Chrome chooses to replace the filename's {{/}} with {{_}}, instead of the {{-}} that Safari and Firefox use. I do not think HDFS can/should take action here and tweak filenames while sending things out (if you look at the headers, the filename is intact, with {{/}}s). downloading a file from dfs using the WI, using firefox, creates local files that start with a '-' -- Key: HDFS-128 URL: https://issues.apache.org/jira/browse/HDFS-128 Project: Hadoop HDFS Issue Type: Bug Reporter: Yoram Arnon Assignee: Sameer Paranjpye Priority: Minor '/' characters are converted to '-' when downloading a file from dfs using the WI. That's a good thing. But using firefox, where file names can not be modified when saving to disk, this creates local files that start with a '-', which is inconvenient on some OS's. The first '/' character should be dropped rather than converted to a '-'. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-186) directory browser can not list all the entries for a large directory
[ https://issues.apache.org/jira/browse/HDFS-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-186. -- Resolution: Not A Problem I just loaded 2000 files to test this. The web browser loads the list just fine. Even with larger numbers, I think it would only take time, depending on the NN's responsiveness. It's probably best not to browse such directories page by page in the browser anyway, but to get to things directly via their URLs. Resolving as Not A Problem. (This was filed circa '07.) directory browser can not list all the entries for a large directory - Key: HDFS-186 URL: https://issues.apache.org/jira/browse/HDFS-186 Project: Hadoop HDFS Issue Type: Bug Environment: IE Firefox Safari Reporter: Hairong Kuang Assignee: Sameer Paranjpye When browsing a large directory, for example, one with 500 files, the web browser is not able to display all the entries. Instead, it stops loading the page in the middle. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2111) Add tests for ensuring that the DN will start with a few bad data directories (Part 1 of testing DiskChecker)
Add tests for ensuring that the DN will start with a few bad data directories (Part 1 of testing DiskChecker) - Key: HDFS-2111 URL: https://issues.apache.org/jira/browse/HDFS-2111 Project: Hadoop HDFS Issue Type: Test Components: data-node, test Affects Versions: 0.23.0 Reporter: Harsh J Assignee: Harsh J Fix For: 0.23.0 Add tests to ensure that, given multiple data dirs where a single one is bad, the DN still starts up. This is to check the DiskChecker functionality used in instantiating DataNodes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
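A sketch of the kind of test proposed, under assumed 0.23-era APIs and defaults; whether the DN tolerates a bad dir at startup also depends on the release's failed-volumes handling, so this is illustrative, not the committed test:
{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class TestDataNodeStartupWithBadDir {
  @Test
  public void testStartsWithOneBadDataDir() throws Exception {
    Configuration conf = new Configuration();
    File base = new File(System.getProperty("java.io.tmpdir"), "dfs-data");
    File bad = new File(base, "data1");
    File good = new File(base, "data2");
    bad.mkdirs();
    good.mkdirs();
    FileUtil.chmod(bad.getAbsolutePath(), "000"); // DiskChecker should reject this one
    conf.set("dfs.datanode.data.dir",
        bad.getAbsolutePath() + "," + good.getAbsolutePath());
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1)
        .manageDataDfsDirs(false) // keep our hand-picked dirs
        .build();
    try {
      // The DN should come up on the one good dir.
      assertEquals(1, cluster.getDataNodes().size());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}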
[jira] [Reopened] (HDFS-1454) Update the documentation to reflect true client caching strategy
[ https://issues.apache.org/jira/browse/HDFS-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J reopened HDFS-1454: --- @Eli - I failed to actually verify the entire document. The patch was intended only to remove the staging section, but I should've fixed the other things as well, I guess. Update the documentation to reflect true client caching strategy Key: HDFS-1454 URL: https://issues.apache.org/jira/browse/HDFS-1454 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs client Affects Versions: 0.20.2 Reporter: Jeff Hammerbacher Assignee: Harsh J Fix For: 0.22.0 Attachments: HDFS-1454.r1.diff As noted on the mailing list (http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3CAANLkTi=2csK+aY05bTOuO-UZv=o4w6ox2pq4nxgpd...@mail.gmail.com%3E), the Staging section of http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_design.html#Data+Organization is out of date. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-1454) Update the documentation to reflect true client caching strategy
[ https://issues.apache.org/jira/browse/HDFS-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-1454. --- Resolution: Fixed Re-resolving per Todd's suggestion. Opened: HDFS-2036 Update the documentation to reflect true client caching strategy Key: HDFS-1454 URL: https://issues.apache.org/jira/browse/HDFS-1454 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs client Affects Versions: 0.20.2 Reporter: Jeff Hammerbacher Assignee: Harsh J Fix For: 0.22.0 Attachments: HDFS-1454-reop.r1.diff, HDFS-1454-reop.r1.diff, HDFS-1454.r1.diff As noted on the mailing list (http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3CAANLkTi=2csK+aY05bTOuO-UZv=o4w6ox2pq4nxgpd...@mail.gmail.com%3E), the Staging section of http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_design.html#Data+Organization is out of date. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2036) Revise the HDFS design documentation
Revise the HDFS design documentation Key: HDFS-2036 URL: https://issues.apache.org/jira/browse/HDFS-2036 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 0.20.2 Reporter: Harsh J Attachments: HDFS-2036.r1.diff Although HDFS-1454 covered one change related to the staging feature, I think it would be a better idea to revise the entire document once more for any stale info it may carry (which could mislead new adopters). Attached is one fix that corrects the default packet size (was: 4 KB, is: 64 KB). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira