[jira] [Commented] (HDFS-6645) Add test for successive Snapshots between XAttr modifications
[ https://issues.apache.org/jira/browse/HDFS-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055882#comment-14055882 ]

Jing Zhao commented on HDFS-6645:
---------------------------------

+1. Thanks [~schu]! I will commit it tomorrow morning.

Add test for successive Snapshots between XAttr modifications
--------------------------------------------------------------
Key: HDFS-6645
URL: https://issues.apache.org/jira/browse/HDFS-6645
Project: Hadoop HDFS
Issue Type: Test
Components: snapshots, test
Affects Versions: 3.0.0, 2.6.0
Reporter: Stephen Chu
Assignee: Stephen Chu
Priority: Minor
Attachments: HDFS-6645.001.patch

In the current TestXAttrWithSnapshot unit tests, we create a single snapshot per test. We should test taking multiple snapshots on a path in between XAttr modifications of that path. We should also verify that deletion of a snapshot does not somehow alter the XAttrs of the other snapshots of the same path.

--
This message was sent by Atlassian JIRA (v6.2#6252)
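For illustration only - a minimal sketch of the kind of coverage this JIRA describes (the class name, paths, and values are made up for this sketch; it is not Stephen Chu's actual HDFS-6645.001.patch): take several snapshots in between xattr modifications, then check each snapshot still reports the value it was taken with, and that deleting one snapshot leaves the others intact.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;

import java.util.Arrays;

public class SuccessiveSnapshotXAttrSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      Path dir = new Path("/snapdir");
      fs.mkdirs(dir);
      fs.allowSnapshot(dir);

      fs.setXAttr(dir, "user.a", new byte[] {1});
      fs.createSnapshot(dir, "s1");          // s1 should see user.a = {1}

      fs.setXAttr(dir, "user.a", new byte[] {2});
      fs.createSnapshot(dir, "s2");          // s2 should see user.a = {2}

      Path s1 = new Path(dir, ".snapshot/s1");
      Path s2 = new Path(dir, ".snapshot/s2");
      assert Arrays.equals(fs.getXAttr(s1, "user.a"), new byte[] {1});
      assert Arrays.equals(fs.getXAttr(s2, "user.a"), new byte[] {2});

      // Deleting one snapshot should not disturb the xattrs of the others.
      fs.deleteSnapshot(dir, "s1");
      assert Arrays.equals(fs.getXAttr(s2, "user.a"), new byte[] {2});
    } finally {
      cluster.shutdown();
    }
  }
}
{code}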
[jira] [Updated] (HDFS-6646) [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed
[ https://issues.apache.org/jira/browse/HDFS-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B updated HDFS-6646:
--------------------------------
    Resolution: Fixed
    Fix Version/s: (was: 3.0.0)
                   2.6.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed to trunk and branch-2. Thanks [~brahmareddy].

[ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed
--------------------------------------------------------------------------------------
Key: HDFS-6646
URL: https://issues.apache.org/jira/browse/HDFS-6646
Project: Hadoop HDFS
Issue Type: Bug
Components: tools
Affects Versions: 2.4.1
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
Fix For: 2.6.0
Attachments: HDFS-6646.patch, HDFS-6646_1.patch

The usage message is missing for shutdownDatanode and getDatanodeInfo. Please check the following for the same (it prints the whole usage for dfsadmin):

hdfs dfsadmin -shutdownDatanode
Usage: java DFSAdmin
Note: Administrative commands can only be run as the HDFS superuser.
    [-report]
    [-safemode enter | leave | get | wait]
    [-allowSnapshot snapshotDir]
    [-disallowSnapshot snapshotDir]
    [-saveNamespace]
    [-rollEdits]
    [-restoreFailedStorage true|false|check]
    [-refreshNodes]
    [-finalizeUpgrade]
    [-rollingUpgrade [query|prepare|finalize]]
    [-metasave filename]
    [-refreshServiceAcl]
    [-refreshUserToGroupsMappings]
    [-refreshSuperUserGroupsConfiguration]
    [-refreshCallQueue]
    [-printTopology]
    [-refreshNamenodes datanode-host:port]
    [-deleteBlockPool datanode-host:port blockpoolId [force]]
    [-setQuota quota dirname...dirname]
    [-clrQuota dirname...dirname]
    [-setSpaceQuota quota dirname...dirname]
    [-clrSpaceQuota dirname...dirname]
    [-setBalancerBandwidth bandwidth in bytes per second]
    [-fetchImage local directory]
    [-shutdownDatanode datanode_host:ipc_port [upgrade]]
    [-getDatanodeInfo datanode_host:ipc_port]
    [-help [cmd]]

Generic options supported are
    -conf configuration file                     specify an application configuration file
    -D property=value                            use value for given property
    -fs local|namenode:port                      specify a namenode
    -jt local|jobtracker:port                    specify a job tracker
    -files comma separated list of files         specify comma separated files to be copied to the map reduce cluster
    -libjars comma separated list of jars        specify comma separated jar files to include in the classpath.
    -archives comma separated list of archives   specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]

--
This message was sent by Atlassian JIRA (v6.2#6252)
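For illustration only - a hedged sketch of the behaviour the report asks for (ExampleAdminUsage and its members are hypothetical names, not the committed HDFS-6646 change): print a per-command usage line for -shutdownDatanode and -getDatanodeInfo instead of dumping the whole dfsadmin help.

{code}
public class ExampleAdminUsage {
  private static final String SHUTDOWN_DATANODE_USAGE =
      "-shutdownDatanode <datanode_host:ipc_port> [upgrade]";
  private static final String GET_DATANODE_INFO_USAGE =
      "-getDatanodeInfo <datanode_host:ipc_port>";

  // Print usage for one command when it is known, instead of the full listing.
  static void printUsage(String cmd) {
    if ("-shutdownDatanode".equals(cmd)) {
      System.err.println("Usage: hdfs dfsadmin [" + SHUTDOWN_DATANODE_USAGE + "]");
    } else if ("-getDatanodeInfo".equals(cmd)) {
      System.err.println("Usage: hdfs dfsadmin [" + GET_DATANODE_INFO_USAGE + "]");
    } else {
      printFullUsage();   // fall back to the complete dfsadmin help text
    }
  }

  static void printFullUsage() {
    System.err.println("Usage: hdfs dfsadmin ...");   // abbreviated in this sketch
  }

  public static void main(String[] args) {
    printUsage(args.length > 0 ? args[0] : "");
  }
}
{code}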
[jira] [Commented] (HDFS-6646) [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed
[ https://issues.apache.org/jira/browse/HDFS-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055885#comment-14055885 ] Hudson commented on HDFS-6646: -- FAILURE: Integrated in Hadoop-trunk-Commit #5848 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5848/]) HDFS-6646. [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed ( Contributed by Brahma Reddy Battula) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed -- Key: HDFS-6646 URL: https://issues.apache.org/jira/browse/HDFS-6646 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.4.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.6.0 Attachments: HDFS-6646.patch, HDFS-6646_1.patch Usage message is missed for shutdownDatanode and getdatanodeinfo Please check the following for same..(It's printing whole usage for dfsadmin) hdfs dfsadmin -shutdownDatanode Usage: java DFSAdmin Note: Administrative commands can only be run as the HDFS superuser. [-report] [-safemode enter | leave | get | wait] [-allowSnapshot snapshotDir] [-disallowSnapshot snapshotDir] [-saveNamespace] [-rollEdits] [-restoreFailedStorage true|false|check] [-refreshNodes] [-finalizeUpgrade] [-rollingUpgrade [query|prepare|finalize]] [-metasave filename] [-refreshServiceAcl] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshCallQueue] [-printTopology] [-refreshNamenodes datanodehost:port] [-deleteBlockPool datanode-host:port blockpoolId [force]] [-setQuota quota dirname...dirname] [-clrQuota dirname...dirname] [-setSpaceQuota quota dirname...dirname] [-clrSpaceQuota dirname...dirname] [-setBalancerBandwidth bandwidth in bytes per second] [-fetchImage local directory] [-shutdownDatanode datanode_host:ipc_port [upgrade]] [-getDatanodeInfo datanode_host:ipc_port] [-help [cmd]] Generic options supported are -conf configuration file specify an application configuration file -D property=valueuse value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:portspecify a job tracker -files comma separated list of filesspecify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jarsspecify comma separated jar files to include in the classpath. -archives comma separated list of archivesspecify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6455) NFS: Exception should be added in NFS log for invalid separator in allowed.hosts
[ https://issues.apache.org/jira/browse/HDFS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055897#comment-14055897 ]

Abhiraj Butala commented on HDFS-6455:
--------------------------------------

Thanks for reviewing the patch [~brandonli]. This is the output of the showmount command:

{code}
abutala@abutala-vBox:~$ showmount -e 127.0.1.1
rpc mount export: RPC: Timed out
{code}

I don't see any errors or log messages in the NFS server output. What should be the correct behavior of showmount in this case?

NFS: Exception should be added in NFS log for invalid separator in allowed.hosts
---------------------------------------------------------------------------------
Key: HDFS-6455
URL: https://issues.apache.org/jira/browse/HDFS-6455
Project: Hadoop HDFS
Issue Type: Bug
Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora
Attachments: HDFS-6455.patch

The error for an invalid separator in the dfs.nfs.exports.allowed.hosts property should be written to the NFS log file instead of the nfs.out file.

Steps to reproduce:

1. Pass an invalid separator in dfs.nfs.exports.allowed.hosts:
{noformat}
<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <value>host1 ro:host2 rw</value>
</property>
{noformat}

2. Restart the NFS server. The NFS server fails to start and prints the exception to the console:
{noformat}
[hrt_qa@host1 hwqe]$ ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null host1 sudo su - -c "/usr/lib/hadoop/sbin/hadoop-daemon.sh start nfs3" hdfs
starting nfs3, logging to /tmp/log/hadoop/hdfs/hadoop-hdfs-nfs3-horst1.out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Exception in thread "main" java.lang.IllegalArgumentException: Incorrectly formatted line 'host1 ro:host2 rw'
        at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356)
        at org.apache.hadoop.nfs.NfsExports.<init>(NfsExports.java:151)
        at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54)
        at org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.<init>(RpcProgramNfs3.java:176)
        at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.<init>(Nfs3.java:43)
        at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59)
{noformat}

The NFS log does not print any error message; it directly shuts down:
{noformat}
STARTUP_MSG:   java = 1.6.0_31
/
2014-05-27 18:47:13,972 INFO  nfs3.Nfs3Base (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT]
2014-05-27 18:47:14,169 INFO  nfs3.IdUserGroup (IdUserGroup.java:updateMapInternal(159)) - Updated user map size:259
2014-05-27 18:47:14,179 INFO  nfs3.IdUserGroup (IdUserGroup.java:updateMapInternal(159)) - Updated group map size:73
2014-05-27 18:47:14,192 INFO  nfs3.Nfs3Base (StringUtils.java:run(640)) - SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down Nfs3 at
{noformat}

The nfs.out file has the exception:
{noformat}
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Exception in thread "main" java.lang.IllegalArgumentException: Incorrectly formatted line 'host1 ro:host2 rw'
        at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356)
        at org.apache.hadoop.nfs.NfsExports.<init>(NfsExports.java:151)
        at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54)
        at org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.<init>(RpcProgramNfs3.java:176)
        at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.<init>(Nfs3.java:43)
        at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59)

ulimit -a for user hdfs
core file size          (blocks, -c) 409600
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 188893
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
{noformat}

--
This message was sent by Atlassian JIRA (v6.2#6252)
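For illustration only - a hedged sketch of the general pattern the report asks for (ExampleNfsStartup and startNfsServer are hypothetical names; this is not the actual HDFS-6455 patch): catch the parse failure and log it through commons-logging, which is where the daemon's log4j output normally goes, so the error appears in the NFS .log file rather than only in the stderr redirection captured by nfs.out.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ExampleNfsStartup {
  private static final Log LOG = LogFactory.getLog(ExampleNfsStartup.class);

  public static void main(String[] args) {
    try {
      // In the real daemon this is where NfsExports.getInstance(conf) parses
      // dfs.nfs.exports.allowed.hosts and throws IllegalArgumentException on a
      // malformed separator such as "host1 ro:host2 rw".
      startNfsServer(args);
    } catch (IllegalArgumentException e) {
      // Log to the NFS log file before terminating so the failure is visible there,
      // not only in the .out file that captures stderr.
      LOG.error("Invalid dfs.nfs.exports.allowed.hosts value", e);
      System.exit(1);
    }
  }

  private static void startNfsServer(String[] args) {
    // placeholder for Nfs3.main-style startup logic
  }
}
{code}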
[jira] [Created] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
Aaron T. Myers created HDFS-6647:
------------------------------------

Summary: Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
Key: HDFS-6647
URL: https://issues.apache.org/jira/browse/HDFS-6647
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, snapshots
Affects Versions: 2.4.1
Reporter: Aaron T. Myers

I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-6647:
---------------------------------
    Attachment: HDFS-6647-failing-test.patch

I'm attaching a test case which illustrates the problem. When this problem occurs, the NN will fail to be able to read the edit log and will fail to start with an error like the following:

{noformat}
java.io.FileNotFoundException: File does not exist: /test-file
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:444)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:227)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:136)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:816)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:676)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:964)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:711)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:530)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:586)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:752)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:736)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1412)
{noformat}

The sequence of events that I've identified that can cause this is the following:

# A file is opened for write and some data has been written/flushed to it, causing a block to be allocated.
# A snapshot is taken which includes the file.
# The file is deleted from the present file system, though the client has not yet closed the file. This logs an OP_DELETE to the edit log.
# Some error happens triggering pipeline recovery, which logs an OP_UPDATE_BLOCKS to the edit log.

The reason it's possible for this to happen is basically that the {{updatePipeline}} RPC never checks whether the file actually exists, but instead just finds the file INode based on the block ID being replaced in the pipeline. Later, when we're reading the {{OP_UPDATE_BLOCKS}} from the edit log, however, we try to find the file INode based on the path name of the file, which no longer exists because of the previous delete.

It's not entirely obvious to me what the right solution to this issue should be. It shouldn't be difficult to change the {{FSEditLogLoader}} to be able to read the {{OP_UPDATE_BLOCKS}} op if we just change it to look up the INode by block ID. On the other hand, however, I'm not entirely sure we should even be allowing this sequence of edit log ops in the first place. It doesn't seem unreasonable to me that we might check that the file actually exists in the present file system in the {{updatePipeline}} RPC call and throw an error if it doesn't, since continuing to write to a file that only exists in a snapshot doesn't make much sense. Along similar lines, it seems a little odd to me that an INode that only exists in the snapshot would continue to be considered under-construction, but perhaps that's not unreasonable in itself.

Would love to hear others' thoughts on this.

Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
----------------------------------------------------------------------------------------
Key: HDFS-6647
URL: https://issues.apache.org/jira/browse/HDFS-6647
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, snapshots
Affects Versions: 2.4.1
Reporter: Aaron T. Myers
Attachments: HDFS-6647-failing-test.patch

I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment.

--
This message was sent by Atlassian JIRA (v6.2#6252)
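For illustration only - a hedged sketch of the reported sequence written against public MiniDFSCluster/FileSystem APIs (it is not the attached HDFS-6647-failing-test.patch, and step 4 below is only one way pipeline recovery might be provoked):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class UpdateBlocksAfterDeleteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();
      Path dir = new Path("/dir");
      Path file = new Path(dir, "test-file");
      fs.mkdirs(dir);
      fs.allowSnapshot(dir);

      FSDataOutputStream out = fs.create(file, (short) 2);   // 1. open and write
      out.write(new byte[4096]);
      out.hflush();                                          //    a block is allocated

      fs.createSnapshot(dir, "s1");                          // 2. snapshot includes the file
      fs.delete(file, false);                                // 3. OP_DELETE logged, stream still open

      // 4. Anything that forces pipeline recovery here (for example, restarting a
      //    datanode in the write pipeline and continuing to write/hflush) logs an
      //    OP_UPDATE_BLOCKS for a path that no longer exists in the present namespace.
      cluster.restartDataNode(0, true);
      out.write(new byte[4096]);
      out.hflush();
      out.close();

      // Replaying the edits on restart is then expected to fail with the
      // FileNotFoundException shown in the comment above.
      cluster.restartNameNode();
    } finally {
      cluster.shutdown();
    }
  }
}
{code}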
[jira] [Updated] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-6647: - Priority: Blocker (was: Major) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Priority: Blocker Attachments: HDFS-6647-failing-test.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6631) TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
[ https://issues.apache.org/jira/browse/HDFS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056006#comment-14056006 ]

Liang Xie commented on HDFS-6631:
---------------------------------

I see. Comparing the attached org.apache.hadoop.hdfs.TestPread-output.txt with the log file from my dev box, the attached run did not trigger a real hedged read; I could only find a log line like "Waited 50ms to read from 127.0.0.1:x; spawning hedged read" in my own log file. In your file, the execution sequence is:

- read from 127.0.0.1:53908 - here the counter is 1
- throw ChecksumException
- read from 127.0.0.1:53919 - here the counter is 2
- return result, line 1127

That means both read paths went through the "if (futures.isEmpty()) {" flow (L1112). So the root question is: if we set hedged.read.threshold = 50ms, and Mockito.doAnswer has a Thread.sleep(50+1), what does this statement do?

{code}
Future<ByteBuffer> future = hedgedService.poll(
    dfsClient.getHedgedReadTimeout(), TimeUnit.MILLISECONDS);
{code}

On my dev box it behaved just as the Javadoc says:

{code}
Retrieves and removes the Future representing the next completed task, waiting if necessary up to the specified wait time if none are yet present.
Parameters:
    timeout - how long to wait before giving up, in units of unit
    unit - a TimeUnit determining how to interpret the timeout parameter
Returns:
    the Future representing the next completed task or null if the specified waiting time elapses before one is present
Throws:
    InterruptedException - if interrupted while waiting
{code}

so the future will be null. But on Chris's box, the exception from the thread pool jumps out first, so execution goes directly to L1140: catch (ExecutionException e). So per my current understanding, it should be related to OS thread scheduling (granularity); we probably need to enlarge the Mockito sleep interval.

TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
--------------------------------------------------------------
Key: HDFS-6631
URL: https://issues.apache.org/jira/browse/HDFS-6631
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client, test
Affects Versions: 3.0.0, 2.5.0
Reporter: Chris Nauroth
Attachments: org.apache.hadoop.hdfs.TestPread-output.txt

{{TestPread#testHedgedReadLoopTooManyTimes}} fails intermittently. It looks like a race condition on counting the expected number of loop iterations. I can repro the test failure more consistently on Windows.

--
This message was sent by Atlassian JIRA (v6.2#6252)
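For illustration only - a self-contained JDK sketch of the timing race described above, with no HDFS code involved (the 50 ms threshold and the throwing task stand in for the hedged-read threshold and the Mockito answer): depending on scheduling granularity, poll() either times out and returns null, or returns the completed Future whose get() throws ExecutionException.

{code}
import java.nio.ByteBuffer;
import java.util.concurrent.*;

public class HedgedPollRace {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CompletionService<ByteBuffer> hedgedService = new ExecutorCompletionService<>(pool);
    final long thresholdMillis = 50;   // analogous to the hedged read threshold in the test

    hedgedService.submit(() -> {
      Thread.sleep(thresholdMillis + 1);          // like the Thread.sleep(50+1) in the Mockito answer
      throw new java.io.IOException("checksum");  // simulated failure of the first read
    });

    Future<ByteBuffer> future =
        hedgedService.poll(thresholdMillis, TimeUnit.MILLISECONDS);
    if (future == null) {
      System.out.println("poll timed out first: a hedged read would be spawned");
    } else {
      try {
        future.get();
      } catch (ExecutionException e) {
        System.out.println("task completed (exceptionally) before the timeout: " + e.getCause());
      }
    }
    pool.shutdownNow();
  }
}
{code}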
[jira] [Commented] (HDFS-6634) inotify in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056010#comment-14056010 ]

Steve Loughran commented on HDFS-6634:
--------------------------------------

I'd argue that making the audit log generally accessible via some Kafka stream would be broadly useful, as it would support other use cases being discussed, such as identifying and deleting temporary files or, as Spotify does with their audit log, identifying files that haven't been read for a while. If custom code is needed, that's what we call a library.

inotify in HDFS
---------------
Key: HDFS-6634
URL: https://issues.apache.org/jira/browse/HDFS-6634
Project: Hadoop HDFS
Issue Type: New Feature
Components: hdfs-client, namenode, qjm
Reporter: James Thomas
Assignee: James Thomas
Attachments: inotify-intro.pdf

Design a mechanism for applications like search engines to access the HDFS edit stream.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6631) TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
[ https://issues.apache.org/jira/browse/HDFS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-6631: Attachment: HDFS-6631.txt TestPread#testHedgedReadLoopTooManyTimes fails intermittently. -- Key: HDFS-6631 URL: https://issues.apache.org/jira/browse/HDFS-6631 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Attachments: HDFS-6631.txt, org.apache.hadoop.hdfs.TestPread-output.txt {{TestPread#testHedgedReadLoopTooManyTimes}} fails intermittently. It looks like a race condition on counting the expected number of loop iterations. I can repro the test failure more consistently on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6631) TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
[ https://issues.apache.org/jira/browse/HDFS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-6631: Status: Patch Available (was: Open) TestPread#testHedgedReadLoopTooManyTimes fails intermittently. -- Key: HDFS-6631 URL: https://issues.apache.org/jira/browse/HDFS-6631 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Attachments: HDFS-6631.txt, org.apache.hadoop.hdfs.TestPread-output.txt {{TestPread#testHedgedReadLoopTooManyTimes}} fails intermittently. It looks like a race condition on counting the expected number of loop iterations. I can repro the test failure more consistently on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6631) TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
[ https://issues.apache.org/jira/browse/HDFS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056019#comment-14056019 ]

Liang Xie commented on HDFS-6631:
---------------------------------

I just uploaded a tentative patch. [~cnauroth], could you try it in your easy-repro env? Thank you very much!

TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
--------------------------------------------------------------
Key: HDFS-6631
URL: https://issues.apache.org/jira/browse/HDFS-6631
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client, test
Affects Versions: 3.0.0, 2.5.0
Reporter: Chris Nauroth
Attachments: HDFS-6631.txt, org.apache.hadoop.hdfs.TestPread-output.txt

{{TestPread#testHedgedReadLoopTooManyTimes}} fails intermittently. It looks like a race condition on counting the expected number of loop iterations. I can repro the test failure more consistently on Windows.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6631) TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
[ https://issues.apache.org/jira/browse/HDFS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie reassigned HDFS-6631: --- Assignee: Liang Xie TestPread#testHedgedReadLoopTooManyTimes fails intermittently. -- Key: HDFS-6631 URL: https://issues.apache.org/jira/browse/HDFS-6631 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Assignee: Liang Xie Attachments: HDFS-6631.txt, org.apache.hadoop.hdfs.TestPread-output.txt {{TestPread#testHedgedReadLoopTooManyTimes}} fails intermittently. It looks like a race condition on counting the expected number of loop iterations. I can repro the test failure more consistently on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6614) shorten TestPread run time with a smaller retry timeout setting
[ https://issues.apache.org/jira/browse/HDFS-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056079#comment-14056079 ] Hudson commented on HDFS-6614: -- FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/608/]) HDFS-6614. Addendum patch to shorten TestPread run time with smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608846) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java shorten TestPread run time with a smaller retry timeout setting --- Key: HDFS-6614 URL: https://issues.apache.org/jira/browse/HDFS-6614 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6614-addmium.txt, HDFS-6614.txt Just notice logs like this from TestPread: DFS chooseDataNode: got # 3 IOException, will wait for 9909.622860072854 msec so i tried to set a smaller retry window value. Before patch: T E S T S --- Running org.apache.hadoop.hdfs.TestPread Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 154.812 sec - in org.apache.hadoop.hdfs.TestPread After the change: T E S T S --- Running org.apache.hadoop.hdfs.TestPread Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 131.724 sec - in org.apache.hadoop.hdfs.TestPread -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6638) shorten test run time with a smaller retry timeout setting
[ https://issues.apache.org/jira/browse/HDFS-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056077#comment-14056077 ] Hudson commented on HDFS-6638: -- FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/608/]) HDFS-6638. Shorten test run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608905) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocalLegacy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientReportBadBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestCrcCorruption.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptedTransfer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestMissingBlocksAlert.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailoverWithBlockTokensEnabled.java shorten test run time with a smaller retry timeout setting -- Key: HDFS-6638 URL: https://issues.apache.org/jira/browse/HDFS-6638 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Liang Xie Assignee: Liang Xie Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6638.txt similiar with HDFS-6614, i think it's a general test duration optimization tip, so i grep IOException, will wait for from a full test run under hdfs project, found several test cases could be optimized, so made a simple patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6646) [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed
[ https://issues.apache.org/jira/browse/HDFS-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056083#comment-14056083 ] Hudson commented on HDFS-6646: -- FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/608/]) HDFS-6646. [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed ( Contributed by Brahma Reddy Battula) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed -- Key: HDFS-6646 URL: https://issues.apache.org/jira/browse/HDFS-6646 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.4.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.6.0 Attachments: HDFS-6646.patch, HDFS-6646_1.patch Usage message is missed for shutdownDatanode and getdatanodeinfo Please check the following for same..(It's printing whole usage for dfsadmin) hdfs dfsadmin -shutdownDatanode Usage: java DFSAdmin Note: Administrative commands can only be run as the HDFS superuser. [-report] [-safemode enter | leave | get | wait] [-allowSnapshot snapshotDir] [-disallowSnapshot snapshotDir] [-saveNamespace] [-rollEdits] [-restoreFailedStorage true|false|check] [-refreshNodes] [-finalizeUpgrade] [-rollingUpgrade [query|prepare|finalize]] [-metasave filename] [-refreshServiceAcl] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshCallQueue] [-printTopology] [-refreshNamenodes datanodehost:port] [-deleteBlockPool datanode-host:port blockpoolId [force]] [-setQuota quota dirname...dirname] [-clrQuota dirname...dirname] [-setSpaceQuota quota dirname...dirname] [-clrSpaceQuota dirname...dirname] [-setBalancerBandwidth bandwidth in bytes per second] [-fetchImage local directory] [-shutdownDatanode datanode_host:ipc_port [upgrade]] [-getDatanodeInfo datanode_host:ipc_port] [-help [cmd]] Generic options supported are -conf configuration file specify an application configuration file -D property=valueuse value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:portspecify a job tracker -files comma separated list of filesspecify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jarsspecify comma separated jar files to include in the classpath. -archives comma separated list of archivesspecify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4286) Changes from BOOKKEEPER-203 broken capability of including bookkeeper-server jar in hidden package of BKJM
[ https://issues.apache.org/jira/browse/HDFS-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056080#comment-14056080 ] Hudson commented on HDFS-4286: -- FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/608/]) HDFS-4286. Changes from BOOKKEEPER-203 broken capability of including bookkeeper-server jar in hidden package of BKJM. Contributed by Rakesh R. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608764) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm Changes from BOOKKEEPER-203 broken capability of including bookkeeper-server jar in hidden package of BKJM -- Key: HDFS-4286 URL: https://issues.apache.org/jira/browse/HDFS-4286 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Rakesh R Fix For: 3.0.0, 2.5.0 Attachments: HDFS-4286.patch, HDFS-4286.patch BOOKKEEPER-203 introduced changes to LedgerLayout to include ManagerFactoryClass instead of ManagerFactoryName. So because of this, BKJM cannot shade the bookkeeper-server jar inside BKJM jar LAYOUT znode created by BookieServer is not readable by the BKJM as it have classes in hidden packages. (same problem vice versa) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4221) Remove the format limitation point from BKJM documentation as HDFS-3810 closed
[ https://issues.apache.org/jira/browse/HDFS-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056081#comment-14056081 ] Hudson commented on HDFS-4221: -- FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/608/]) HDFS-4221. Remove the format limitation point from BKJM documentation as HDFS-3810 closed. Contributed by Rakesh R. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608776) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm Remove the format limitation point from BKJM documentation as HDFS-3810 closed -- Key: HDFS-4221 URL: https://issues.apache.org/jira/browse/HDFS-4221 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Uma Maheswara Rao G Assignee: Rakesh R Fix For: 3.0.0, 2.5.0 Attachments: HDFS-4221.patch Remove the format limitation point from BKJM documentation as HDFS-3810 closed -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5411) Update Bookkeeper dependency to 4.2.3
[ https://issues.apache.org/jira/browse/HDFS-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056082#comment-14056082 ] Hudson commented on HDFS-5411: -- FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/608/]) HDFS-5411. Update Bookkeeper dependency to 4.2.3. Contributed by Rakesh R. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608781) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/BKJMUtil.java * /hadoop/common/trunk/hadoop-project/pom.xml Update Bookkeeper dependency to 4.2.3 - Key: HDFS-5411 URL: https://issues.apache.org/jira/browse/HDFS-5411 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Robert Rati Assignee: Rakesh R Priority: Minor Fix For: 3.0.0, 2.5.0 Attachments: HDFS-5411.patch, HDFS-5411.patch Update the bookkeeper dependency to 4.2.3. This eases compilation on Fedora platforms -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6627) Rename DataNode#checkWriteAccess to checkReadAccess.
[ https://issues.apache.org/jira/browse/HDFS-6627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056078#comment-14056078 ] Hudson commented on HDFS-6627: -- FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/608/]) HDFS-6627. Rename DataNode#checkWriteAccess to checkReadAccess. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608940) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java Rename DataNode#checkWriteAccess to checkReadAccess. Key: HDFS-6627 URL: https://issues.apache.org/jira/browse/HDFS-6627 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.0.0, 2.5.0 Reporter: Liang Xie Assignee: Liang Xie Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6627.txt Just read getReplicaVisibleLength() code and found it, DataNode.checkWriteAccess is only invoked by DataNode.getReplicaVisibleLength(), let's rename it to checkReadAccess to avoid confusing, since the real impl here is check AccessMode.READ: {code} blockPoolTokenSecretManager.checkAccess(id, null, block, BlockTokenSecretManager.AccessMode.READ); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3810) Implement format() for BKJM
[ https://issues.apache.org/jira/browse/HDFS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056076#comment-14056076 ] Hudson commented on HDFS-3810: -- FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/608/]) HDFS-4221. Remove the format limitation point from BKJM documentation as HDFS-3810 closed. Contributed by Rakesh R. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608776) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm Implement format() for BKJM --- Key: HDFS-3810 URL: https://issues.apache.org/jira/browse/HDFS-3810 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 3.0.0, 2.0.3-alpha Attachments: HDFS-3810.diff, HDFS-3810.diff, HDFS-3810.diff At the moment, formatting for BKJM is done on initialization. Reinitializing is a manual process. This JIRA is to implement the JournalManager#format API, so that BKJM can be formatting along with all other storage methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056098#comment-14056098 ]

Charles Lamb commented on HDFS-6422:
------------------------------------

[~yi.a.liu], [~umamaheswararao],

Given Uma's work on HDFS-6556, I want to clarify what remains to be done on this patch. Earlier in the comments, I said:

{quote}
Throw an exception if:
. the caller requests an attribute that doesn't exist,
. the caller requests an attribute and they don't have proper permissions,
. the caller requests an attribute and they don't have permission to the namespace. This applies to the trusted namespace.
. the caller specifies an unknown namespace.

The gist of Linux extended attribute permissions is that you need access to the inode to read/write xattr names and you need access to the entity itself (i.e. a file or directory) to read/write xattr values. The former is determined by the parent directory permissions and the latter by the entity's permissions (i.e. the thing on which the extended attributes are associated). You need scan/execute permissions on the parent (owning) directory to access extended attribute names. You need read permission on the entity itself to read extended attribute values and you need write permission to modify them.
{quote}

Do you believe you have the permissions checking correct now and that only the exception throwing needs to be fixed, or is there still more work to be done on read permissions? Specifically, that we should be validating xattr name access based on scan permission on the inode's owning directory and xattr value access based on the inode's permission?

getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist
----------------------------------------------------------------------------------------------
Key: HDFS-6422
URL: https://issues.apache.org/jira/browse/HDFS-6422
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Attachments: HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch

If you do hdfs dfs -getfattr -n user.blah /foo and user.blah doesn't exist, the command prints "# file: /foo" and a 0 return code. It should print an exception and return a non-0 return code instead.

--
This message was sent by Atlassian JIRA (v6.2#6252)
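For illustration only - a hedged sketch of the behaviour this JIRA asks for, written against the public FileSystem xattr API rather than the FsShell code itself (it assumes fs.defaultFS points at an HDFS cluster and that /foo exists without a user.blah xattr): a missing attribute should surface as an IOException, which a CLI wrapper can turn into an error message and a non-zero exit code instead of printing only "# file: /foo".

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetXAttrExpectation {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());   // assumes an HDFS default filesystem
    Path foo = new Path("/foo");                            // assumed to exist without user.blah
    try {
      byte[] value = fs.getXAttr(foo, "user.blah");
      System.out.println("unexpected: got " + value.length + " bytes");
    } catch (IOException e) {
      // Desired outcome: a clear exception for the missing attribute, so a CLI
      // wrapper can exit non-zero rather than silently printing a bare header.
      System.err.println("getfattr-style failure: " + e.getMessage());
      System.exit(1);
    }
  }
}
{code}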
[jira] [Updated] (HDFS-6630) Unable to fetch the block information by Browsing the file system on Namenode UI through IE9
[ https://issues.apache.org/jira/browse/HDFS-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B updated HDFS-6630:
--------------------------------
    Attachment: HDFS-6630.patch

Attached a patch which fixes the issue in IE. I have verified the issue in IE-10 on my Windows 8 PC.

Hi [~wheat9], can you take a look at this if possible? Thanks in advance.

Unable to fetch the block information by Browsing the file system on Namenode UI through IE9
----------------------------------------------------------------------------------------------
Key: HDFS-6630
URL: https://issues.apache.org/jira/browse/HDFS-6630
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.4.1
Reporter: J.Andreina
Attachments: HDFS-6630.patch

On IE9, follow the steps below:
NN UI -- Utilities - Browse the File system - click on a file name
Instead of displaying the block information, it displays:
{noformat}
Failed to retreive data from /webhdfs/v1/4?op=GET_BLOCK_LOCATIONS: No Transport
{noformat}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6614) shorten TestPread run time with a smaller retry timeout setting
[ https://issues.apache.org/jira/browse/HDFS-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056124#comment-14056124 ] Hudson commented on HDFS-6614: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/]) HDFS-6614. Addendum patch to shorten TestPread run time with smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608846) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java shorten TestPread run time with a smaller retry timeout setting --- Key: HDFS-6614 URL: https://issues.apache.org/jira/browse/HDFS-6614 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6614-addmium.txt, HDFS-6614.txt Just notice logs like this from TestPread: DFS chooseDataNode: got # 3 IOException, will wait for 9909.622860072854 msec so i tried to set a smaller retry window value. Before patch: T E S T S --- Running org.apache.hadoop.hdfs.TestPread Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 154.812 sec - in org.apache.hadoop.hdfs.TestPread After the change: T E S T S --- Running org.apache.hadoop.hdfs.TestPread Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 131.724 sec - in org.apache.hadoop.hdfs.TestPread -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6627) Rename DataNode#checkWriteAccess to checkReadAccess.
[ https://issues.apache.org/jira/browse/HDFS-6627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056123#comment-14056123 ] Hudson commented on HDFS-6627: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/]) HDFS-6627. Rename DataNode#checkWriteAccess to checkReadAccess. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608940) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java Rename DataNode#checkWriteAccess to checkReadAccess. Key: HDFS-6627 URL: https://issues.apache.org/jira/browse/HDFS-6627 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.0.0, 2.5.0 Reporter: Liang Xie Assignee: Liang Xie Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6627.txt Just read getReplicaVisibleLength() code and found it, DataNode.checkWriteAccess is only invoked by DataNode.getReplicaVisibleLength(), let's rename it to checkReadAccess to avoid confusing, since the real impl here is check AccessMode.READ: {code} blockPoolTokenSecretManager.checkAccess(id, null, block, BlockTokenSecretManager.AccessMode.READ); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3810) Implement format() for BKJM
[ https://issues.apache.org/jira/browse/HDFS-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056120#comment-14056120 ] Hudson commented on HDFS-3810: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/]) HDFS-4221. Remove the format limitation point from BKJM documentation as HDFS-3810 closed. Contributed by Rakesh R. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608776) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm Implement format() for BKJM --- Key: HDFS-3810 URL: https://issues.apache.org/jira/browse/HDFS-3810 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 3.0.0, 2.0.3-alpha Attachments: HDFS-3810.diff, HDFS-3810.diff, HDFS-3810.diff At the moment, formatting for BKJM is done on initialization. Reinitializing is a manual process. This JIRA is to implement the JournalManager#format API, so that BKJM can be formatting along with all other storage methods. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4286) Changes from BOOKKEEPER-203 broken capability of including bookkeeper-server jar in hidden package of BKJM
[ https://issues.apache.org/jira/browse/HDFS-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056125#comment-14056125 ] Hudson commented on HDFS-4286: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/]) HDFS-4286. Changes from BOOKKEEPER-203 broken capability of including bookkeeper-server jar in hidden package of BKJM. Contributed by Rakesh R. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608764) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm Changes from BOOKKEEPER-203 broken capability of including bookkeeper-server jar in hidden package of BKJM -- Key: HDFS-4286 URL: https://issues.apache.org/jira/browse/HDFS-4286 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Rakesh R Fix For: 3.0.0, 2.5.0 Attachments: HDFS-4286.patch, HDFS-4286.patch BOOKKEEPER-203 introduced changes to LedgerLayout to include ManagerFactoryClass instead of ManagerFactoryName. So because of this, BKJM cannot shade the bookkeeper-server jar inside BKJM jar LAYOUT znode created by BookieServer is not readable by the BKJM as it have classes in hidden packages. (same problem vice versa) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6638) shorten test run time with a smaller retry timeout setting
[ https://issues.apache.org/jira/browse/HDFS-6638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056121#comment-14056121 ] Hudson commented on HDFS-6638: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/]) HDFS-6638. Shorten test run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608905) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockMissingException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocalLegacy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientReportBadBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestCrcCorruption.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptedTransfer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestMissingBlocksAlert.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailoverWithBlockTokensEnabled.java shorten test run time with a smaller retry timeout setting -- Key: HDFS-6638 URL: https://issues.apache.org/jira/browse/HDFS-6638 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Liang Xie Assignee: Liang Xie Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6638.txt similiar with HDFS-6614, i think it's a general test duration optimization tip, so i grep IOException, will wait for from a full test run under hdfs project, found several test cases could be optimized, so made a simple patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5411) Update Bookkeeper dependency to 4.2.3
[ https://issues.apache.org/jira/browse/HDFS-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056127#comment-14056127 ] Hudson commented on HDFS-5411: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/]) HDFS-5411. Update Bookkeeper dependency to 4.2.3. Contributed by Rakesh R. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608781) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/BKJMUtil.java * /hadoop/common/trunk/hadoop-project/pom.xml Update Bookkeeper dependency to 4.2.3 - Key: HDFS-5411 URL: https://issues.apache.org/jira/browse/HDFS-5411 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Robert Rati Assignee: Rakesh R Priority: Minor Fix For: 3.0.0, 2.5.0 Attachments: HDFS-5411.patch, HDFS-5411.patch Update the bookkeeper dependency to 4.2.3. This eases compilation on Fedora platforms -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4221) Remove the format limitation point from BKJM documentation as HDFS-3810 closed
[ https://issues.apache.org/jira/browse/HDFS-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056126#comment-14056126 ] Hudson commented on HDFS-4221: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/]) HDFS-4221. Remove the format limitation point from BKJM documentation as HDFS-3810 closed. Contributed by Rakesh R. (umamahesh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1608776) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm Remove the format limitation point from BKJM documentation as HDFS-3810 closed -- Key: HDFS-4221 URL: https://issues.apache.org/jira/browse/HDFS-4221 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Uma Maheswara Rao G Assignee: Rakesh R Fix For: 3.0.0, 2.5.0 Attachments: HDFS-4221.patch Remove the format limitation point from BKJM documentation as HDFS-3810 closed -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6646) [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed
[ https://issues.apache.org/jira/browse/HDFS-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056128#comment-14056128 ] Hudson commented on HDFS-6646: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/]) HDFS-6646. [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed ( Contributed by Brahma Reddy Battula) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609020) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java [ HDFS Rolling Upgrade - Shell ] shutdownDatanode and getDatanodeInfo usage is missed -- Key: HDFS-6646 URL: https://issues.apache.org/jira/browse/HDFS-6646 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.4.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.6.0 Attachments: HDFS-6646.patch, HDFS-6646_1.patch Usage message is missed for shutdownDatanode and getdatanodeinfo Please check the following for same..(It's printing whole usage for dfsadmin) hdfs dfsadmin -shutdownDatanode Usage: java DFSAdmin Note: Administrative commands can only be run as the HDFS superuser. [-report] [-safemode enter | leave | get | wait] [-allowSnapshot snapshotDir] [-disallowSnapshot snapshotDir] [-saveNamespace] [-rollEdits] [-restoreFailedStorage true|false|check] [-refreshNodes] [-finalizeUpgrade] [-rollingUpgrade [query|prepare|finalize]] [-metasave filename] [-refreshServiceAcl] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-refreshCallQueue] [-printTopology] [-refreshNamenodes datanodehost:port] [-deleteBlockPool datanode-host:port blockpoolId [force]] [-setQuota quota dirname...dirname] [-clrQuota dirname...dirname] [-setSpaceQuota quota dirname...dirname] [-clrSpaceQuota dirname...dirname] [-setBalancerBandwidth bandwidth in bytes per second] [-fetchImage local directory] [-shutdownDatanode datanode_host:ipc_port [upgrade]] [-getDatanodeInfo datanode_host:ipc_port] [-help [cmd]] Generic options supported are -conf configuration file specify an application configuration file -D property=valueuse value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:portspecify a job tracker -files comma separated list of filesspecify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jarsspecify comma separated jar files to include in the classpath. -archives comma separated list of archivesspecify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6648) Order of namenodes in ConfiguredFailoverProxyProvider is not defined by order in hdfs-site.xml
Rafal Wojdyla created HDFS-6648:
-----------------------------------

Summary: Order of namenodes in ConfiguredFailoverProxyProvider is not defined by order in hdfs-site.xml
Key: HDFS-6648
URL: https://issues.apache.org/jira/browse/HDFS-6648
Project: Hadoop HDFS
Issue Type: Bug
Components: ha, hdfs-client
Affects Versions: 2.2.0
Reporter: Rafal Wojdyla

In the constructor of org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider there is a map of nameservice to (service-id to service-rpc-address) entries (DFSUtil.getHaNnRpcAddresses). It's a LinkedHashMap of HashMaps, so order is kept only for the _nameservices_. Then, to find the active namenode, we get the HashMap of service-id to service-rpc-address for the requested nameservice (taken from the URI of the request), and for this HashMap we call values() - the order of this collection is not strictly defined! In the code:

{code}
Collection<InetSocketAddress> addressesOfNns = addressesInNN.values();
{code}

We then put these values (in an undefined order) into an ArrayList of proxies, and in getProxy we start from the first proxy in the list and fail over to the next one if needed. It would make sense for ConfiguredFailoverProxyProvider to keep the order of proxies/namenodes defined in hdfs-site.xml.

--
This message was sent by Atlassian JIRA (v6.2#6252)
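For illustration only - a small, self-contained demonstration of the ordering point above, using plain JDK collections rather than the Hadoop code (host names and ports are made up): HashMap.values() iterates in hash order, which is not guaranteed to match insertion order, while LinkedHashMap.values() preserves it.

{code}
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ProxyOrderDemo {
  public static void main(String[] args) {
    // Simulates the per-nameservice map of service-id -> rpc address.
    Map<String, InetSocketAddress> hashed = new HashMap<>();
    Map<String, InetSocketAddress> linked = new LinkedHashMap<>();
    String[] ids = {"namenode25", "namenode1", "namenode302"};   // hdfs-site.xml declaration order
    for (String id : ids) {
      InetSocketAddress addr = InetSocketAddress.createUnresolved(id, 8020);
      hashed.put(id, addr);
      linked.put(id, addr);
    }
    // HashMap.values() iterates in hash order: it may happen to coincide with the
    // insertion order for a given set of keys, but nothing guarantees it.
    System.out.println("HashMap order:       " + new ArrayList<>(hashed.values()));
    // LinkedHashMap.values() always preserves the declaration (insertion) order.
    System.out.println("LinkedHashMap order: " + new ArrayList<>(linked.values()));
  }
}
{code}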
[jira] [Updated] (HDFS-4265) BKJM doesn't take advantage of speculative reads
[ https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-4265: --- Attachment: 003-HDFS-4265.patch BKJM doesn't take advantage of speculative reads Key: HDFS-4265 URL: https://issues.apache.org/jira/browse/HDFS-4265 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.2.0 Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch BookKeeperEditLogInputStream reads entry at a time, so it doesn't take advantage of the speculative read mechanism introduced by BOOKKEEPER-336. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6631) TestPread#testHedgedReadLoopTooManyTimes fails intermittently.
[ https://issues.apache.org/jira/browse/HDFS-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056270#comment-14056270 ] Hadoop QA commented on HDFS-6631: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654772/HDFS-6631.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7301//console This message is automatically generated. TestPread#testHedgedReadLoopTooManyTimes fails intermittently. -- Key: HDFS-6631 URL: https://issues.apache.org/jira/browse/HDFS-6631 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Assignee: Liang Xie Attachments: HDFS-6631.txt, org.apache.hadoop.hdfs.TestPread-output.txt {{TestPread#testHedgedReadLoopTooManyTimes}} fails intermittently. It looks like a race condition on counting the expected number of loop iterations. I can repro the test failure more consistently on Windows. -- This message was sent by Atlassian JIRA (v6.2#6252)
webhdfs kerberos not working with multiple users
Hi, We are facing an issue with multiple credentials present in the Kerberos credential cache: when other users try to connect, curl fails, expecting only the user from the primary cache. We have 2 different principals, each attached to the same realm, and when trying to connect using curl it always loads the primary cache, does not search for other credentials in the cache, and fails. klist -A output snippet showing the 2 different credentials: Ticket cache: DIR::/etc/netwitness/wc_cache_dir/tktSQ8abu Default principal: gpad...@example.com Valid starting / Expires / Service principal 07/09/14 18:31:12 07/10/14 18:22:55 krbtgt/example@example.com renew until 07/09/14 18:31:12 Ticket cache: DIR::/etc/netwitness/wc_cache_dir/tktEJgnPE Default principal: hdfs/pivhdsne.krb...@example.com Valid starting / Expires / Service principal 07/09/14 18:30:54 07/10/14 18:22:38 krbtgt/example@example.com renew until 07/09/14 18:30:54 Here our cache has 2 users, gpadmin and hdfs. When the user connects as gpadmin, curl works fine; when the user switches to hdfs, curl fails with an error. Is there any way to provide the username parameter in the curl negotiate? Even though we are providing the user via -u hdfs:, curl is not considering it and authentication fails. curl -i --negotiate -u hdfs: http://10.31.251.254:50070/webhdfs/v1/?user.name=hdfsop=LISTSTATUS HTTP/1.1 401 Date: Wed, 09 Jul 2014 13:19:56 GMT Pragma: no-cache Date: Wed, 09 Jul 2014 13:19:56 GMT Pragma: no-cache WWW-Authenticate: Negotiate Set-Cookie: hadoop.auth=;Path=/;Expires=Thu, 01-Jan-1970 00:00:00 GMT Content-Type: text/html;charset=ISO-8859-1 Cache-Control: must-revalidate,no-cache,no-store Content-Length: 1358 Server: Jetty(7.6.10.v20130312) HTTP/1.1 401 Unauthorized Date: Wed, 09 Jul 2014 13:19:56 GMT Pragma: no-cache Cache-Control: no-cache Date: Wed, 09 Jul 2014 13:19:56 GMT Pragma: no-cache Set-Cookie: hadoop.auth=u=gpadminp=gpad...@example.comt=kerberose=1404947996223s=KfBg3KDnhd5dxYvHMUYmDPqdEy4=;Path=/ Expires: Thu, 01 Jan 1970 00:00:00 GMT Content-Type: application/json Transfer-Encoding: chunked Server: Jetty(7.6.10.v20130312) {"RemoteException":{"exception":"SecurityException","javaClassName":"java.lang.SecurityException","message":"Failed to obtain user group information: java.io.IOException: Usernames not matched: name=hdfs != expected=gpadmin"}} Can anyone suggest how to make the curl library scan the Kerberos directory cache and load the proper principal for the particular user? Are there any options required on the WebHDFS side to support multiple users with Kerberos? Regards Sathish Valluri
[jira] [Commented] (HDFS-4265) BKJM doesn't take advantage of speculative reads
[ https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056290#comment-14056290 ] Hadoop QA commented on HDFS-4265: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654804/003-HDFS-4265.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7302//console This message is automatically generated. BKJM doesn't take advantage of speculative reads Key: HDFS-4265 URL: https://issues.apache.org/jira/browse/HDFS-4265 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.2.0 Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch BookKeeperEditLogInputStream reads entry at a time, so it doesn't take advantage of the speculative read mechanism introduced by BOOKKEEPER-336. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-4266) BKJM: Separate write and ack quorum
[ https://issues.apache.org/jira/browse/HDFS-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-4266: --- Attachment: 002-HDFS-4266.patch BKJM: Separate write and ack quorum --- Key: HDFS-4266 URL: https://issues.apache.org/jira/browse/HDFS-4266 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4266.patch, 002-HDFS-4266.patch BOOKKEEPER-208 allows the ack and write quorums to be different sizes to allow writes to be unaffected by any bookie failure. BKJM should be able to take advantage of this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-4266) BKJM: Separate write and ack quorum
[ https://issues.apache.org/jira/browse/HDFS-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-4266: --- Status: Patch Available (was: Open) BKJM: Separate write and ack quorum --- Key: HDFS-4266 URL: https://issues.apache.org/jira/browse/HDFS-4266 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4266.patch, 002-HDFS-4266.patch BOOKKEEPER-208 allows the ack and write quorums to be different sizes to allow writes to be unaffected by any bookie failure. BKJM should be able to take advantage of this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4266) BKJM: Separate write and ack quorum
[ https://issues.apache.org/jira/browse/HDFS-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056313#comment-14056313 ] Rakesh R commented on HDFS-4266: Thanks [~ikelly] for the review. bq.The patch makes an ack quorum mandatory. This breaks existing configs. The ack quorum should default to the write quorum, if the configuration is missing. Hdfs configuration will return the 'quorumSize' if the configuration is missing. {code} ackQuorumSize = conf.getInt(BKJM_BOOKKEEPER_ACK_QUORUM_SIZE, quorumSize); {code} I've included test case to verify this behavior. Could you have a look at the latest patch.Thanks! BKJM: Separate write and ack quorum --- Key: HDFS-4266 URL: https://issues.apache.org/jira/browse/HDFS-4266 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4266.patch, 002-HDFS-4266.patch BOOKKEEPER-208 allows the ack and write quorums to be different sizes to allow writes to be unaffected by any bookie failure. BKJM should be able to take advantage of this. -- This message was sent by Atlassian JIRA (v6.2#6252)
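For illustration, the defaulting behavior described in the comment could be exercised with a small check along these lines (a sketch only; the write-quorum constant name and the test scaffolding are assumptions, not taken from the patch):
{code}
// If the ack-quorum key is absent from the configuration, getInt() returns
// the supplied default, so the ack quorum falls back to the write quorum.
Configuration conf = new Configuration();
int quorumSize = conf.getInt(BKJM_BOOKKEEPER_QUORUM_SIZE, 2);            // write quorum (constant name assumed)
int ackQuorumSize = conf.getInt(BKJM_BOOKKEEPER_ACK_QUORUM_SIZE, quorumSize);
assertEquals(quorumSize, ackQuorumSize);   // existing configs keep working unchanged
{code}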
[jira] [Commented] (HDFS-4266) BKJM: Separate write and ack quorum
[ https://issues.apache.org/jira/browse/HDFS-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056326#comment-14056326 ] Hadoop QA commented on HDFS-4266: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654811/002-HDFS-4266.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7303//console This message is automatically generated. BKJM: Separate write and ack quorum --- Key: HDFS-4266 URL: https://issues.apache.org/jira/browse/HDFS-4266 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4266.patch, 002-HDFS-4266.patch BOOKKEEPER-208 allows the ack and write quorums to be different sizes to allow writes to be unaffected by any bookie failure. BKJM should be able to take advantage of this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6645) Add test for successive Snapshots between XAttr modifications
[ https://issues.apache.org/jira/browse/HDFS-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6645: -- Status: Patch Available (was: Open) Add test for successive Snapshots between XAttr modifications - Key: HDFS-6645 URL: https://issues.apache.org/jira/browse/HDFS-6645 Project: Hadoop HDFS Issue Type: Test Components: snapshots, test Affects Versions: 3.0.0, 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-6645.001.patch In the current TestXAttrWithSnapshot unit tests, we create a single snapshot per test. We should test taking multiple snapshots on a path in between XAttr modifications of that path. We should also verify that deletion of a snapshot does not somehow alter the XAttrs of the other snapshots of the same path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6645) Add test for successive Snapshots between XAttr modifications
[ https://issues.apache.org/jira/browse/HDFS-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6645: -- Status: Open (was: Patch Available) Thank you, [~ajisakaa] and [~jingzhao]! I am going to cancel and re-submit patch. The Hadoop QA jenkins job didn't run properly because of the svn upgrade issue. Add test for successive Snapshots between XAttr modifications - Key: HDFS-6645 URL: https://issues.apache.org/jira/browse/HDFS-6645 Project: Hadoop HDFS Issue Type: Test Components: snapshots, test Affects Versions: 3.0.0, 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-6645.001.patch In the current TestXAttrWithSnapshot unit tests, we create a single snapshot per test. We should test taking multiple snapshots on a path in between XAttr modifications of that path. We should also verify that deletion of a snapshot does not somehow alter the XAttrs of the other snapshots of the same path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6645) Add test for successive Snapshots between XAttr modifications
[ https://issues.apache.org/jira/browse/HDFS-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6645: -- Attachment: HDFS-6645.001.patch Reattach the same patch to trigger Hadoop QA jenkins job. Add test for successive Snapshots between XAttr modifications - Key: HDFS-6645 URL: https://issues.apache.org/jira/browse/HDFS-6645 Project: Hadoop HDFS Issue Type: Test Components: snapshots, test Affects Versions: 3.0.0, 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-6645.001.patch, HDFS-6645.001.patch In the current TestXAttrWithSnapshot unit tests, we create a single snapshot per test. We should test taking multiple snapshots on a path in between XAttr modifications of that path. We should also verify that deletion of a snapshot does not somehow alter the XAttrs of the other snapshots of the same path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6645) Add test for successive Snapshots between XAttr modifications
[ https://issues.apache.org/jira/browse/HDFS-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056362#comment-14056362 ] Hadoop QA commented on HDFS-6645: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654816/HDFS-6645.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7304//console This message is automatically generated. Add test for successive Snapshots between XAttr modifications - Key: HDFS-6645 URL: https://issues.apache.org/jira/browse/HDFS-6645 Project: Hadoop HDFS Issue Type: Test Components: snapshots, test Affects Versions: 3.0.0, 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-6645.001.patch, HDFS-6645.001.patch In the current TestXAttrWithSnapshot unit tests, we create a single snapshot per test. We should test taking multiple snapshots on a path in between XAttr modifications of that path. We should also verify that deletion of a snapshot does not somehow alter the XAttrs of the other snapshots of the same path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6649) Documentation for setrep is wrong
Alexander Fahlke created HDFS-6649: -- Summary: Documentation for setrep is wrong Key: HDFS-6649 URL: https://issues.apache.org/jira/browse/HDFS-6649 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 1.0.4 Reporter: Alexander Fahlke Priority: Trivial The documentation in: http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html#setrep states that one must use the command as follows: - {{Usage: hdfs dfs -setrep [-R] <path>}} - {{Example: hdfs dfs -setrep -w 3 -R /user/hadoop/dir1}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6649) Documentation for setrep is wrong
[ https://issues.apache.org/jira/browse/HDFS-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Fahlke updated HDFS-6649: --- Description: The documentation in: http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html#setrep states that one must use the command as follows: - {{Usage: hdfs dfs -setrep [-R] <path>}} - {{Example: hdfs dfs -setrep -w 3 -R /user/hadoop/dir1}} Correct would be to state that setrep needs the replication factor and the replication factor needs to be right before the DFS path. Must look like this: - {{Usage: hdfs dfs -setrep [-R] [-w] <rep> <path/file>}} - {{Example: hdfs dfs -setrep -w -R 3 /user/hadoop/dir1}} was: The documentation in: http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html#setrep states that one must use the command as follows: - {{Usage: hdfs dfs -setrep [-R] <path>}} - {{Example: hdfs dfs -setrep -w 3 -R /user/hadoop/dir1}} Documentation for setrep is wrong - Key: HDFS-6649 URL: https://issues.apache.org/jira/browse/HDFS-6649 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 1.0.4 Reporter: Alexander Fahlke Priority: Trivial The documentation in: http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html#setrep states that one must use the command as follows: - {{Usage: hdfs dfs -setrep [-R] <path>}} - {{Example: hdfs dfs -setrep -w 3 -R /user/hadoop/dir1}} Correct would be to state that setrep needs the replication factor and the replication factor needs to be right before the DFS path. Must look like this: - {{Usage: hdfs dfs -setrep [-R] [-w] <rep> <path/file>}} - {{Example: hdfs dfs -setrep -w -R 3 /user/hadoop/dir1}} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6645) Add test for successive Snapshots between XAttr modifications
[ https://issues.apache.org/jira/browse/HDFS-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056372#comment-14056372 ] Stephen Chu commented on HDFS-6645: --- I think the above is not a problem with the patch, but a Hadoop QA/jenkins issue that is hitting other PreCommit-HDFS-Builds. Will look into it. Add test for successive Snapshots between XAttr modifications - Key: HDFS-6645 URL: https://issues.apache.org/jira/browse/HDFS-6645 Project: Hadoop HDFS Issue Type: Test Components: snapshots, test Affects Versions: 3.0.0, 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-6645.001.patch, HDFS-6645.001.patch In the current TestXAttrWithSnapshot unit tests, we create a single snapshot per test. We should test taking multiple snapshots on a path in between XAttr modifications of that path. We should also verify that deletion of a snapshot does not somehow alter the XAttrs of the other snapshots of the same path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6622) Rename and AddBlock may race and produce invalid edits
[ https://issues.apache.org/jira/browse/HDFS-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6622: - Attachment: HDFS-6622.v2.patch Rename and AddBlock may race and produce invalid edits -- Key: HDFS-6622 URL: https://issues.apache.org/jira/browse/HDFS-6622 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Priority: Blocker Attachments: HDFS-6622.patch, HDFS-6622.v2.patch While investigating HDFS-6618, we have discovered that rename happening in the middle of {{getAdditionalBlock()}} can lead to logging of invalid edit entry. In {{getAdditionalBlock()}} , the path is resolved once while holding the read lock and the same resolved path will be used in the edit log in the second half of the method holding the write lock. If a rename happens in between two locks, the path may no longer exist. When replaying the {{AddBlockOp}}, it will fail with FileNotFound. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6622) Rename and AddBlock may race and produce invalid edits
[ https://issues.apache.org/jira/browse/HDFS-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-6622: Assignee: Kihwal Lee Rename and AddBlock may race and produce invalid edits -- Key: HDFS-6622 URL: https://issues.apache.org/jira/browse/HDFS-6622 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6622.patch, HDFS-6622.v2.patch While investigating HDFS-6618, we have discovered that rename happening in the middle of {{getAdditionalBlock()}} can lead to logging of invalid edit entry. In {{getAdditionalBlock()}} , the path is resolved once while holding the read lock and the same resolved path will be used in the edit log in the second half of the method holding the write lock. If a rename happens in between two locks, the path may no longer exist. When replaying the {{AddBlockOp}}, it will fail with FileNotFound. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056451#comment-14056451 ] Kihwal Lee commented on HDFS-6618: -- Ok, I will take the simple approach. Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4265) BKJM doesn't take advantage of speculative reads
[ https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056454#comment-14056454 ] Rakesh R commented on HDFS-4265: Attached new patch addressing [~ikelly]'s comments. Please review the patch. Thanks! It looks like jenkins report is not proper. BKJM doesn't take advantage of speculative reads Key: HDFS-4265 URL: https://issues.apache.org/jira/browse/HDFS-4265 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.2.0 Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch BookKeeperEditLogInputStream reads entry at a time, so it doesn't take advantage of the speculative read mechanism introduced by BOOKKEEPER-336. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6622) Rename and AddBlock may race and produce invalid edits
[ https://issues.apache.org/jira/browse/HDFS-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056449#comment-14056449 ] Kihwal Lee commented on HDFS-6622: -- The reason I added the strict check was to prevent incorrect operations based on potentially incorrect result from getFullPathName(). If the inode's parent is not null (stale), but one of the ancestor's parent is null, it will assume that inode is directly under /. This could happen with the delayed inode removal. But since we are going to remove inodes from inodeMap while holding FSNamesystem write lock, this should not happen. So what you suggest will be sufficient. I also wanted to reduce the number of times getFullPathName() is called. I will simply remove the comparison and fix the test to check the correctness of edit. Rename and AddBlock may race and produce invalid edits -- Key: HDFS-6622 URL: https://issues.apache.org/jira/browse/HDFS-6622 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Priority: Blocker Attachments: HDFS-6622.patch, HDFS-6622.v2.patch While investigating HDFS-6618, we have discovered that rename happening in the middle of {{getAdditionalBlock()}} can lead to logging of invalid edit entry. In {{getAdditionalBlock()}} , the path is resolved once while holding the read lock and the same resolved path will be used in the edit log in the second half of the method holding the write lock. If a rename happens in between two locks, the path may no longer exist. When replaying the {{AddBlockOp}}, it will fail with FileNotFound. -- This message was sent by Atlassian JIRA (v6.2#6252)
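A rough sketch of the failure mode described in the comment above (simplified, illustrative types; not the actual INode#getFullPathName() implementation): walking up a parent chain stops at the first null parent, so if an ancestor has already been unlinked, the reconstructed path silently loses its prefix and the inode appears to live directly under /.
{code}
// Illustrative only: INodeSketch is a stand-in, not the real INode class.
static class INodeSketch {
  String name;
  INodeSketch parent;   // becomes null once the inode is unlinked from the tree
}

// Simplified walk-up: with a stale (non-null) parent but a null grandparent,
// the result looks like "/dir/file" even if the file really lived deeper.
static String fullPath(INodeSketch inode) {
  StringBuilder path = new StringBuilder();
  for (INodeSketch cur = inode; cur != null; cur = cur.parent) {
    path.insert(0, "/" + cur.name);
  }
  return path.toString();
}
{code}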
[jira] [Commented] (HDFS-6622) Rename and AddBlock may race and produce invalid edits
[ https://issues.apache.org/jira/browse/HDFS-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056469#comment-14056469 ] Hadoop QA commented on HDFS-6622: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654832/HDFS-6622.v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7305//console This message is automatically generated. Rename and AddBlock may race and produce invalid edits -- Key: HDFS-6622 URL: https://issues.apache.org/jira/browse/HDFS-6622 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6622.patch, HDFS-6622.v2.patch While investigating HDFS-6618, we have discovered that rename happening in the middle of {{getAdditionalBlock()}} can lead to logging of invalid edit entry. In {{getAdditionalBlock()}} , the path is resolved once while holding the read lock and the same resolved path will be used in the edit log in the second half of the method holding the write lock. If a rename happens in between two locks, the path may no longer exist. When replaying the {{AddBlockOp}}, it will fail with FileNotFound. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6618: - Attachment: HDFS-6618.simpler.patch Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch, HDFS-6618.simpler.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6634) inotify in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056474#comment-14056474 ] James Thomas commented on HDFS-6634: That would require this project to be a third-party service, much like the work presented in http://www.youtube.com/watch?v=7KumMKqBtr8. So I think the same concerns raised in the Q&A (starting around 25:30) would apply here -- in particular, changes to the format of the audit log would cause problems for user applications. I think tighter integration with HDFS and exposure of the edits in ProtoBuf form is important. inotify in HDFS --- Key: HDFS-6634 URL: https://issues.apache.org/jira/browse/HDFS-6634 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, qjm Reporter: James Thomas Assignee: James Thomas Attachments: inotify-intro.pdf Design a mechanism for applications like search engines to access the HDFS edit stream. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056477#comment-14056477 ] Kihwal Lee commented on HDFS-6618: -- With the new patch {{removedINodes}} is passed to {{FSNamesystem#removePathAndBlocks()}} while in the write lock. The method was modified to conditionally acquire the directory lock. I didn't move the removal to {{FSDirectory}}, since we may want to do something with the inodes in {{FSNamesystem}} later as a part of failure handling in a separate jira. Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch, HDFS-6618.simpler.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
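The conditional-locking shape described in the comment might look roughly like the following (a sketch under assumed names; the real removePathAndBlocks takes more parameters and does more work):
{code}
// Sketch: remove the collected inodes from the inode map right away,
// taking the FSDirectory lock only if the caller does not already hold it.
void removePathAndBlocks(String src, List<INode> removedINodes,
    boolean acquireDirWriteLock) {
  assert hasWriteLock();                 // FSNamesystem write lock must be held
  if (acquireDirWriteLock) {
    dir.writeLock();
  }
  try {
    if (removedINodes != null) {
      dir.removeFromInodeMap(removedINodes);   // method name assumed
    }
  } finally {
    if (acquireDirWriteLock) {
      dir.writeUnlock();
    }
  }
}
{code}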
[jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056481#comment-14056481 ] Jing Zhao commented on HDFS-6618: - Thanks [~kihwal]. The patch looks good to me. +1 pending Jenkins. Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch, HDFS-6618.simpler.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6650) API to get the root of an encryption zone for a path
Andrew Wang created HDFS-6650: - Summary: API to get the root of an encryption zone for a path Key: HDFS-6650 URL: https://issues.apache.org/jira/browse/HDFS-6650 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Andrew Wang Assignee: Andrew Wang It'd be useful to be able to query, given a path within an encryption zone, the root of the encryption zone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056486#comment-14056486 ] Hadoop QA commented on HDFS-6618: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654834/HDFS-6618.simpler.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7306//console This message is automatically generated. Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch, HDFS-6618.simpler.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6651) Deletion failure can leak inodes permanently.
Kihwal Lee created HDFS-6651: Summary: Deletion failure can leak inodes permanently. Key: HDFS-6651 URL: https://issues.apache.org/jira/browse/HDFS-6651 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Critical As discussed in HDFS-6618, if a deletion of tree fails in the middle, any collected inodes and blocks will not be removed from {{INodeMap}} and {{BlocksMap}}. Since fsimage is saved by iterating over {{INodeMap}}, the leak will persist across name node restart. Although blanked out inodes will not have reference to blocks, blocks will still refer to the inode as {{BlockCollection}}. As long as it is not null, blocks will live on. The leaked blocks from blanked out inodes will go away after restart. Options (when delete fails in the middle) - Complete the partial delete: edit log the partial delete and remove inodes and blocks. - Somehow undo the partial delete. - Check quota for snapshot diff beforehand for the whole subtree. - Ignore quota check during delete even if snapshot is present. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056499#comment-14056499 ] Kihwal Lee commented on HDFS-6618: -- Filed HDFS-6651 for the leak problem. Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch, HDFS-6618.simpler.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056512#comment-14056512 ] Kihwal Lee commented on HDFS-6618: -- The build failed because of this. {panel} [exec] CMake Error at /usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:108 (message): [exec] Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE) [exec] Call Stack (most recent call first): [exec] /usr/share/cmake-2.8/Modules/FindPackageHandleStandardArgs.cmake:315 (_FPHSA_FAILURE_MESSAGE) [exec] /usr/share/cmake-2.8/Modules/FindPkgConfig.cmake:106 (find_package_handle_standard_args) [exec] main/native/fuse-dfs/CMakeLists.txt:23 (find_package) {panel} Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch, HDFS-6618.simpler.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056517#comment-14056517 ] Colin Patrick McCabe commented on HDFS-6647: The simplest thing is probably just to have {{updatePipeline}} throw an exception if the file doesn't exist (or exists only in snapshots). bq. It shouldn't be difficult to change the FSEditLogLoader to be able to read the OP_UPDATE_BLOCKS op if we just change it to look up the INode by block ID. We could do that when recovery mode is on. I don't think we want to do that normally since snapshotted blocks are not supposed to be mutable Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Priority: Blocker Attachments: HDFS-6647-failing-test.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
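The check being suggested could look roughly like this inside the update-pipeline path (a sketch; the isDeletedButInSnapshot helper is illustrative, not the exact FSNamesystem code):
{code}
// Sketch: refuse to log OP_UPDATE_BLOCKS for a file that no longer exists
// in the current namespace (it may still be referenced by a snapshot, but
// snapshotted blocks are not supposed to be mutated by pipeline recovery).
INode inode = dir.getINode(src);
if (inode == null || !inode.isFile() || isDeletedButInSnapshot(inode)) {  // helper name assumed
  throw new FileNotFoundException(
      "Cannot update pipeline for " + src + ": the file has been deleted");
}
{code}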
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056523#comment-14056523 ] Jing Zhao commented on HDFS-6647: - In HDFS-6527 we do not allow users to get an additional block if the file has been deleted (but can be in a snapshot). Maybe here we should also fail the {{updatePipeline}} call to make it consistent? But in the meanwhile, I think in the future it will be better to weaken the dependency between the states of blocks and files, e.g., letting RPC calls like {{updatePipeline}} only update and check the state of blocks. This can make work like separating block management out as a service (HDFS-5477) easier. Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Priority: Blocker Attachments: HDFS-6647-failing-test.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-172) Quota exceed exception creates file of size 0
[ https://issues.apache.org/jira/browse/HDFS-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-172. -- Resolution: Not a Problem Resolving this as not-a-problem. Please feel free to reopen if you disagree. Quota exceed exception creates file of size 0 - Key: HDFS-172 URL: https://issues.apache.org/jira/browse/HDFS-172 Project: Hadoop HDFS Issue Type: Bug Reporter: Ravi Phulari An empty file of size 0 is created when a QuotaExceed exception occurs while copying a file. The file is created with the same name as the file whose copy was attempted. I.e., if the operation hadoop fs -copyFromLocal testFile1 /testDir fails due to a quota exceeded exception, then a testFile1 of size 0 is created in testDir on HDFS. Steps to verify: 1) Create testDir and apply a space quota of 16kb 2) Copy a file, say testFile, of size greater than 16kb from the local file system 3) You should see a QuotaException error 4) testFile of size 0 is created in testDir, which is not expected. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6622) Rename and AddBlock may race and produce invalid edits
[ https://issues.apache.org/jira/browse/HDFS-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056596#comment-14056596 ] Colin Patrick McCabe commented on HDFS-6622: bq. I also wanted to reduce the number of times getFullPathName() is called. I will simply remove the comparison and fix the test to check the correctness of edit. OK. From what I can see, the path should be recomputed while under the lock (rather than simply trusting that it will stay the same since we last released the lock). That should fix things. It looks like you introduced the FileState object in order to avoid calling getFullPathName() twice while holding the lock... fair enough. +1 Rename and AddBlock may race and produce invalid edits -- Key: HDFS-6622 URL: https://issues.apache.org/jira/browse/HDFS-6622 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6622.patch, HDFS-6622.v2.patch While investigating HDFS-6618, we have discovered that rename happening in the middle of {{getAdditionalBlock()}} can lead to logging of invalid edit entry. In {{getAdditionalBlock()}} , the path is resolved once while holding the read lock and the same resolved path will be used in the edit log in the second half of the method holding the write lock. If a rename happens in between two locks, the path may no longer exist. When replaying the {{AddBlockOp}}, it will fail with FileNotFound. -- This message was sent by Atlassian JIRA (v6.2#6252)
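For reference, the FileState holder mentioned above might look roughly like this (a sketch of the idea, not the committed patch): the inode and its path are resolved once while the lock is held and carried together, so the edit-log entry uses the same resolution as the block allocation instead of re-calling getFullPathName() after the lock has been dropped.
{code}
// Illustrative holder; class and field names are assumptions, not the patch.
static class FileState {
  final INodeFile inode;   // the file, resolved under the current lock
  final String path;       // its full path as of that same resolution

  FileState(INodeFile inode, String path) {
    this.inode = inode;
    this.path = path;
  }
}
{code}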
[jira] [Updated] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Bowman updated HDFS-6621: -- Status: Patch Available (was: Open) Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.4.0, 2.2.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of its balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made.
private void dispatchBlocks() {
    long startTime = Time.now();
    long scheduledSize = getScheduledSize();
    this.blocksToReceive = 2*scheduledSize;
    boolean isTimeUp = false;
    int noPendingBlockIteration = 0;
    while(!isTimeUp && getScheduledSize()>0 &&
        (!srcBlockList.isEmpty() || blocksToReceive>0)) {
      PendingBlockMove pendingBlock = chooseNextBlockToMove();
      if (pendingBlock != null) {
        noPendingBlockIteration = 0;
        // move the block
        pendingBlock.scheduleBlockMove();
        continue;
      }
      /* Since we can not schedule any block to move,
       * filter any moved blocks from the source block list and
       * check if we should fetch more blocks from the namenode
       */
      filterMovedBlocks(); // filter already moved blocks
      if (shouldFetchMoreBlocks()) {
        // fetch new blocks
        try {
          blocksToReceive -= getBlockList();
          continue;
        } catch (IOException e) {
          LOG.warn("Exception while getting block list", e);
          return;
        }
      } else {
        // source node cannot find a pendingBlockToMove, iteration +1
        noPendingBlockIteration++;
        // in case no blocks can be moved for source node's task,
        // jump out of while-loop after 5 iterations.
        if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
          setScheduledSize(0);
        }
      }
      // check if time is up or not
      if (Time.now()-startTime > MAX_ITERATION_TIME) {
        isTimeUp = true;
        continue;
      }
      /* Now we can not schedule any block to move and there are
       * no new blocks added to the source block list, so we wait.
       */
      try {
        synchronized(Balancer.this) {
          Balancer.this.wait(1000);  // wait for targets/sources to be idle
        }
      } catch (InterruptedException ignored) {
      }
    }
  }
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056608#comment-14056608 ] Colin Patrick McCabe commented on HDFS-6618: It looks like someone is updating the build slaves, and somehow pkg-config got uninstalled? I will kick the build again. Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch, HDFS-6618.simpler.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056610#comment-14056610 ] Colin Patrick McCabe commented on HDFS-6618: +1 for the patch. Just one small note... I'd prefer to see lock... unlock blocks around removePathAndBlocks when appropriate, rather than a boolean lock me passed in, but we can address that in the refactoring, I guess Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch, HDFS-6618.simpler.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode
[ https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056634#comment-14056634 ] Konstantin Shvachko commented on HDFS-6469: --- You are right, on the client hflush() calls NN's fsync() once in the beginning of each block, because it does not update block's length. Thank you for the correction, Nicholas. Coordinated replication of the namespace using ConsensusNode Key: HDFS-6469 URL: https://issues.apache.org/jira/browse/HDFS-6469 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: CNodeDesign.pdf This is a proposal to introduce ConsensusNode - an evolution of the NameNode, which enables replication of the namespace on multiple nodes of an HDFS cluster by means of a Coordination Engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6634) inotify in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6634: --- Attachment: inotify-intro.2.pdf Updated the design doc in response to Andrew's comments. I think we can start by exposing the entire edits stream to just superusers. inotify in HDFS --- Key: HDFS-6634 URL: https://issues.apache.org/jira/browse/HDFS-6634 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, qjm Reporter: James Thomas Assignee: James Thomas Attachments: inotify-intro.2.pdf, inotify-intro.pdf Design a mechanism for applications like search engines to access the HDFS edit stream. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056664#comment-14056664 ] Uma Maheswara Rao G commented on HDFS-6422: --- Thanks for summarizing the pending things. {quote} that we should be validating xattr name access based on scan permission on the inode's owning directory and xattr value access based on the inode's permission? {quote} For writing XAttrs, the current permission checks already cover this. From your comment, for listing XAttrs we may need an owner check as well, I think. {code} /* To access xattr names, you need EXECUTE in the owning directory. */ checkParentAccess(pc, src, FsAction.EXECUTE); {code} The current check validates only execute permission on the parent dir, but it does not care whether you are the owner of the current directory or not. What do you say? For getXAttrs, it will actually fetch the values and it has a path access check on the inode, so this should be fine. setXAttr and removeXAttr are treated as writing xattrs, and the permission checks are covered appropriately, as documented for the namespace categories. getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist Key: HDFS-6422 URL: https://issues.apache.org/jira/browse/HDFS-6422 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch If you do hdfs dfs -getfattr -n user.blah /foo and user.blah doesn't exist, the command prints # file: /foo and a 0 return code. It should print an exception and return a non-0 return code instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
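The additional owner check being discussed might be expressed roughly as follows (a sketch; it reuses the checkParentAccess call shown above and assumes a checkOwner-style helper, which is not confirmed by this thread):
{code}
// Sketch: for listing xattr names, keep the EXECUTE check on the owning
// directory and additionally require that the caller owns the inode.
if (isPermissionEnabled) {
  /* To access xattr names, you need EXECUTE in the owning directory. */
  checkParentAccess(pc, src, FsAction.EXECUTE);
  checkOwner(pc, src);   // assumed helper enforcing ownership of src
}
{code}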
[jira] [Commented] (HDFS-6634) inotify in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056708#comment-14056708 ] Steve Loughran commented on HDFS-6634: -- I saw that talk, though it [was this one|https://www.youtube.com/watch?v=XZWwwc-qeJoindex=35list=PLSAiKuajRe2kIxG-WKmTOZNlDpSgYwEBs] that I was thinking of. inotify in HDFS --- Key: HDFS-6634 URL: https://issues.apache.org/jira/browse/HDFS-6634 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client, namenode, qjm Reporter: James Thomas Assignee: James Thomas Attachments: inotify-intro.2.pdf, inotify-intro.pdf Design a mechanism for applications like search engines to access the HDFS edit stream. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6650) API to get the root of an encryption zone for a path
[ https://issues.apache.org/jira/browse/HDFS-6650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb resolved HDFS-6650. Resolution: Duplicate Duplicate of HDFS-6546. API to get the root of an encryption zone for a path Key: HDFS-6650 URL: https://issues.apache.org/jira/browse/HDFS-6650 Project: Hadoop HDFS Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Andrew Wang Assignee: Andrew Wang It'd be useful to be able to query, given a path within an encryption zone, the root of the encryption zone. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5202) umbrella JIRA for Windows support in HDFS caching
[ https://issues.apache.org/jira/browse/HDFS-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5202: Component/s: datanode Target Version/s: 3.0.0, 2.6.0 (was: HDFS-4949) Affects Version/s: 2.5.0 3.0.0 Assignee: Chris Nauroth umbrella JIRA for Windows support in HDFS caching - Key: HDFS-5202 URL: https://issues.apache.org/jira/browse/HDFS-5202 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.0.0, 2.5.0 Reporter: Colin Patrick McCabe Assignee: Chris Nauroth This is an umbrella JIRA for adding Windows support for HDFS caching. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5202) Support Centralized Cache Management on Windows.
[ https://issues.apache.org/jira/browse/HDFS-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5202: Description: HDFS caching currently is implemented using POSIX syscalls for checking ulimit and locking pages of memory into the process's address space. These POSIX syscalls do not exist on Windows. This issue will implement equivalent functionality so that Windows deployments can use Centralized Cache Management. (was: This is an umbrella JIRA for adding Windows support for HDFS caching.) Summary: Support Centralized Cache Management on Windows. (was: umbrella JIRA for Windows support in HDFS caching) I've changed the summary and description to remove the word umbrella. This patch is actually going to be quite small, and the word umbrella just seemed ominous. :-) Support Centralized Cache Management on Windows. Key: HDFS-5202 URL: https://issues.apache.org/jira/browse/HDFS-5202 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.0.0, 2.5.0 Reporter: Colin Patrick McCabe Assignee: Chris Nauroth HDFS caching currently is implemented using POSIX syscalls for checking ulimit and locking pages of memory into the process's address space. These POSIX syscalls do not exist on Windows. This issue will implement equivalent functionality so that Windows deployments can use Centralized Cache Management. -- This message was sent by Atlassian JIRA (v6.2#6252)
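For context, the POSIX-side mechanism being ported looks roughly like the following (a sketch only; the exact NativeIO/CacheManipulator method names and signatures on this branch are assumed, blockFile is an illustrative variable, and a Windows equivalent such as VirtualLock is what this issue would wire in behind the same abstraction):
{code}
// Sketch: map a cached block file and pin it in memory so it cannot be
// paged out; on Linux this ends up calling mlock(2), which has no direct
// counterpart on Windows.
FileChannel channel = new FileInputStream(blockFile).getChannel();   // blockFile: java.io.File for the replica (illustrative)
MappedByteBuffer buf =
    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
NativeIO.POSIX.getCacheManipulator().mlock(blockFile.getName(), buf, channel.size());
{code}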
[jira] [Commented] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056772#comment-14056772 ] Charles Lamb commented on HDFS-6422: Thanks [~umamaheswararao]. So I'll add code to throw exceptions as previously specified and to check the owner for listXAttrs. getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist Key: HDFS-6422 URL: https://issues.apache.org/jira/browse/HDFS-6422 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch If you do hdfs dfs -getfattr -n user.blah /foo and user.blah doesn't exist, the command prints # file: /foo and a 0 return code. It should print an exception and return a non-0 return code instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056773#comment-14056773 ] Kihwal Lee commented on HDFS-6647: -- It is already checking if the file is deleted. It's just that the check is incomplete. Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Priority: Blocker Attachments: HDFS-6647-failing-test.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056774#comment-14056774 ] Hadoop QA commented on HDFS-6621: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653835/HDFS-6621.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7307//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7307//console This message is automatically generated. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather that the balancer would prematurely exit its balancing iterations. It would move ~10 blocks or 100 MB and then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. 
{code}
private void dispatchBlocks() {
  long startTime = Time.now();
  long scheduledSize = getScheduledSize();
  this.blocksToReceive = 2 * scheduledSize;
  boolean isTimeUp = false;
  int noPendingBlockIteration = 0;
  while (!isTimeUp && getScheduledSize() > 0
      && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
    PendingBlockMove pendingBlock = chooseNextBlockToMove();
    if (pendingBlock != null) {
      noPendingBlockIteration = 0;  // the applied fix: reset the counter whenever a move is scheduled
      // move the block
      pendingBlock.scheduleBlockMove();
      continue;
    }
    /* Since we can not schedule any block to move,
     * filter any moved blocks from the source block list and
     * check if we should fetch more blocks from the namenode
     */
    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {
        LOG.warn("Exception while getting block list", e);
        return;
      }
    } else {
      // source node cannot find a pendingBlockToMove, iteration +1
      noPendingBlockIteration++;
      // in case no blocks can be moved for source node's task,
      // jump out of while-loop after 5 iterations.
      if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
        // ... (the remainder of the method was cut off in the original comment)
{code}
[jira] [Updated] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6647: - Attachment: HDFS-6647.patch The patch adds an {{isFileDeleted()}} method. This depends on HDFS-6618. I also made checkLease() call this method. Aaron's test case has been slightly modified. Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
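To make the dependency on HDFS-6618 concrete, the sketch below shows the general shape such a deleted-file check can take. It is illustrative only, not the attached HDFS-6647.patch; the helper calls ({{getInode}}, {{isCurrentFileDeleted}}, the {{dir}} field) are assumptions here.
{code}
// Illustrative sketch of a deleted-file check along the lines discussed in
// this issue; the attached patch may differ in detail.
private boolean isFileDeleted(INodeFile file) {
  // 1. With HDFS-6618, an INode removed from the namespace is dropped from the
  //    INodeMap right away, so a missing map entry means the file is gone.
  // 2. A file kept alive only by a snapshot has its "current" copy marked
  //    deleted in its snapshot feature, even though the INode itself survives,
  //    which is exactly the case that lets pipeline recovery log an
  //    OP_UPDATE_BLOCKS after the OP_DELETE.
  return dir.getInode(file.getId()) == null
      || file.getParent() == null
      || (file.isWithSnapshot()
          && file.getFileWithSnapshotFeature().isCurrentFileDeleted());
}
{code}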
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056783#comment-14056783 ] Kihwal Lee commented on HDFS-6647: -- Marking HDFS-6618 as a dependency. I won't submit the patch until it is committed. Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5202) Support Centralized Cache Management on Windows.
[ https://issues.apache.org/jira/browse/HDFS-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5202: Attachment: HDFS-5202.1.patch The attached patch gets DataNode caching working on Windows. This mixes changes in Common and HDFS. I can spin off a separate HADOOP JIRA for the Common changes after this gets reviewed and approved. This is actually pretty simple stuff. We just need to swap out the POSIX syscalls for some Windows specifics. The relevant Windows syscalls are: * [VirtualLock|http://msdn.microsoft.com/en-us/library/windows/desktop/aa366895(v=vs.85).aspx] * [GetCurrentProcess|http://msdn.microsoft.com/en-us/library/windows/desktop/ms683179(v=vs.85).aspx] * [GetProcessWorkingSetSize|http://msdn.microsoft.com/en-us/library/windows/desktop/ms683226(v=vs.85).aspx] * [SetProcessWorkingSetSizeEx|http://msdn.microsoft.com/en-us/library/windows/desktop/ms686237(v=vs.85).aspx] Summary of changes: # {{NativeIO}}: I added {{extendWorkingSetSize}}, which is a new Windows-only JNI method that extends the minimum and maximum working set size of a Windows process. Ultimately, this is what governs how much memory a Windows process is allowed to lock. Full details are in the MSDN links above. I also implemented {{mlock_1native}} to call {{VirtualLock}} on Windows. # {{hdfs.cmd}}: I added cacheadmin to the supported commands on Windows. # {{DataNode}}: Windows does not have a direct equivalent of {{ulimit -l}}. Instead of looking for a ulimit and enforcing that our configuration doesn't exceed it, we attempt to extend the working set size when running on Windows. # {{CentralizedCacheManagement.apt.vm}}: I updated the documentation with a few clarifications about how it works on Windows. # {{TestFsDatasetCache}}: We no longer need to skip this test suite on Windows. The tests had a few file descriptor leaks that caused test failures on Windows, so I fixed that. In addition to running the JUnit tests, I ran manual tests. I used Sysinternals VMMap to confirm that the block files were getting memory-mapped and locked into the virtual address space of the DataNode JVM process. http://technet.microsoft.com/en-us/sysinternals/dd535533.aspx Support Centralized Cache Management on Windows. Key: HDFS-5202 URL: https://issues.apache.org/jira/browse/HDFS-5202 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.0.0, 2.5.0 Reporter: Colin Patrick McCabe Assignee: Chris Nauroth Attachments: HDFS-5202.1.patch HDFS caching currently is implemented using POSIX syscalls for checking ulimit and locking pages of memory into the process's address space. These POSIX syscalls do not exist on Windows. This issue will implement equivalent functionality so that Windows deployments can use Centralized Cache Management. -- This message was sent by Atlassian JIRA (v6.2#6252)
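For context, the DataNode-side change described in item 3 above can be pictured roughly as below. This is a sketch under assumptions, not the attached patch; in particular the exact signature and placement of the new {{extendWorkingSetSize}} JNI method are not shown in the comment and are assumed here.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.io.nativeio.NativeIO;
import org.apache.hadoop.util.Shell;

class LockedMemorySetupSketch {
  // Illustrative only: on Windows there is no "ulimit -l" to compare against,
  // so instead of validating the configured limit we grow the process working
  // set, which is what governs whether later VirtualLock() calls can succeed.
  static void checkOrExtendLockedMemory(Configuration conf) throws IOException {
    long maxLockedMemory = conf.getLong(
        DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
        DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_DEFAULT);
    if (maxLockedMemory <= 0) {
      return; // caching disabled, nothing to check
    }
    if (Shell.WINDOWS) {
      // Assumed signature for the new Windows-only JNI method.
      NativeIO.Windows.extendWorkingSetSize(maxLockedMemory);
    } else {
      // POSIX path: make sure the configuration does not exceed RLIMIT_MEMLOCK.
      long memlockLimit = NativeIO.POSIX.getCacheManipulator().getMemlockLimit();
      if (maxLockedMemory > memlockLimit) {
        throw new RuntimeException(
            "Configured max locked memory exceeds the datanode's memlock ulimit");
      }
    }
  }
}
{code}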
[jira] [Updated] (HDFS-5202) Support Centralized Cache Management on Windows.
[ https://issues.apache.org/jira/browse/HDFS-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5202: Status: Patch Available (was: Open) Support Centralized Cache Management on Windows. Key: HDFS-5202 URL: https://issues.apache.org/jira/browse/HDFS-5202 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.0.0, 2.5.0 Reporter: Colin Patrick McCabe Assignee: Chris Nauroth Attachments: HDFS-5202.1.patch HDFS caching currently is implemented using POSIX syscalls for checking ulimit and locking pages of memory into the process's address space. These POSIX syscalls do not exist on Windows. This issue will implement equivalent functionality so that Windows deployments can use Centralized Cache Management. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056791#comment-14056791 ] Aaron T. Myers commented on HDFS-6647: -- Thanks for all the comments, y'all. I agree with what everyone has said here. While we're on the subject, does it not seem strange to anyone that we allow the INode to still be considered under construction in a snapshot after it's been deleted from the present FS? I'm thinking that perhaps in addition to this change that Kihwal has in this patch we should make delete finalize the INode as well. I think that would've prevented this issue as well, since the current check in {{checkUCBlock}} would have failed. We could of course do that as a separate JIRA, or perhaps not at all if we think this is sufficient as-is. The patch that Kihwal provided looks good to me. One small comment is that it'd be good to use {{GenericTestUtils#assertExceptionContains}} in the test case to ensure the correct exception is thrown, but that's pretty minor. +1 once that's addressed, either by changing the patch or by telling me I'm being too pedantic. Kihwal - can I go ahead and assign this JIRA to you? Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6618) Remove deleted INodes from INodeMap right away
[ https://issues.apache.org/jira/browse/HDFS-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056793#comment-14056793 ] Hadoop QA commented on HDFS-6618: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654834/HDFS-6618.simpler.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7309//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7309//console This message is automatically generated. Remove deleted INodes from INodeMap right away -- Key: HDFS-6618 URL: https://issues.apache.org/jira/browse/HDFS-6618 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6618.AbstractList.patch, HDFS-6618.inodeRemover.patch, HDFS-6618.inodeRemover.v2.patch, HDFS-6618.patch, HDFS-6618.simpler.patch After HDFS-6527, we have not seen the edit log corruption for weeks on multiple clusters until yesterday. Previously, we would see it within 30 minutes on a cluster. But the same condition was reproduced even with HDFS-6527. The only explanation is that the RPC handler thread serving {{addBlock()}} was accessing stale parent value. Although nulling out parent is done inside the {{FSNamesystem}} and {{FSDirectory}} write lock, there is no memory barrier because there is no synchronized block involved in the process. I suggest making parent volatile. -- This message was sent by Atlassian JIRA (v6.2#6252)
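The memory-visibility argument in the description amounts to the sketch below. It is illustrative only; the issue summary indicates the eventual fix took a different route (removing deleted INodes from the INodeMap right away) rather than relying on a volatile field.
{code}
// Illustrative sketch of the "make parent volatile" suggestion from the
// description; not the committed fix. Field and class shapes are simplified.
abstract class INode {
  // volatile guarantees that once the deleting thread nulls out the parent
  // under the namesystem write lock, another RPC handler thread (e.g. one
  // serving addBlock()) reads null instead of a stale reference.
  private volatile INodeDirectory parent;

  final INodeDirectory getParent() { return parent; }
  final void setParent(INodeDirectory newParent) { this.parent = newParent; }
}
{code}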
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056803#comment-14056803 ] Kihwal Lee commented on HDFS-6647: -- bq. does it not seem strange to anyone that we allow the INode to still be considered under construction in a snapshot after it's been deleted from the present FS. It does. But I think closing the file in this case is a bit complicated. I can think of many corner cases. The snapshot experts should chime in. I will address the review comment. Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056798#comment-14056798 ] Aaron T. Myers commented on HDFS-6647: -- Oh, sorry, Kihwal - shouldn't this code also be checking to ensure that the file is under construction as well? {code} -if (file == null || !file.isUnderConstruction()) { +if (file == null || isFileDeleted(file)) { {code} i.e. I think it should be: {code} if (file == null || !file.isUnderConstruction() || isFileDeleted(file)) { {code} Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056809#comment-14056809 ] Aaron T. Myers commented on HDFS-6647: -- bq. It does. But I think closing the file in this case is a bit complicated. I can think of many corner cases. The snapshot experts should chime in. Yea, I figured that'd be more complex. Totally fine to punt on that for now. Thanks, Kihwal. Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056815#comment-14056815 ] Kihwal Lee commented on HDFS-6647: -- bq. Oh, sorry, Kihwal - shouldn't this code also be checking to ensure that the file is under construction as well? Yes, you have passed the test, Aaron. :) :) I will get it fixed. Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056820#comment-14056820 ] Aaron T. Myers commented on HDFS-6647: -- You're the man, Kihwal. :) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6643) Refactor INodeFile.HeaderFormat and INodeWithAdditionalFields.PermissionStatusFormat
[ https://issues.apache.org/jira/browse/HDFS-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056821#comment-14056821 ] Hadoop QA commented on HDFS-6643: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654706/h6643_20140708b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7310//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7310//console This message is automatically generated. Refactor INodeFile.HeaderFormat and INodeWithAdditionalFields.PermissionStatusFormat Key: HDFS-6643 URL: https://issues.apache.org/jira/browse/HDFS-6643 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6643_20140708.patch, h6643_20140708b.patch The use of them are very similar. We should change INodeFile.HeaderFormat to enum and refactor them for code reuse. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6647: - Attachment: HDFS-6647.v2.patch Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch, HDFS-6647.v2.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056872#comment-14056872 ] Charles Lamb commented on HDFS-6422: [~umamaheswararao] Actually, I don't think checking the owner for listXAttrs is correct. In Linux, since the equivalent of listXAttrs only requires scan permission on the owning directory, I think we should do the same. Therefore, instead of checkOwner, I think we want to do checkParentAccess(EXECUTE). Are you ok with that? getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist Key: HDFS-6422 URL: https://issues.apache.org/jira/browse/HDFS-6422 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch If you do hdfs dfs -getfattr -n user.blah /foo and user.blah doesn't exist, the command prints # file: /foo and a 0 return code. It should print an exception and return a non-0 return code instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
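In code, the proposal reads roughly as follows. This is a sketch under assumptions about the surrounding FSNamesystem helpers, not an actual patch for this issue.
{code}
// Sketch of the permission change being proposed for listXAttrs: require only
// traversal (EXECUTE) on the parent directory, as Linux does for listxattr,
// instead of requiring ownership of the inode. Helper names follow the usual
// FSNamesystem style but are assumptions here.
void checkPermissionForListXAttrs(FSPermissionChecker pc, String src)
    throws AccessControlException, UnresolvedLinkException {
  if (isPermissionEnabled) {
    // before: checkOwner(pc, src);
    checkParentAccess(pc, src, FsAction.EXECUTE);
  }
}
{code}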
[jira] [Updated] (HDFS-6645) Add test for successive Snapshots between XAttr modifications
[ https://issues.apache.org/jira/browse/HDFS-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-6645: -- Attachment: HDFS-6645.001.patch Chris Nauroth and Giridharan Kesavan fixed the PreCommit Jenkins job issues. Resubmitting an identical patch to trigger Hadoop QA. Add test for successive Snapshots between XAttr modifications - Key: HDFS-6645 URL: https://issues.apache.org/jira/browse/HDFS-6645 Project: Hadoop HDFS Issue Type: Test Components: snapshots, test Affects Versions: 3.0.0, 2.6.0 Reporter: Stephen Chu Assignee: Stephen Chu Priority: Minor Attachments: HDFS-6645.001.patch, HDFS-6645.001.patch, HDFS-6645.001.patch In the current TestXAttrWithSnapshot unit tests, we create a single snapshot per test. We should test taking multiple snapshots on a path in between XAttr modifications of that path. We should also verify that deletion of a snapshot does not somehow alter the XAttrs of the other snapshots of the same path. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
[ https://issues.apache.org/jira/browse/HDFS-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056896#comment-14056896 ] Aaron T. Myers commented on HDFS-6647: -- The latest patch looks good to me. +1 pending HDFS-6618 being committed and the Jenkins seal of approval. Edit log corruption when pipeline recovery occurs for deleted file present in snapshot -- Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-6647-failing-test.patch, HDFS-6647.patch, HDFS-6647.v2.patch I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6652) RecoverLease cannot success and file cannot be closed under high load
[ https://issues.apache.org/jira/browse/HDFS-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juan Yu updated HDFS-6652: -- Attachment: testLeaseRecoveryWithMultiWriters.patch Here is a unit test to reproduce this issue. RecoverLease cannot success and file cannot be closed under high load - Key: HDFS-6652 URL: https://issues.apache.org/jira/browse/HDFS-6652 Project: Hadoop HDFS Issue Type: Bug Reporter: Juan Yu Priority: Minor Attachments: testLeaseRecoveryWithMultiWriters.patch When multiple clients try to write to the same file frequently, there is a chance that the block state goes wrong, so lease recovery cannot be done and the file cannot be closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6652) RecoverLease cannot success and file cannot be closed under high load
Juan Yu created HDFS-6652: - Summary: RecoverLease cannot success and file cannot be closed under high load Key: HDFS-6652 URL: https://issues.apache.org/jira/browse/HDFS-6652 Project: Hadoop HDFS Issue Type: Bug Reporter: Juan Yu Priority: Minor Attachments: testLeaseRecoveryWithMultiWriters.patch When multiple clients try to write to the same file frequently, there is a chance that the block state goes wrong, so lease recovery cannot be done and the file cannot be closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6652) RecoverLease cannot success and file cannot be closed under high load
[ https://issues.apache.org/jira/browse/HDFS-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056936#comment-14056936 ] Juan Yu commented on HDFS-6652: --- The lease recovery failure is related to HDFS-4504. RecoverLease cannot success and file cannot be closed under high load - Key: HDFS-6652 URL: https://issues.apache.org/jira/browse/HDFS-6652 Project: Hadoop HDFS Issue Type: Bug Reporter: Juan Yu Priority: Minor Attachments: testLeaseRecoveryWithMultiWriters.patch When multiple clients try to write to the same file frequently, there is a chance that the block state goes wrong, so lease recovery cannot be done and the file cannot be closed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)
[ https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056945#comment-14056945 ] Hadoop QA commented on HDFS-4504: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12599081/HDFS-4504.016.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7313//console This message is automatically generated. DFSOutputStream#close doesn't always release resources (such as leases) --- Key: HDFS-4504 URL: https://issues.apache.org/jira/browse/HDFS-4504 Project: Hadoop HDFS Issue Type: Bug Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, HDFS-4504.010.patch, HDFS-4504.011.patch, HDFS-4504.014.patch, HDFS-4504.015.patch, HDFS-4504.016.patch {{DFSOutputStream#close}} can throw an {{IOException}} in some cases. One example is if there is a pipeline error and then pipeline recovery fails. Unfortunately, in this case, some of the resources used by the {{DFSOutputStream}} are leaked. One particularly important resource is file leases. So it's possible for a long-lived HDFS client, such as Flume, to write many blocks to a file, but then fail to close it. Unfortunately, the {{LeaseRenewerThread}} inside the client will continue to renew the lease for the undead file. Future attempts to close the file will just rethrow the previous exception, and no progress can be made by the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
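The failure mode described in HDFS-4504 looks roughly like the snippet below from an application's point of view. It is purely illustrative (the path and class name are made up), showing the symptom rather than any fix.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CloseLeakExample {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/close-leak-example"); // hypothetical path
    FSDataOutputStream out = fs.create(p);
    out.write(new byte[64 * 1024]);
    try {
      out.close();    // pipeline recovery fails -> IOException
    } catch (IOException first) {
      try {
        out.close();  // per the issue description, this just rethrows the earlier exception
      } catch (IOException again) {
        // The stream is stuck, yet the client's lease renewer keeps renewing
        // the lease, so the file stays open on the NameNode indefinitely.
        System.err.println("close() keeps failing: " + again);
      }
    }
  }
}
{code}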
[jira] [Commented] (HDFS-5202) Support Centralized Cache Management on Windows.
[ https://issues.apache.org/jira/browse/HDFS-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056958#comment-14056958 ] Hadoop QA commented on HDFS-5202: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654878/HDFS-5202.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/7311//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.shell.TestCopyPreserveFlag org.apache.hadoop.fs.TestSymlinkLocalFSFileContext org.apache.hadoop.fs.shell.TestTextCommand org.apache.hadoop.ipc.TestIPC org.apache.hadoop.fs.TestSymlinkLocalFSFileSystem org.apache.hadoop.fs.shell.TestPathData org.apache.hadoop.fs.TestDFVariations {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7311//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7311//console This message is automatically generated. Support Centralized Cache Management on Windows. Key: HDFS-5202 URL: https://issues.apache.org/jira/browse/HDFS-5202 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.0.0, 2.5.0 Reporter: Colin Patrick McCabe Assignee: Chris Nauroth Attachments: HDFS-5202.1.patch HDFS caching currently is implemented using POSIX syscalls for checking ulimit and locking pages of memory into the process's address space. These POSIX syscalls do not exist on Windows. This issue will implement equivalent functionality so that Windows deployments can use Centralized Cache Management. -- This message was sent by Atlassian JIRA (v6.2#6252)