[jira] [Resolved] (HDFS-3660) TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out
[ https://issues.apache.org/jira/browse/HDFS-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3660. -- Resolution: Cannot Reproduce Target Version/s: (was: ) This is an ancient/stale flaky test JIRA. Resolving. TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out Key: HDFS-3660 URL: https://issues.apache.org/jira/browse/HDFS-3660 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Priority: Minor Saw this on a recent jenkins run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-4001) TestSafeMode#testInitializeReplQueuesEarly may time out
[ https://issues.apache.org/jira/browse/HDFS-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4001. -- Resolution: Fixed Haven't seen this fail in a very long time. Closing this out. Feel free to reopen if you disagree. TestSafeMode#testInitializeReplQueuesEarly may time out --- Key: HDFS-4001 URL: https://issues.apache.org/jira/browse/HDFS-4001 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.0-alpha Reporter: Eli Collins Attachments: timeout.txt.gz Saw this failure on a recent branch-2 jenkins run, has also been seen on trunk. {noformat} java.util.concurrent.TimeoutException: Timed out waiting for condition at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:107) at org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:191) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-3811) TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky
[ https://issues.apache.org/jira/browse/HDFS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3811. -- Resolution: Cannot Reproduce I don't think I've seen this fail in a very long time. Going to resolve this. Please reopen if you disagree. TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky - Key: HDFS-3811 URL: https://issues.apache.org/jira/browse/HDFS-3811 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Andrew Wang Assignee: Todd Lipcon Attachments: stacktrace, testfail-editlog.log, testfail.log, testpersistblocks.txt This test failed on a recent Jenkins build, but passes for me locally. Seems flaky. See: https://builds.apache.org/job/PreCommit-HDFS-Build/3021//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-3532) TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1 times out
[ https://issues.apache.org/jira/browse/HDFS-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3532. -- Resolution: Cannot Reproduce This is an ancient/stale flaky test JIRA. Resolving. TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1 times out - Key: HDFS-3532 URL: https://issues.apache.org/jira/browse/HDFS-3532 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Eli Collins I've seen this test time out on recent trunk jenkins test patch runs even though HDFS-3266 was put in a couple weeks ago. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-2433) TestFileAppend4 fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-2433. -- Resolution: Cannot Reproduce I don't think I've seen this fail in a long, long time. Going to close this out. Please reopen if you disagree. TestFileAppend4 fails intermittently Key: HDFS-2433 URL: https://issues.apache.org/jira/browse/HDFS-2433 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode, test Affects Versions: 0.20.205.0, 1.0.0 Reporter: Robert Joseph Evans Priority: Critical Attachments: failed.tar.bz2 A Jenkins build we have running failed twice in a row with issues from TestFileAppend4.testAppendSyncReplication1. In an attempt to reproduce the error I ran TestFileAppend4 in a loop overnight, saving the results away. (No clean was done in between test runs.) When TestFileAppend4 is run in a loop, the testAppendSyncReplication[012] tests fail about 10% of the time (14 times out of 130 tries). They all fail with something like the following. Often it is only one of the tests that fails, but I have seen as many as two fail in one run. {noformat} Testcase: testAppendSyncReplication2 took 32.198 sec FAILED Should have 2 replicas for that block, not 1 junit.framework.AssertionFailedError: Should have 2 replicas for that block, not 1 at org.apache.hadoop.hdfs.TestFileAppend4.replicationTest(TestFileAppend4.java:477) at org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncReplication2(TestFileAppend4.java:425) {noformat} I also saw several other tests that are a part of TestFileAppend4 fail during this experiment. They may all be related to one another, so I am filing them in the same JIRA. If it turns out that they are not related then they can be split up later. testAppendSyncBlockPlusBbw failed 6 out of the 130 times, or about 5% of the time: {noformat} Testcase: testAppendSyncBlockPlusBbw took 1.633 sec FAILED unexpected file size! received=0 , expected=1024 junit.framework.AssertionFailedError: unexpected file size! received=0 , expected=1024 at org.apache.hadoop.hdfs.TestFileAppend4.assertFileSize(TestFileAppend4.java:136) at org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncBlockPlusBbw(TestFileAppend4.java:401) {noformat} testAppendSyncChecksum[012] failed 2 out of the 130 times, or about 1.5% of the time: {noformat} Testcase: testAppendSyncChecksum1 took 32.385 sec FAILED Should have 1 replica for that block, not 2 junit.framework.AssertionFailedError: Should have 1 replica for that block, not 2 at org.apache.hadoop.hdfs.TestFileAppend4.checksumTest(TestFileAppend4.java:556) at org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncChecksum1(TestFileAppend4.java:500) {noformat} I will attach logs for all of the failures. Be aware that I did change some of the logging messages in this test so I could better see when testAppendSyncReplication started and ended. Other than that the code is stock 0.20.205 RC2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8194) Add administrative tool to be able to examine the NN's view of DN storages
Aaron T. Myers created HDFS-8194: Summary: Add administrative tool to be able to examine the NN's view of DN storages Key: HDFS-8194 URL: https://issues.apache.org/jira/browse/HDFS-8194 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Aaron T. Myers Assignee: Colin Patrick McCabe The NN has long had facilities to list all of the DNs that are registered with it. It would be great if there were an administrative tool to list all of the individual storages that the NN is tracking. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task
[ https://issues.apache.org/jira/browse/HDFS-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-7421. -- Resolution: Duplicate Move processing of postponed over-replicated blocks to a background task Key: HDFS-7421 URL: https://issues.apache.org/jira/browse/HDFS-7421 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 2.6.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers In an HA environment, we postpone sending block invalidates to DNs until all DNs holding a given block have done at least one block report to the NN after it became active. When that first block report after becoming active does occur, we attempt to reprocess all postponed misreplicated blocks inline with the block report RPC. In the case where there are many postponed misreplicated blocks, this can cause block report RPCs to take an inordinately long time to complete, sometimes on the order of minutes, which has the potential to tie up RPC handlers, block incoming RPCs, etc. There's no need to hurriedly process all postponed misreplicated blocks so that we can quickly send invalidate commands back to DNs, so let's move this processing outside of the RPC handler context and into a background thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task
Aaron T. Myers created HDFS-7421: Summary: Move processing of postponed over-replicated blocks to a background task Key: HDFS-7421 URL: https://issues.apache.org/jira/browse/HDFS-7421 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 2.6.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers In an HA environment, we postpone sending block invalidates to DNs until all DNs holding a given block have done at least one block report to the NN after it became active. When that first block report after becoming active does occur, we attempt to reprocess all postponed misreplicated blocks inline with the block report RPC. In the case where there are many postponed misreplicated blocks, this can cause block report RPCs to take an inordinately long time to complete, sometimes on the order of minutes, which has the potential to tie up RPC handlers, block incoming RPCs, etc. There's no need to hurriedly process all postponed misreplicated blocks so that we can quickly send invalidate commands back to DNs, so let's move this processing outside of the RPC handler context and into a background thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot
Aaron T. Myers created HDFS-6647: Summary: Edit log corruption when pipeline recovery occurs for deleted file present in snapshot Key: HDFS-6647 URL: https://issues.apache.org/jira/browse/HDFS-6647 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.1 Reporter: Aaron T. Myers I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit log for a file after an OP_DELETE has previously been logged for that file. Such an edit log sequence cannot then be successfully read by the NameNode. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6563) NameNode cannot save fsimage in certain circumstances when snapshots are in use
Aaron T. Myers created HDFS-6563: Summary: NameNode cannot save fsimage in certain circumstances when snapshots are in use Key: HDFS-6563 URL: https://issues.apache.org/jira/browse/HDFS-6563 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical Checkpoints will start to fail and the NameNode will not be able to manually saveNamespace if the following set of steps occurs: # A zero-length file appears in a snapshot # That file is later lengthened to include at least one block # That file is subsequently deleted from the present file system but remains in the snapshot More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6463) Incorrect permission can be created after setting ACLs
Aaron T. Myers created HDFS-6463: Summary: Incorrect permission can be created after setting ACLs Key: HDFS-6463 URL: https://issues.apache.org/jira/browse/HDFS-6463 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Aaron T. Myers When setting ACLs for a file or directory, it's possible for the resulting FsPermission object's group entry to be set incorrectly, in particular it will be set to the mask entry. More details in the first comment of this JIRA. Thanks to Szehon Ho for identifying this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6435) Add support for specifying a static uid/gid mapping for the NFS gateway
Aaron T. Myers created HDFS-6435: Summary: Add support for specifying a static uid/gid mapping for the NFS gateway Key: HDFS-6435 URL: https://issues.apache.org/jira/browse/HDFS-6435 Project: Hadoop HDFS Issue Type: New Feature Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers It's quite reasonable that folks will want to access the HDFS NFS Gateway from client machines where the UIDs/GIDs do not line up with those on the NFS Gateway itself. We should provide a way to map these UIDs/GIDs between the systems. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6289) HA failover can fail if there are pending DN messages for DNs which no longer exist
Aaron T. Myers created HDFS-6289: Summary: HA failover can fail if there are pending DN messages for DNs which no longer exist Key: HDFS-6289 URL: https://issues.apache.org/jira/browse/HDFS-6289 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical In an HA setup, the standby NN may receive messages from DNs for blocks which the standby NN is not yet aware of. It queues up these messages and replays them when it next reads from the edit log or fails over. On a failover, all of these pending DN messages must be processed successfully in order for the failover to succeed. If one of these pending DN messages refers to a DN storageId that no longer exists (because the DN with that transfer address has been reformatted and has re-registered with the same transfer address) then on transition to active the NN will not be able to process this DN message and will suicide with an error like the following: {noformat} 2014-04-25 14:23:17,922 FATAL namenode.NameNode (NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN shutdown. Shutting down immediately. java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) as corrupt because datanode 127.0.0.1:33324 does not exist {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6280) Provide option to
[ https://issues.apache.org/jira/browse/HDFS-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-6280. -- Resolution: Invalid Accidentally hit create too soon. :) Provide option to -- Key: HDFS-6280 URL: https://issues.apache.org/jira/browse/HDFS-6280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Aaron T. Myers -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6280) Provide option to
Aaron T. Myers created HDFS-6280: Summary: Provide option to Key: HDFS-6280 URL: https://issues.apache.org/jira/browse/HDFS-6280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Aaron T. Myers -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6281) Provide option to use the NFS Gateway without having to use the Hadoop portmapper
Aaron T. Myers created HDFS-6281: Summary: Provide option to use the NFS Gateway without having to use the Hadoop portmapper Key: HDFS-6281 URL: https://issues.apache.org/jira/browse/HDFS-6281 Project: Hadoop HDFS Issue Type: New Feature Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers In order to use the NFS Gateway on operating systems with the rpcbind privileged registration bug, we currently require users to shut down and discontinue use of the system-provided portmap daemon, and instead use the portmap daemon provided by Hadoop. Alternately, we can work around this bug if we tweak the NFS Gateway to perform its port registration from a privileged port, and still let users use the system portmap daemon. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6112) NFS Gateway docs are incorrect for allowed hosts configuration
Aaron T. Myers created HDFS-6112: Summary: NFS Gateway docs are incorrect for allowed hosts configuration Key: HDFS-6112 URL: https://issues.apache.org/jira/browse/HDFS-6112 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor The NFS gateway export configuration docs say that the machine name configuration can be wildcards and provide the example {{host*.example.com}}. The term wildcard and this example might imply typical globbing semantics, but what it actually supports is Java regular expressions. I think we should change the docs to make this clearer. -- This message was sent by Atlassian JIRA (v6.2#6252)
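The glob-versus-regex distinction above is easy to see with plain {{java.util.regex}} (no Hadoop code involved; the gateway's actual matching logic is not reproduced here, only the semantics of the documented example string):

```java
import java.util.regex.Pattern;

public class ExportMatchDemo {
    // The docs' example string, interpreted as a Java regular expression
    // (regex semantics) rather than as a shell-style glob.
    static boolean regexMatch(String host) {
        return Pattern.matches("host*.example.com", host);
    }

    public static void main(String[] args) {
        // As a regex, "host*" means "hos" plus zero or more 't's, and each
        // unescaped '.' matches exactly one arbitrary character. So a name
        // the glob reading would accept is rejected:
        System.out.println(regexMatch("host1.example.com")); // false
        // ...while a name no glob would accept matches:
        System.out.println(regexMatch("hos1exampleXcom"));   // true
    }
}
```

In other words, {{host*.example.com}} as a regex admits only one arbitrary character between the {{hos}}/{{t...}} prefix and {{example}}, which is almost certainly not what an administrator writing a "wildcard" expects.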
[jira] [Resolved] (HDFS-6048) DFSClient fails if native library doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-6048. -- Resolution: Duplicate Hi Akira, I think this will be addressed by HDFS-6040, which should be committed shortly. DFSClient fails if native library doesn't exist --- Key: HDFS-6048 URL: https://issues.apache.org/jira/browse/HDFS-6048 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.4.0 Reporter: Akira AJISAKA Priority: Blocker When I executed FSShell commands (such as hdfs dfs -ls, -mkdir, -cat) in trunk, {{UnsupportedOperationException}} occurred in {{o.a.h.net.unix.DomainSocketWatcher}} and the commands failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6033) PBImageXmlWriter incorrectly handles processing cache directives
Aaron T. Myers created HDFS-6033: Summary: PBImageXmlWriter incorrectly handles processing cache directives Key: HDFS-6033 URL: https://issues.apache.org/jira/browse/HDFS-6033 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.4.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers When attempting to process cache directives in {{PBImageXmlWriter#dumpCacheManagerSection}}, we incorrectly loop the number of cache _pools_, not directives. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set
Aaron T. Myers created HDFS-5921: Summary: Cannot browse file system via NN web UI if any directory has the sticky bit set Key: HDFS-5921 URL: https://issues.apache.org/jira/browse/HDFS-5921 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical You'll see an error like this in the JS console if any directory has the sticky bit set: {noformat} 'helper_to_permission': function(chunk, ctx, bodies, params) { var exec = ((parms.perm % 10) & 1) == 1; Uncaught ReferenceError: parms is not defined {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5922) DN heartbeat thread can get stuck in tight loop
Aaron T. Myers created HDFS-5922: Summary: DN heartbeat thread can get stuck in tight loop Key: HDFS-5922 URL: https://issues.apache.org/jira/browse/HDFS-5922 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.3.0 Reporter: Aaron T. Myers We saw an issue recently on a test cluster where one of the DN threads was consuming 100% of a single CPU. Running jstack indicated that it was the DN heartbeat thread. I believe I've tracked down the cause to a bug in the accounting around the value of {{pendingReceivedRequests}}. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5517) Lower the default maximum number of blocks per file
Aaron T. Myers created HDFS-5517: Summary: Lower the default maximum number of blocks per file Key: HDFS-5517 URL: https://issues.apache.org/jira/browse/HDFS-5517 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers We introduced the maximum number of blocks per file in HDFS-4305, but we set the default to 1MM. In practice this limit is so high as to never be hit, whereas we know that an individual file with 10s of thousands of blocks can cause problems. We should lower the default value, in my opinion to 10k. -- This message was sent by Atlassian JIRA (v6.1#6144)
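As a back-of-the-envelope check on the limits discussed above (plain arithmetic, not Hadoop code; this assumes the 2.x default {{dfs.blocksize}} of 128 MB):

```java
public class BlockLimitMath {
    // Largest file a per-file block limit permits, given a block size.
    static long maxFileBytes(long maxBlocks, long blockSizeBytes) {
        return maxBlocks * blockSizeBytes;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 2.x default dfs.blocksize
        // The current default of 1MM blocks allows a single file of:
        System.out.println(maxFileBytes(1_000_000, blockSize) / (1L << 40) + " TiB"); // 122 TiB
        // The proposed 10k default still allows a very large file:
        System.out.println(maxFileBytes(10_000, blockSize) / (1L << 30) + " GiB");    // 1250 GiB
    }
}
```

So lowering the default to 10k still permits files over a tebibyte, while catching the tens-of-thousands-of-blocks pathology long before it reaches 1MM.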
[jira] [Created] (HDFS-5433) When reloading fsimage during checkpointing, we should clear existing snapshottable directories
Aaron T. Myers created HDFS-5433: Summary: When reloading fsimage during checkpointing, we should clear existing snapshottable directories Key: HDFS-5433 URL: https://issues.apache.org/jira/browse/HDFS-5433 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 2.2.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Critical The complete set of snapshottable directories is referenced both via the file system tree and in the SnapshotManager class. It's possible that when the 2NN performs a checkpoint, it will reload its in-memory state based on a new fsimage from the NN, but will not clear the set of snapshottable directories referenced by the SnapshotManager. In this case, the 2NN will write out an fsimage that cannot be loaded, since the integer written to the fsimage indicating the number of snapshottable directories will be out of sync with the actual number of snapshottable directories serialized to the fsimage. This is basically the same as HDFS-3835, but for snapshottable directories instead of delegation tokens. -- This message was sent by Atlassian JIRA (v6.1#6144)
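The failure mode above, a count prefix disagreeing with the records that follow, can be illustrated with a tiny stand-in for the serialization (this is not NameNode code; the writer/reader below only mimic the count-then-entries shape of the fsimage section):

```java
import java.io.*;
import java.util.*;

public class CountPrefixDemo {
    // Writer records the size of one collection (think: the stale
    // SnapshotManager set) but the entries of another (the reloaded tree).
    static byte[] write(List<String> fromTree, List<String> staleSideSet)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(staleSideSet.size());              // count from the stale set
        for (String dir : fromTree) out.writeUTF(dir);  // entries from the tree
        return bytes.toByteArray();
    }

    static List<String> read(byte[] image) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
        int n = in.readInt();
        List<String> dirs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            dirs.add(in.readUTF()); // EOFException once the count overruns the data
        }
        return dirs;
    }

    public static void main(String[] args) throws IOException {
        List<String> tree = Arrays.asList("/a");           // after reload: one dir
        List<String> stale = Arrays.asList("/a", "/old");  // side set never cleared
        try {
            read(write(tree, stale));
        } catch (EOFException e) {
            System.out.println("image unreadable: count out of sync");
        }
    }
}
```

When the two collections agree the round trip is fine; once the side set keeps an entry the tree no longer has, the image cannot be read back.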
[jira] [Created] (HDFS-5403) WebHdfs client cannot communicate with older WebHdfs servers post HDFS-5306
Aaron T. Myers created HDFS-5403: Summary: WebHdfs client cannot communicate with older WebHdfs servers post HDFS-5306 Key: HDFS-5403 URL: https://issues.apache.org/jira/browse/HDFS-5403 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.2.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers HDFS-5306 introduced the field infoSecurePort to the DatanodeIDProto PB definition and made it optional for compatibility purposes. However, we don't correctly handle the case where this field is not present when deserializing the response from a WebHdfs request. This results in an NPE at the client when this occurs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5289) Race condition in TestRetryCacheWithHA#testCreateSymlink causes spurious test failure
Aaron T. Myers created HDFS-5289: Summary: Race condition in TestRetryCacheWithHA#testCreateSymlink causes spurious test failure Key: HDFS-5289 URL: https://issues.apache.org/jira/browse/HDFS-5289 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.1.1-beta Reporter: Aaron T. Myers Assignee: Aaron T. Myers The code to check if the operation has been completed on the active NN can potentially execute before the thread actually doing the operation has run. In this case the checking code will retry the check if the result of the check is null. However, the test operation does not in fact return null, instead throwing an exception if the file doesn't exist yet. We need to catch the exception and retry. -- This message was sent by Atlassian JIRA (v6.1#6144)
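The fix described above, treating an exception from the racing check the same as a null result and retrying, can be sketched generically (illustrative helper names, not the actual test code in TestRetryCacheWithHA):

```java
import java.util.concurrent.Callable;

public class RetryCheck {
    // Poll a check that races with the thread doing the real operation. A
    // thrown exception (e.g. FileNotFoundException because the symlink does
    // not exist yet) means "not done yet", exactly like a null result.
    static <T> T checkWithRetry(Callable<T> check, int attempts, long sleepMs)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try {
                T result = check.call();
                if (result != null) {
                    return result;        // operation has completed
                }
            } catch (Exception e) {
                // Fall through and retry rather than fail the test.
            }
            Thread.sleep(sleepMs);
        }
        return null;                      // gave up; caller decides how to fail
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] calls = {0};
        // Simulated check: throws twice (operation not done), then succeeds.
        String r = checkWithRetry(() -> {
            if (++calls[0] < 3) throw new java.io.FileNotFoundException("not yet");
            return "link-target";
        }, 10, 1);
        System.out.println(r); // link-target
    }
}
```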
[jira] [Resolved] (HDFS-3133) Add support for DFS upgrade with HA enabled
[ https://issues.apache.org/jira/browse/HDFS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3133. -- Resolution: Duplicate Resolving as a duplicate of HDFS-5138, which has a lot more discussion about how best to do this. Add support for DFS upgrade with HA enabled --- Key: HDFS-3133 URL: https://issues.apache.org/jira/browse/HDFS-3133 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 2.0.0-alpha Reporter: Aaron T. Myers For the first implementation of HA NameNode, we punted on allowing DFS upgrade with HA enabled, which makes doing a DFS upgrade on an HA-enabled cluster quite cumbersome and error-prone. We should add better support for this. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (HDFS-3958) Integrate upgrade/finalize/rollback with external journals
[ https://issues.apache.org/jira/browse/HDFS-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3958. -- Resolution: Duplicate Resolving as a duplicate of HDFS-5138, which has a lot more discussion about how best to do this. Integrate upgrade/finalize/rollback with external journals -- Key: HDFS-3958 URL: https://issues.apache.org/jira/browse/HDFS-3958 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Todd Lipcon Currently the NameNode upgrade/rollback/finalize framework only supports local storage. With edits being stored in pluggable Journals, this could create certain difficulties - in particular, rollback wouldn't actually rollback the external storage to the old state. We should look at how to expose the right hooks to the external journal storage to snapshot/rollback/finalize. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version
Aaron T. Myers created HDFS-5223: Summary: Allow edit log/fsimage format changes without changing layout version Key: HDFS-5223 URL: https://issues.apache.org/jira/browse/HDFS-5223 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.1-beta Reporter: Aaron T. Myers Currently all HDFS on-disk formats are versioned by a single layout version. This means that even for changes which might be backward compatible, like the addition of a new edit log op code, we must go through the full 'namenode -upgrade' process, which requires coordination with DNs, etc. HDFS should support a lighter-weight alternative. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4299) WebHDFS Should Support HA Configuration
[ https://issues.apache.org/jira/browse/HDFS-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4299. -- Resolution: Duplicate Assignee: (was: Haohui Mai) Thanks, Daisuke. Closing this one out. WebHDFS Should Support HA Configuration --- Key: HDFS-4299 URL: https://issues.apache.org/jira/browse/HDFS-4299 Project: Hadoop HDFS Issue Type: Improvement Reporter: Daisuke Kobayashi WebHDFS clients connect directly to NameNodes rather than use a Hadoop client, so there is no failover capability. Though a workaround is available to use HttpFS with an HA client, WebHDFS also should support HA configuration. Please see also: https://issues.cloudera.org/browse/DISTRO-403 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5159) Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint
Aaron T. Myers created HDFS-5159: Summary: Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint Key: HDFS-5159 URL: https://issues.apache.org/jira/browse/HDFS-5159 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Aaron T. Myers Assignee: Aaron T. Myers The 2NN will avoid downloading/loading a new fsimage if its local copy of fsimage is the same as the version on the NN. However, the decision to *load* the fsimage from disk into memory is based only on the on-disk fsimage version. If an error occurs between downloading and loading the fsimage on the first checkpoint attempt, the 2NN will never load the fsimage, and then on subsequent checkpoint attempts it will not load the on-disk fsimage and thus will never checkpoint successfully. Example error message in the first comment of this ticket. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5102) Snapshot names should not be allowed to contain slash characters
Aaron T. Myers created HDFS-5102: Summary: Snapshot names should not be allowed to contain slash characters Key: HDFS-5102 URL: https://issues.apache.org/jira/browse/HDFS-5102 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 2.1.0-beta Reporter: Aaron T. Myers Snapshots of a snapshottable directory are allowed to have arbitrary names. Presently, if you create a snapshot with a snapshot name that begins with a / character, this will be allowed, but later attempts to access this snapshot will fail because of the way the {{Path}} class deals with consecutive / characters. I suggest we disallow / from appearing in snapshot names. An example of this is in the first comment on this JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
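The proposed validation, and why a leading slash makes a snapshot unreachable, can be sketched without Hadoop code ({{isValid}} is an illustrative helper, not actual NameNode code; the slash-collapsing below mimics what the {{Path}} class does to consecutive slashes):

```java
public class SnapshotNameCheck {
    // Hypothetical check along the lines proposed above: reject any snapshot
    // name containing a '/' character.
    static boolean isValid(String snapshotName) {
        return snapshotName != null && !snapshotName.isEmpty()
                && !snapshotName.contains("/");
    }

    public static void main(String[] args) {
        // Why '/' is a problem: building a path from the name "/s1" yields
        // consecutive slashes, which path normalization then collapses, so
        // the snapshot can never be addressed by its real name again.
        String built = "/dir/.snapshot/" + "/s1";        // "/dir/.snapshot//s1"
        String normalized = built.replaceAll("/+", "/"); // "/dir/.snapshot/s1"
        System.out.println(normalized);

        System.out.println(isValid("s1"));   // true
        System.out.println(isValid("/s1"));  // false: would now be rejected
    }
}
```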
[jira] [Created] (HDFS-5060) NN should proactively perform a saveNamespace if it has a huge number of outstanding uncheckpointed transactions
Aaron T. Myers created HDFS-5060: Summary: NN should proactively perform a saveNamespace if it has a huge number of outstanding uncheckpointed transactions Key: HDFS-5060 URL: https://issues.apache.org/jira/browse/HDFS-5060 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Aaron T. Myers Assignee: Aaron T. Myers In a properly-functioning HDFS system, checkpoints will be triggered either by the secondary NN or standby NN regularly, by default every hour or 1MM outstanding edits transactions, whichever come first. However, in cases where this second node is down for an extended period of time, the number of outstanding transactions can grow so large as to cause a restart to take an inordinately long time. This JIRA proposes to make the active NN monitor its number of outstanding transactions and perform a proactive local saveNamespace if it grows beyond a configurable threshold. I'm envisioning something like 10x the configured number of transactions which in a properly-functioning cluster would result in a checkpoint from the second NN. Though this would be disruptive to clients while it's taking place, likely for a few minutes, this seems better than the alternative of a subsequent multi-hour restart and should never actually occur in a properly-functioning cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5064) Standby checkpoints should not block concurrent readers
Aaron T. Myers created HDFS-5064: Summary: Standby checkpoints should not block concurrent readers Key: HDFS-5064 URL: https://issues.apache.org/jira/browse/HDFS-5064 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode Affects Versions: 2.1.1-beta Reporter: Aaron T. Myers Assignee: Aaron T. Myers We've observed an issue which causes fetches of the {{/jmx}} page of the NN to take a long time to load when the standby is in the process of creating a checkpoint. Even though both creating the checkpoint and gathering the statistics for {{/jmx}} take only the FSNS read lock, the issue is that since the FSNS uses a _fair_ RW lock, a single writer attempting to get the lock will block all threads attempting to get only the read lock for the duration of the checkpoint. This will cause {{/jmx}}, and really any thread only attempting to get the read lock, to block for the duration of the checkpoint, even though they should be able to proceed concurrently with the checkpointing thread. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
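The fair-lock behavior described above can be reproduced with plain {{java.util.concurrent}}, without any Hadoop code; the "checkpoint" and "/jmx" roles below are purely illustrative:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockDemo {
    // With a fair ReadWriteLock, once a writer is queued, a brand-new reader
    // waits behind it even though only readers currently hold the lock.
    // (The timed tryLock honors fairness; the untimed tryLock() would barge.)
    static boolean secondReaderCanAcquire() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair
        CountDownLatch readHeld = new CountDownLatch(1);

        Thread checkpointReader = new Thread(() -> {
            lock.readLock().lock();          // long-running "checkpoint" reader
            readHeld.countDown();
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        });
        checkpointReader.setDaemon(true);
        checkpointReader.start();
        readHeld.await();

        Thread writer = new Thread(() -> lock.writeLock().lock());
        writer.setDaemon(true);
        writer.start();
        while (!lock.hasQueuedThreads()) {   // wait until the writer is queued
            Thread.sleep(10);
        }

        // A new reader (think: a /jmx fetch) now queues behind the writer:
        return lock.readLock().tryLock(100, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second reader acquired: " + secondReaderCanAcquire()); // false
    }
}
```

This is exactly the {{/jmx}} symptom: the reader is not blocked by the checkpoint itself, but by the fairness policy interposing the queued writer.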
[jira] [Created] (HDFS-5027) On startup, DN should scan volumes in parallel
Aaron T. Myers created HDFS-5027: Summary: On startup, DN should scan volumes in parallel Key: HDFS-5027 URL: https://issues.apache.org/jira/browse/HDFS-5027 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.0.4-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers On startup the DN must scan all replicas on all configured volumes before the initial block report to the NN. This is currently done serially, but can be done in parallel to improve startup time of the DN. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
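A minimal sketch of the proposed parallel scan (illustrative only — the real DN startup scan builds replica maps, not lists of file names): each configured volume is handed to its own thread, and the caller waits for every scan to finish before building the block report.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelVolumeScan {
    // Scan each "volume" directory on its own thread and collect the file
    // names found, waiting for all scans to complete before returning.
    static Map<File, List<String>> scanVolumes(List<File> volumes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        try {
            Map<File, Future<List<String>>> pending = new LinkedHashMap<>();
            for (File vol : volumes) {
                pending.put(vol, pool.submit(() -> {
                    List<String> names = new ArrayList<>();
                    File[] files = vol.listFiles();
                    if (files != null) for (File f : files) names.add(f.getName());
                    return names;
                }));
            }
            Map<File, List<String>> result = new LinkedHashMap<>();
            for (Map.Entry<File, Future<List<String>>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get()); // block until that scan is done
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two throwaway "volumes" with a few fake replica files each.
        List<File> volumes = new ArrayList<>();
        for (int v = 0; v < 2; v++) {
            File vol = Files.createTempDirectory("vol" + v).toFile();
            for (int b = 0; b < 3; b++) new File(vol, "blk_" + b).createNewFile();
            volumes.add(vol);
        }
        for (Map.Entry<File, List<String>> e : scanVolumes(volumes).entrySet()) {
            System.out.println(e.getKey().getName() + ": " + e.getValue().size() + " replicas");
        }
    }
}
```

Since the volumes are typically on independent disks, the scans are I/O-bound and parallelize well, which is the basis for the startup-time improvement.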
[jira] [Created] (HDFS-4906) HDFS Output streams should not accept writes after being closed
Aaron T. Myers created HDFS-4906: Summary: HDFS Output streams should not accept writes after being closed Key: HDFS-4906 URL: https://issues.apache.org/jira/browse/HDFS-4906 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.0.5-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Currently if one closes an OutputStream obtained from FileSystem#create and then calls write(...) on that closed stream, the write will appear to succeed without error, though no data will be written to HDFS. A subsequent call to close will also silently appear to succeed. We should make it so that attempts to write to closed streams fail fast. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
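A sketch of the desired fail-fast behavior, using a generic wrapper rather than the actual DFSOutputStream change: remember that the stream was closed and reject any later write with an IOException instead of silently discarding the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative wrapper, not HDFS code: fail fast on write-after-close.
public class ClosedCheckOutputStream extends FilterOutputStream {
    private boolean closed = false;

    public ClosedCheckOutputStream(OutputStream out) { super(out); }

    private void checkOpen() throws IOException {
        if (closed) throw new IOException("stream is already closed");
    }

    @Override public void write(int b) throws IOException {
        checkOpen();
        out.write(b);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        checkOpen();
        out.write(b, off, len);
    }

    @Override public void close() throws IOException {
        if (!closed) { closed = true; super.close(); } // idempotent close stays legal
    }

    public static void main(String[] args) throws IOException {
        OutputStream os = new ClosedCheckOutputStream(new ByteArrayOutputStream());
        os.write(1);
        os.close();
        try {
            os.write(2); // previously appeared to succeed; now fails fast
        } catch (IOException e) {
            System.out.println("write after close rejected: " + e.getMessage());
        }
    }
}
```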
[jira] [Created] (HDFS-4830) Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml
Aaron T. Myers created HDFS-4830: Summary: Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml Key: HDFS-4830 URL: https://issues.apache.org/jira/browse/HDFS-4830 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.5-beta Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor In hdfs-default.xml we have these two settings: {noformat} dfs.datanode.fsdataset.volume.choosing.balanced-space-threshold dfs.datanode.fsdataset.volume.choosing.balanced-space-preference-percent {noformat} But in fact they should be these, from DFSConfigKeys.java: {noformat} dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-percent {noformat} This won't actually affect any functionality, since default values are used in the code anyway, but makes the documentation generated from hdfs-default.xml inaccurate. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4352) Encapsulate arguments to BlockReaderFactory in a class
[ https://issues.apache.org/jira/browse/HDFS-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4352. -- Resolution: Won't Fix Seems some folks don't think this is the best idea. Encapsulate arguments to BlockReaderFactory in a class -- Key: HDFS-4352 URL: https://issues.apache.org/jira/browse/HDFS-4352 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.0.3-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: 01b.patch, 01.patch Encapsulate the arguments to BlockReaderFactory in a class to avoid having to pass around 10+ arguments to a few different functions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4747) Convert snapshot user guide to APT from XDOC
Aaron T. Myers created HDFS-4747: Summary: Convert snapshot user guide to APT from XDOC Key: HDFS-4747 URL: https://issues.apache.org/jira/browse/HDFS-4747 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: Snapshot (HDFS-2802) Reporter: Aaron T. Myers Assignee: Aaron T. Myers To be consistent with the rest of the HDFS docs, the snapshots user guide should use APT instead of XDOC. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4655) DNA_FINALIZE is logged as being an unknown command by the DN when received from the standby NN
Aaron T. Myers created HDFS-4655: Summary: DNA_FINALIZE is logged as being an unknown command by the DN when received from the standby NN Key: HDFS-4655 URL: https://issues.apache.org/jira/browse/HDFS-4655 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.4-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor This is harmless since the alternative is just to log the command as being ignored, but this bug results in a somewhat concerning error message appearing in the logs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4656) DN heartbeat loop can be briefly tight
Aaron T. Myers created HDFS-4656: Summary: DN heartbeat loop can be briefly tight Key: HDFS-4656 URL: https://issues.apache.org/jira/browse/HDFS-4656 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.4-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor The DN heartbeat loop looks roughly like this:
{code}
if (now - timeOfLastHeartbeat > configuredHeartbeatInterval) {
  // do heartbeat
}
timeToWait = configuredHeartbeatInterval - (now - timeOfLastHeartbeat)
sleep(timeToWait)
{code}
The trouble is that since we sleep for exactly the heartbeat interval, and then check to see if we have waited _more_ than that heartbeat interval, we will very often have waited exactly the heartbeat interval (in millis), and not more than it. In this case we will skip actually performing the heartbeat and will calculate timeToWait as being 0ms. The DN heartbeat loop will then loop tightly for 1ms. The solution is just to change the {{>}} in the code above to {{>=}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
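The off-by-one is easy to see with plain arithmetic (names here are illustrative, not the DN's actual fields): when the loop has slept exactly one interval, the strict greater-than comparison skips the heartbeat and leaves a zero wait.

```java
public class HeartbeatInterval {
    // Buggy check: strictly greater-than, so elapsed == interval skips the
    // heartbeat and leaves timeToWait at 0, producing a tight loop.
    static boolean shouldHeartbeatBuggy(long elapsed, long interval) {
        return elapsed > interval;
    }

    // Fixed check: >= also fires when we slept exactly one interval.
    static boolean shouldHeartbeatFixed(long elapsed, long interval) {
        return elapsed >= interval;
    }

    static long timeToWait(long elapsed, long interval) {
        return interval - elapsed;
    }

    public static void main(String[] args) {
        long interval = 3000, elapsed = 3000; // slept exactly one interval
        System.out.println("buggy check fires: " + shouldHeartbeatBuggy(elapsed, interval)); // false
        System.out.println("fixed check fires: " + shouldHeartbeatFixed(elapsed, interval)); // true
        System.out.println("timeToWait if skipped: " + timeToWait(elapsed, interval) + "ms"); // 0ms
    }
}
```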
[jira] [Created] (HDFS-4657) If incremental BR is received before first full BR NN will log a line for every block on a DN
Aaron T. Myers created HDFS-4657: Summary: If incremental BR is received before first full BR NN will log a line for every block on a DN Key: HDFS-4657 URL: https://issues.apache.org/jira/browse/HDFS-4657 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.4-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers This can impact restart times pretty substantially if the DNs have a lot of blocks, and since the FSNS write lock is held while processing the block report, clients will not make any progress. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4658) Standby NN will log that it has received a block report after becoming active
Aaron T. Myers created HDFS-4658: Summary: Standby NN will log that it has received a block report after becoming active Key: HDFS-4658 URL: https://issues.apache.org/jira/browse/HDFS-4658 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode Affects Versions: 2.0.4-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Trivial Even when in the standby state the following line will sometimes be logged: {noformat} INFO blockmanagement.BlockManager: BLOCK* processReport: Received first block report from 172.21.3.106:50010 after becoming active. Its block contents are no longer considered stale {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4614) FSNamesystem#getContentSummary should use getPermissionChecker helper method
Aaron T. Myers created HDFS-4614: Summary: FSNamesystem#getContentSummary should use getPermissionChecker helper method Key: HDFS-4614 URL: https://issues.apache.org/jira/browse/HDFS-4614 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.4-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Trivial HDFS-4222 added this helper method and called it in most places, but missed one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4592) Default values for access time precision are out of sync between hdfs-default.xml and the code
Aaron T. Myers created HDFS-4592: Summary: Default values for access time precision are out of sync between hdfs-default.xml and the code Key: HDFS-4592 URL: https://issues.apache.org/jira/browse/HDFS-4592 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.4-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers In {{hdfs-default.xml}} we have:
{code}
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>3600000</value>
  <description>The access time for HDFS file is precise upto this value.
    The default value is 1 hour. Setting a value of 0 disables
    access times for HDFS.
  </description>
</property>
{code}
But in {{FSNamesystem}} we have:
{code}
this.accessTimePrecision = conf.getLong(DFS_NAMENODE_ACCESSTIME_PRECISION_KEY, 0);
{code}
We properly define {{DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT}} in DFSConfigKeys.java, but it's not actually referenced anywhere in the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
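The hazard of hard-coding a literal instead of referencing the shared default constant can be sketched as follows (a plain Map stands in for Hadoop's Configuration; the key and value mirror hdfs-default.xml, but the code is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class AccessTimeDefault {
    static final String KEY = "dfs.namenode.accesstime.precision";
    static final long DEFAULT = 3600000L; // 1 hour in ms, as documented

    // A Map stands in for Hadoop's Configuration in this sketch.
    static long precision(Map<String, String> conf, long defaultValue) {
        String v = conf.get(KEY);
        return v == null ? defaultValue : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // user never set the key
        // Hard-coded literal 0 silently disables access times...
        System.out.println(precision(conf, 0L));
        // ...while referencing the shared DEFAULT matches the documentation.
        System.out.println(precision(conf, DEFAULT));
    }
}
```

Referencing the single DEFAULT constant everywhere means the documented value and the effective value can never drift apart.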
[jira] [Created] (HDFS-4591) HA clients can fail to fail over while Standby NN is performing long checkpoint
Aaron T. Myers created HDFS-4591: Summary: HA clients can fail to fail over while Standby NN is performing long checkpoint Key: HDFS-4591 URL: https://issues.apache.org/jira/browse/HDFS-4591 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode Affects Versions: 2.0.4-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Clients know to fail over to talk to the Active NN when they perform an RPC to the Standby NN and it throws a StandbyException. However, most places in the code that check if the NN is in the standby state do so inside the FSNS fsLock. Since this lock is held for the duration of the saveNamespace during a checkpoint, StandbyExceptions will not be thrown during this time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4476) HDFS-347: style cleanups
[ https://issues.apache.org/jira/browse/HDFS-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4476. -- Resolution: Fixed Hadoop Flags: Reviewed I've just committed this to the HDFS-347 branch. Thanks a lot for the contribution, Colin. HDFS-347: style cleanups Key: HDFS-4476 URL: https://issues.apache.org/jira/browse/HDFS-4476 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4476.001.patch, HDFS-4476.002.patch Clean up some code style issues in HDFS-347: in DomainSocket.java, do not use AtomicInteger for status; add a new class; rename fdRef(), fdUnref(boolean), jfds, jbuf, SND_BUF_SIZE, etc.; do not override finalize(); remove some dead code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4453) Make a simple doc to describe the usage and design of the shortcircuit read feature
[ https://issues.apache.org/jira/browse/HDFS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4453. -- Resolution: Fixed Hadoop Flags: Reviewed I've just committed this to the HDFS-347 branch. Thanks a lot for the contribution, Colin. Make a simple doc to describe the usage and design of the shortcircuit read feature --- Key: HDFS-4453 URL: https://issues.apache.org/jira/browse/HDFS-4453 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client Reporter: Brandon Li Assignee: Colin Patrick McCabe Attachments: HDFS-4453.001.patch, HDFS-4453.002.patch, HDFS-4453.003.patch, HDFS-4453.004.patch It would be nice to have a document to describe the configuration and design of this feature. Also its relationship with the previous short circuit read implementation (HDFS-2246): for example, whether they can co-exist, whether this one is planned to replace HDFS-2246, or whether it can fall back on HDFS-2246 when unix domain sockets are not supported. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4485) HDFS-347: DN should chmod socket path a+w
[ https://issues.apache.org/jira/browse/HDFS-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4485. -- Resolution: Fixed I've just committed this to the HDFS-347 branch. Thanks a lot for the contribution, Colin. HDFS-347: DN should chmod socket path a+w - Key: HDFS-4485 URL: https://issues.apache.org/jira/browse/HDFS-4485 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-4485.001.patch, HDFS-4485.003.patch In cluster-testing HDFS-347, we found that in clusters where the MR job doesn't run as the same user as HDFS, clients wouldn't use short circuit read because of a 'permission denied' error connecting to the socket. It turns out that, in order to connect to a socket, clients need write permissions on the socket file. The DN should set these permissions automatically after it creates the socket. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
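Setting the socket path's permissions can be sketched with plain NIO on POSIX systems (a temp file stands in for the domain socket here, since the JDK of this era cannot create AF_UNIX sockets itself; the real DN does this via native code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class SocketPermissions {
    // Make a path world-writable (a+w), as the DN would need to do for its
    // domain socket so clients running as other users can connect() to it.
    static void chmodAllWritable(Path p) throws IOException {
        Set<PosixFilePermission> perms = Files.getPosixFilePermissions(p);
        perms.add(PosixFilePermission.OWNER_WRITE);
        perms.add(PosixFilePermission.GROUP_WRITE);
        perms.add(PosixFilePermission.OTHERS_WRITE);
        Files.setPosixFilePermissions(p, perms);
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for the DN's domain socket path (POSIX only).
        Path p = Files.createTempFile("dn-socket", null);
        chmodAllWritable(p);
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(p)));
        Files.delete(p);
    }
}
```

The key detail from the JIRA: connecting to a UNIX domain socket requires write permission on the socket file itself, not just on its directory.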
[jira] [Resolved] (HDFS-4473) don't create domain socket unless we need it
[ https://issues.apache.org/jira/browse/HDFS-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4473. -- Resolution: Fixed Hadoop Flags: Reviewed I've just committed this to the HDFS-347 branch. Thanks a lot for the contribution, Andy. don't create domain socket unless we need it Key: HDFS-4473 URL: https://issues.apache.org/jira/browse/HDFS-4473 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4473.001.patch If {{dfs.domain.socket.path}} is set, but we don't have anything enabled which would need it (like {{dfs.client.read.shortcircuit}}), don't create the socket. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS
Aaron T. Myers created HDFS-4462: Summary: 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS Key: HDFS-4462 URL: https://issues.apache.org/jira/browse/HDFS-4462 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.2-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers The 2NN currently has logic to detect when its on-disk FS metadata needs an upgrade with respect to the NN's metadata (i.e. the layout versions are different) and in this case it will proceed with the checkpoint despite storage signatures not matching precisely if the BP ID and Cluster ID do match exactly. However, in situations where we're upgrading from versions of HDFS prior to federation, which had no BP IDs or Cluster IDs, checkpoints will always fail with an error like the following: {noformat} 13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent checkpoint fields. LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = BP-1520616013-172.21.3.106-1359680537136. Expecting respectively: -19; 403832480; 0; ; . {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4448) HA NN does not start with wildcard address configured for other NN when security is enabled
Aaron T. Myers created HDFS-4448: Summary: HA NN does not start with wildcard address configured for other NN when security is enabled Key: HDFS-4448 URL: https://issues.apache.org/jira/browse/HDFS-4448 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode, security Affects Versions: 2.0.3-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Currently if one tries to configure HA NNs to use the wildcard HTTP address when security is enabled, the NN will fail to start with an error like the following: {code} java.lang.IllegalArgumentException: java.io.IOException: Cannot use a wildcard address with security. Must explicitly set bind address for Kerberos {code} This is the case even if one configures an actual address for the other NN's HTTP address. There's no good reason for this, since we now check for the local address being set to 0.0.0.0 and determine the canonical hostname for Kerberos purposes using {{InetAddress.getLocalHost().getCanonicalHostName()}}, so we should remove the restriction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4438) TestDomainSocket fails when system umask is set to 0002
[ https://issues.apache.org/jira/browse/HDFS-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4438. -- Resolution: Fixed Hadoop Flags: Reviewed I've just committed this to the HDFS-347 branch. Thanks a lot for the contribution, Colin. TestDomainSocket fails when system umask is set to 0002 --- Key: HDFS-4438 URL: https://issues.apache.org/jira/browse/HDFS-4438 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-4438.001.patch {{TestDomainSocket#testFdPassingPathSecurity}} fails when the system umask is set to 0002. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-4440) avoid annoying log message when dfs.domain.socket.path is not set
[ https://issues.apache.org/jira/browse/HDFS-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4440. -- Resolution: Fixed Hadoop Flags: Reviewed I've just committed this to the HDFS-347 branch. Thanks a lot for the contribution, Colin. avoid annoying log message when dfs.domain.socket.path is not set - Key: HDFS-4440 URL: https://issues.apache.org/jira/browse/HDFS-4440 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Attachments: HDFS-4440.001.patch When dfs.domain.socket.path is not set, it gets set to the empty string. We should check if this conf key is the empty string in {{DomainSocketFactory}}, rather than checking against null as we currently do. Otherwise, we get annoying log messages about failing to connect to the UNIX domain socket at ''. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
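The fix amounts to treating the empty string, not null, as "unset" (sketch only: a Map stands in for Hadoop's Configuration, and the socket path shown is a hypothetical example):

```java
import java.util.HashMap;
import java.util.Map;

public class DomainSocketPathCheck {
    static final String KEY = "dfs.domain.socket.path";

    // Configuration yields "" for this key when it is unset, so a null
    // check never fires; checking isEmpty() actually detects "unset" and
    // avoids logging a failed connect to the UNIX domain socket at ''.
    static boolean isConfigured(Map<String, String> conf) {
        String path = conf.getOrDefault(KEY, "");
        return path != null && !path.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(isConfigured(conf)); // false: skip domain sockets quietly
        conf.put(KEY, "/var/run/hdfs/dn_socket"); // hypothetical path
        System.out.println(isConfigured(conf)); // true
    }
}
```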
[jira] [Resolved] (HDFS-4333) Using right default value for creating files in HDFS
[ https://issues.apache.org/jira/browse/HDFS-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-4333. -- Resolution: Duplicate Sounds good. Closing this issue as a duplicate then. Using right default value for creating files in HDFS Key: HDFS-4333 URL: https://issues.apache.org/jira/browse/HDFS-4333 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.2-alpha Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor The default permission for creating files should be 0666 rather than 0777. HADOOP-9155 added a default permission for files and changed LocalFileSystem.create to use this default value; this JIRA makes the analogous change for HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4376) Intermittent timeout of TestBalancerWithNodeGroup
Aaron T. Myers created HDFS-4376: Summary: Intermittent timeout of TestBalancerWithNodeGroup Key: HDFS-4376 URL: https://issues.apache.org/jira/browse/HDFS-4376 Project: Hadoop HDFS Issue Type: Bug Components: balancer, test Affects Versions: 2.0.3-alpha Reporter: Aaron T. Myers Priority: Minor Attachments: test-balancer-with-node-group-timeout.txt HDFS-4261 fixed several issues with the balancer and balancer tests, and reduced the frequency with which TestBalancerWithNodeGroup times out. Despite this, occasional timeouts still occur in this test. This JIRA is to track and fix this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4315) DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access
Aaron T. Myers created HDFS-4315: Summary: DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access Key: HDFS-4315 URL: https://issues.apache.org/jira/browse/HDFS-4315 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.2-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers In some nightly test runs we've seen pretty frequent failures of TestWebHdfsWithMultipleNameNodes. I've traced the root cause to an unsynchronized map access in the DataStorage class. More details in the first comment. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
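A sketch of the fix rather than of DataStorage itself: when several block-pool threads mutate a shared map concurrently, using a thread-safe map (ConcurrentHashMap here, or Collections.synchronizedMap over the existing one) ensures no registration is lost. The class and key names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class ConcurrentBpMap {
    // Several BPOfferService threads register their block pools at once;
    // the shared map must be thread-safe, or puts can be lost and the map
    // can even be corrupted mid-resize.
    static Map<String, String> registerConcurrently(int threads) throws InterruptedException {
        final Map<String, String> bpStorage = new ConcurrentHashMap<>();
        final CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final String bpId = "BP-" + i;
            workers[i] = new Thread(() -> {
                try { start.await(); } catch (InterruptedException ignored) { }
                bpStorage.put(bpId, "storage-for-" + bpId);
            });
            workers[i].start();
        }
        start.countDown(); // release all registration threads at once
        for (Thread t : workers) t.join();
        return bpStorage;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(registerConcurrently(8).size()); // 8
    }
}
```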
[jira] [Resolved] (HDFS-3926) QJM: Add user documentation for QJM
[ https://issues.apache.org/jira/browse/HDFS-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3926. -- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed I've just committed this to the QJM branch. QJM: Add user documentation for QJM --- Key: HDFS-3926 URL: https://issues.apache.org/jira/browse/HDFS-3926 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: QuorumJournalManager (HDFS-3077) Attachments: HDFS-3926.patch, HDFS-3926.patch, HDFS-3926.patch, qjm-ha-doc.diff, qjm-ha-doc.diff, regular-ha-doc.diff, regular-ha-doc.diff We should add user-facing documentation for how to configure/use the QJM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3926) QJM: Add user documentation for QJM
Aaron T. Myers created HDFS-3926: Summary: QJM: Add user documentation for QJM Key: HDFS-3926 URL: https://issues.apache.org/jira/browse/HDFS-3926 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers We should add user-facing documentation for how to configure/use the QJM. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-3893) QJM: Make QJM work with security enabled
[ https://issues.apache.org/jira/browse/HDFS-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3893. -- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Thanks a lot for the review, Todd. I've just committed this to the HDFS-3077 branch. QJM: Make QJM work with security enabled Key: HDFS-3893 URL: https://issues.apache.org/jira/browse/HDFS-3893 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node, security Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: QuorumJournalManager (HDFS-3077) Attachments: HDFS-3893.patch, HDFS-3893.patch Currently the QJM does not work when security is enabled. The quorum cannot be formatted, the NN and SBN cannot communicate with the JNs, and JNs cannot synchronize edit logs with each other. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3897) QJM: TestBlockToken fails after HDFS-3893
Aaron T. Myers created HDFS-3897: Summary: QJM: TestBlockToken fails after HDFS-3893 Key: HDFS-3897 URL: https://issues.apache.org/jira/browse/HDFS-3897 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers HDFS-3893 caused the NN to log in using its configured Kerberos credentials when formatting the NN. This caused TestBlockToken#testBlockTokenInLastLocatedBlock to begin failing, since the test enables Kerberos but doesn't configure the NN principal or keytab. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-3897) QJM: TestBlockToken fails after HDFS-3893
[ https://issues.apache.org/jira/browse/HDFS-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3897. -- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Thanks a lot for the quick review, Todd. I've just committed this to the HDFS-3077 branch. QJM: TestBlockToken fails after HDFS-3893 - Key: HDFS-3897 URL: https://issues.apache.org/jira/browse/HDFS-3897 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: QuorumJournalManager (HDFS-3077) Attachments: HDFS-3897.patch HDFS-3893 caused the NN to log in using its configured Kerberos credentials when formatting the NN. This caused TestBlockToken#testBlockTokenInLastLocatedBlock to begin failing, since the test enables Kerberos but doesn't configure the NN principal or keytab. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3893) QJM: Make QJM work with security enabled
Aaron T. Myers created HDFS-3893: Summary: QJM: Make QJM work with security enabled Key: HDFS-3893 URL: https://issues.apache.org/jira/browse/HDFS-3893 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node, security Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Currently the QJM does not work when security is enabled. The quorum cannot be formatted, the NN and SBN cannot communicate with the JNs, and JNs cannot synchronize edit logs with each other. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3878) Add an administrative command to download finalized edit log segments from the NN
Aaron T. Myers created HDFS-3878: Summary: Add an administrative command to download finalized edit log segments from the NN Key: HDFS-3878 URL: https://issues.apache.org/jira/browse/HDFS-3878 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs client, name-node Affects Versions: 2.0.0-alpha Reporter: Aaron T. Myers Similarly to the `hdfs dfsadmin -fetchImage' command added in HDFS-2941, it would be nice to have an admin command capable of fetching edit log segments. This could be useful, for example, for use in a script designed to back up the NN on-disk metadata. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log
Aaron T. Myers created HDFS-3864: Summary: NN does not update internal file mtime for OP_CLOSE when reading from the edit log Key: HDFS-3864 URL: https://issues.apache.org/jira/browse/HDFS-3864 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers When logging an OP_CLOSE to the edit log, the NN writes out an updated file mtime and atime. However, when reading in an OP_CLOSE from the edit log, the NN does not apply these values to the in-memory FS data structure. Because of this, a file's mtime or atime may appear to go back in time after an NN restart, or an HA failover. Most of the time this will be harmless and folks won't notice, but in the event one of these files is being used in the distributed cache of an MR job when an HA failover occurs, the job might notice that the mtime of a cache file has changed, which in MR2 will cause the job to fail with an exception like the following: {noformat} java.io.IOException: Resource hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar changed on src filesystem (expected 1342137814599, was 1342137814473 at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90) at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} Credit to Sujay Rau for discovering this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
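The fix described above can be sketched as a replay step that applies the logged times to the in-memory inode. A minimal, self-contained model follows; the class and method names are hypothetical stand-ins, not the actual NameNode code.

```java
// Minimal model of the fix: when replaying an OP_CLOSE from the edit log,
// copy the mtime/atime recorded in the op onto the in-memory inode,
// mirroring what the NN already does when *writing* the op.
// All names here are hypothetical, not real HDFS classes.
class INodeFileModel {
    long mtime;
    long atime;
}

class CloseOpModel {
    final long mtime;
    final long atime;
    CloseOpModel(long mtime, long atime) {
        this.mtime = mtime;
        this.atime = atime;
    }
}

class EditLogReplayer {
    // Before the fix, this step was skipped on replay, so a file's times
    // could appear to "go back" after an NN restart or HA failover.
    static void applyClose(INodeFileModel file, CloseOpModel op) {
        file.mtime = op.mtime;
        file.atime = op.atime;
    }
}
```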
[jira] [Created] (HDFS-3835) Long-lived 2NN cannot perform a checkpoint if security is enabled and the NN restarts with outstanding delegation tokens
Aaron T. Myers created HDFS-3835: Summary: Long-lived 2NN cannot perform a checkpoint if security is enabled and the NN restarts with outstanding delegation tokens Key: HDFS-3835 URL: https://issues.apache.org/jira/browse/HDFS-3835 Project: Hadoop HDFS Issue Type: Bug Components: name-node, security Affects Versions: 2.0.0-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers When the 2NN wants to perform a checkpoint, it figures out the highest transaction ID of the fsimage files on the NN, and if the 2NN has a copy of that fsimage file (because it created that merged fsimage file the last time it did a checkpoint) then the 2NN won't download the fsimage file from the NN, and instead only gets the new edits files from the NN. In this case, the 2NN also doesn't even bother reloading the fsimage file it has from disk, since it has all of the namespace state in-memory. This all works just fine. When the 2NN _doesn't_ have a copy of the relevant fsimage file (for example, if the NN had restarted since the last checkpoint) then the 2NN blows away its in-memory namespace state, downloads the fsimage file from the NN, and loads the newly-downloaded fsimage file from disk. The bug is that when the 2NN clears its in-memory state, it only resets the namespace, but not the delegation token map. The fix is pretty simple - just make the delegation token map get cleared as well as the namespace state when a running 2NN needs to load a new fsimage from disk. Credit to Stephen Chu for identifying this issue.
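A hedged sketch of the state reset described above (hypothetical names, not the real 2NN code): both the namespace and the delegation token map must be cleared before loading a freshly downloaded fsimage.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the 2NN's in-memory state. The bug was that only the
// namespace was cleared before reloading a downloaded fsimage; the fix
// clears the delegation token map as well, so no stale tokens from
// before the NN restart survive into the new image.
class CheckpointStateModel {
    final Map<String, Object> namespace = new HashMap<>();
    final Map<String, Long> delegationTokens = new HashMap<>();

    // Fixed behavior: reset all in-memory state before loading the image.
    void clearForImageReload() {
        namespace.clear();
        delegationTokens.clear();
    }
}
```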
[jira] [Created] (HDFS-3820) QJM: Add a few missing imports in TestEditLog and TestFSEditLogLoader
Aaron T. Myers created HDFS-3820: Summary: QJM: Add a few missing imports in TestEditLog and TestFSEditLogLoader Key: HDFS-3820 URL: https://issues.apache.org/jira/browse/HDFS-3820 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Both of these test files are missing a few imports required to make them compile.
[jira] [Resolved] (HDFS-3820) QJM: Add a few missing imports in TestEditLog and TestFSEditLogLoader
[ https://issues.apache.org/jira/browse/HDFS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3820. -- Resolution: Invalid Never mind. Was looking at the wrong branch. Sorry for the noise. QJM: Add a few missing imports in TestEditLog and TestFSEditLogLoader - Key: HDFS-3820 URL: https://issues.apache.org/jira/browse/HDFS-3820 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Both of these test files are missing a few imports required to make them compile.
[jira] [Created] (HDFS-3823) QJM: TestQJMWithFaults fails occasionally because of missed setting of HTTP port
Aaron T. Myers created HDFS-3823: Summary: QJM: TestQJMWithFaults fails occasionally because of missed setting of HTTP port Key: HDFS-3823 URL: https://issues.apache.org/jira/browse/HDFS-3823 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor TestQJMWithFaults#testRandomized will fail if the IPCLoggerChannel.httpPort instance variable isn't set before a call to IPCLoggerChannel#prepareRecovery is made, since this is necessary to build URLs to the returned logs. Credit to Todd Lipcon for discovering this issue.
[jira] [Resolved] (HDFS-3823) QJM: TestQJMWithFaults fails occasionally because of missed setting of HTTP port
[ https://issues.apache.org/jira/browse/HDFS-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3823. -- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Thanks a lot for the quick review, Eli. I've just committed this to the branch. QJM: TestQJMWithFaults fails occasionally because of missed setting of HTTP port Key: HDFS-3823 URL: https://issues.apache.org/jira/browse/HDFS-3823 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor Fix For: QuorumJournalManager (HDFS-3077) Attachments: HDFS-3823.patch TestQJMWithFaults#testRandomized will fail if the IPCLoggerChannel.httpPort instance variable isn't set before a call to IPCLoggerChannel#prepareRecovery is made, since this is necessary to build URLs to the returned logs. Credit to Todd Lipcon for discovering this issue.
[jira] [Created] (HDFS-3826) QJM: Some trivial logging / exception text improvements
Aaron T. Myers created HDFS-3826: Summary: QJM: Some trivial logging / exception text improvements Key: HDFS-3826 URL: https://issues.apache.org/jira/browse/HDFS-3826 Project: Hadoop HDFS Issue Type: Improvement Components: name-node, test Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor Some of the log text in QuorumException and QuorumJournalManager could stand to be improved.
[jira] [Resolved] (HDFS-3826) QJM: Some trivial logging / exception text improvements
[ https://issues.apache.org/jira/browse/HDFS-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3826. -- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Thanks a lot for the quick review, Eli. I've just committed this to the HDFS-3077 branch. QJM: Some trivial logging / exception text improvements --- Key: HDFS-3826 URL: https://issues.apache.org/jira/browse/HDFS-3826 Project: Hadoop HDFS Issue Type: Improvement Components: name-node, test Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor Fix For: QuorumJournalManager (HDFS-3077) Attachments: HDFS-3826.patch Some of the log text in QuorumException and QuorumJournalManager could stand to be improved. Credit to Todd Lipcon for noticing this.
[jira] [Reopened] (HDFS-2998) OfflineImageViewer and ImageVisitor should be annotated public
[ https://issues.apache.org/jira/browse/HDFS-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers reopened HDFS-2998: -- My apologies that this fell off my radar. I still believe that this is a useful interface for external developers, and thus that we should mark it public/unstable or public/evolving. I agree that doing so would require marking the referenced classes similarly public/(evolving|unstable). OfflineImageViewer and ImageVisitor should be annotated public -- Key: HDFS-2998 URL: https://issues.apache.org/jira/browse/HDFS-2998 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.23.1 Reporter: Aaron T. Myers The OfflineImageViewer is currently annotated as InterfaceAudience.Private. It's intended for subclassing, so it should be annotated as the public API that it is. The ImageVisitor class should similarly be annotated public (evolving is fine). Note that it should also be changed to be public; it's currently package-private, which means that users have to cheat with their subclass package name.
[jira] [Reopened] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader
[ https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers reopened HDFS-3719: -- Re-enable append-related tests in TestFileConcurrentReader -- Key: HDFS-3719 URL: https://issues.apache.org/jira/browse/HDFS-3719 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 2.2.0-alpha Attachments: hdfs-3719-1.patch Both of these tests are disabled. We should figure out what append functionality we need to make the tests work again, and reenable them.
{code}
// fails due to issue w/append, disable
@Ignore
@Test
public void _testUnfinishedBlockCRCErrorTransferToAppend() throws IOException {
  runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}

// fails due to issue w/append, disable
@Ignore
@Test
public void _testUnfinishedBlockCRCErrorNormalTransferAppend() throws IOException {
  runTestUnfinishedBlockCRCError(false, SyncType.APPEND, DEFAULT_WRITE_SIZE);
}
{code}
[jira] [Created] (HDFS-3773) TestNNWithQJM fails after HDFS-3741
Aaron T. Myers created HDFS-3773: Summary: TestNNWithQJM fails after HDFS-3741 Key: HDFS-3773 URL: https://issues.apache.org/jira/browse/HDFS-3773 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Aaron T. Myers Assignee: Aaron T. Myers Looks like the change of visibility of one of the QuorumJournalManager constructors fouls up the instantiation via reflection.
[jira] [Created] (HDFS-3745) fsck prints that it's using KSSL even when it's in fact using SPNEGO for authentication
Aaron T. Myers created HDFS-3745: Summary: fsck prints that it's using KSSL even when it's in fact using SPNEGO for authentication Key: HDFS-3745 URL: https://issues.apache.org/jira/browse/HDFS-3745 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client, security Affects Versions: 2.0.0-alpha, 1.2.0 Reporter: Aaron T. Myers Priority: Trivial In branch-2 (which exclusively uses SPNEGO for HTTP authentication) and in branch-1 (which can optionally use SPNEGO for HTTP authentication), running fsck will print the following, which isn't quite right: {quote} FSCK started by hdfs (auth:KERBEROS_SSL) from... {quote}
[jira] [Created] (HDFS-3738) TestDFSClientRetries#testFailuresArePerOperation sets incorrect timeout config
Aaron T. Myers created HDFS-3738: Summary: TestDFSClientRetries#testFailuresArePerOperation sets incorrect timeout config Key: HDFS-3738 URL: https://issues.apache.org/jira/browse/HDFS-3738 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor TestDFSClientRetries#testFailuresArePerOperation involves testing retries by making use of expected timeouts. However, this test sets the wrong config to lower the timeout, and thus takes far longer than it should.
[jira] [Created] (HDFS-3727) When using SPNEGO, NN should not try to log in using KSSL principal
Aaron T. Myers created HDFS-3727: Summary: When using SPNEGO, NN should not try to log in using KSSL principal Key: HDFS-3727 URL: https://issues.apache.org/jira/browse/HDFS-3727 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 1.2.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers When performing a checkpoint with security enabled, the NN will attempt to relogin from its keytab before making an HTTP request back to the 2NN to fetch the newly-merged image. However, it always attempts to log in using the KSSL principal, even if SPNEGO is configured to be used. This issue was discovered by Stephen Chu.
[jira] [Created] (HDFS-3698) TestHftpFileSystem is failing in branch-1 due to changed default secure port
Aaron T. Myers created HDFS-3698: Summary: TestHftpFileSystem is failing in branch-1 due to changed default secure port Key: HDFS-3698 URL: https://issues.apache.org/jira/browse/HDFS-3698 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 1.2.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers This test is failing since the default secure port changed to the HTTP port upon the commit of HDFS-2617.
[jira] [Reopened] (HDFS-3654) TestJspHelper#testGetUgi fails with NPE
[ https://issues.apache.org/jira/browse/HDFS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers reopened HDFS-3654: -- TestJspHelper#testGetUgi fails with NPE --- Key: HDFS-3654 URL: https://issues.apache.org/jira/browse/HDFS-3654 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.1.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.2.0, 2.1.0-alpha Attachments: hdfs-3654-b1.txt, hdfs-3654.txt, hdfs-3654.txt Looks like my recent change in HDFS-3639 can occasionally cause this test to fail.
[jira] [Reopened] (HDFS-3639) JspHelper#getUGI should always verify the token if security is enabled
[ https://issues.apache.org/jira/browse/HDFS-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers reopened HDFS-3639: -- JspHelper#getUGI should always verify the token if security is enabled -- Key: HDFS-3639 URL: https://issues.apache.org/jira/browse/HDFS-3639 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 1.0.0, 2.0.0-alpha Reporter: Eli Collins Assignee: Eli Collins Priority: Minor Fix For: 1.2.0, 2.1.0-alpha Attachments: hdfs-3639-b1.txt, hdfs-3639.txt JspHelper#getUGI only verifies the given token if the context and nn are set (added in HDFS-2416). We should unconditionally verify the token, i.e. a bug where name.node is not set in the context object should not result in the token not being verified. In practice this shouldn't be an issue as per HDFS-3434 the context and NN should never be null.
[jira] [Resolved] (HDFS-3654) TestJspHelper#testGetUgi fails with NPE
[ https://issues.apache.org/jira/browse/HDFS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3654. -- Resolution: Invalid Seems like this JIRA is now irrelevant since the change in HDFS-3639 has been reverted and will be done differently. TestJspHelper#testGetUgi fails with NPE --- Key: HDFS-3654 URL: https://issues.apache.org/jira/browse/HDFS-3654 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.1.0-alpha Reporter: Eli Collins Assignee: Eli Collins Fix For: 1.2.0, 2.1.0-alpha Attachments: hdfs-3654-b1.txt, hdfs-3654.txt, hdfs-3654.txt Looks like my recent change in HDFS-3639 can occasionally cause this test to fail.
[jira] [Created] (HDFS-3637) Add support for encrypting the DataTransferProtocol
Aaron T. Myers created HDFS-3637: Summary: Add support for encrypting the DataTransferProtocol Key: HDFS-3637 URL: https://issues.apache.org/jira/browse/HDFS-3637 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, hdfs client, security Affects Versions: 2.0.0-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally encrypted. However, actual data read or written between DNs and clients (or DNs to DNs) is sent in the clear. When processing sensitive data on a shared cluster, confidentiality of the data read/written from/to HDFS may be desired.
[jira] [Created] (HDFS-3501) Checkpointing with security + HA enabled will stop working after ticket lifetime expires
Aaron T. Myers created HDFS-3501: Summary: Checkpointing with security + HA enabled will stop working after ticket lifetime expires Key: HDFS-3501 URL: https://issues.apache.org/jira/browse/HDFS-3501 Project: Hadoop HDFS Issue Type: Bug Components: ha, name-node Affects Versions: 2.0.0-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: HDFS-3501.patch The StandbyCheckpointer currently does the right thing in renewing its krb5 creds before attempting to perform a checkpoint to the active NN, but the active NN makes no attempt to renew its own krb5 creds before connecting to the standby NN to fetch the new merged fsimage file.
[jira] [Created] (HDFS-3484) hdfs fsck doesn't work if NN HTTP address is set to 0.0.0.0 even if NN RPC address is configured
Aaron T. Myers created HDFS-3484: Summary: hdfs fsck doesn't work if NN HTTP address is set to 0.0.0.0 even if NN RPC address is configured Key: HDFS-3484 URL: https://issues.apache.org/jira/browse/HDFS-3484 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 2.0.0-alpha Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor The default NN HTTP address is 0.0.0.0. Clients which need to connect to the HTTP address (e.g. fsck and fetchImage) need an address which is actually resolvable, however. If the configured NN HTTP address is set to 0.0.0.0, these clients should fall back on using the hostname configured for the RPC address, with the port configured for the HTTP address.
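The fallback described above can be sketched in a few lines (the helper name is hypothetical, not the actual client code): if the configured HTTP host is the wildcard, reuse the RPC host but keep the HTTP port.

```java
// Sketch of the proposed fallback for clients like fsck and fetchImage:
// a wildcard HTTP host (0.0.0.0) is not resolvable from a remote client,
// so substitute the hostname from the configured RPC address while
// keeping the HTTP port. Hypothetical helper, not real HDFS code.
class HttpAddressFallback {
    static String resolveHttpAddress(String httpHost, int httpPort, String rpcHost) {
        String host = "0.0.0.0".equals(httpHost) ? rpcHost : httpHost;
        return host + ":" + httpPort;
    }
}
```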
[jira] [Resolved] (HDFS-3463) DFSTestUtil.waitCorruptReplicas() should not use file reading time as a timeout measure.
[ https://issues.apache.org/jira/browse/HDFS-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3463. -- Resolution: Duplicate DFSTestUtil.waitCorruptReplicas() should not use file reading time as a timeout measure. Key: HDFS-3463 URL: https://issues.apache.org/jira/browse/HDFS-3463 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Attachments: testBlockCorruptionRecoveryPolicy1.log.htm Tests fail because DFSTestUtil.waitCorruptReplicas() does not wait long enough.
[jira] [Resolved] (HDFS-3427) TestEditLogFileOutputStream#PREALLOCATION_LENGTH is dead code
[ https://issues.apache.org/jira/browse/HDFS-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3427. -- Resolution: Duplicate Looks like this got fixed with the commit of HDFS-2982. TestEditLogFileOutputStream#PREALLOCATION_LENGTH is dead code - Key: HDFS-3427 URL: https://issues.apache.org/jira/browse/HDFS-3427 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 3.0.0 Reporter: Aaron T. Myers Priority: Trivial Labels: newbie The constant PREALLOCATION_LENGTH in TestEditLogFileOutputStream is no longer referenced anywhere.
[jira] [Created] (HDFS-3444) hdfs groups command doesn't work with security enabled
Aaron T. Myers created HDFS-3444: Summary: hdfs groups command doesn't work with security enabled Key: HDFS-3444 URL: https://issues.apache.org/jira/browse/HDFS-3444 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers When one tries to run `hdfs groups` with security enabled, an error like the following appears: {noformat} java.io.IOException: Failed to specify server's Kerberos principal name; {noformat}
[jira] [Created] (HDFS-3439) Balancer exits if fs.defaultFS is set to a different, but semantically identical, URI from dfs.namenode.rpc-address
Aaron T. Myers created HDFS-3439: Summary: Balancer exits if fs.defaultFS is set to a different, but semantically identical, URI from dfs.namenode.rpc-address Key: HDFS-3439 URL: https://issues.apache.org/jira/browse/HDFS-3439 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.0.0 Reporter: Aaron T. Myers The balancer determines the set of NN URIs to balance by looking at fs.defaultFS and all possible dfs.namenode.(service)rpc-address settings. If fs.defaultFS is, for example, set to hdfs://foo.example.com:8020/ (note the trailing /) and the rpc-address is set to hdfs://foo.example.com:8020 (without a /), then the balancer will conclude that there are two NNs and try to balance both. However, since both of these URIs refer to the same actual FS instance, the balancer will exit with java.io.IOException: Another balancer is running. Exiting ...
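One way to avoid the double-counting described above would be to normalize NN URIs before deduplicating them, treating an empty path and a bare "/" as equivalent. A sketch using `java.net.URI` (the helper is hypothetical, not the balancer's actual code):

```java
import java.net.URI;

// Normalize an NN URI so that "hdfs://host:port/" and "hdfs://host:port"
// compare equal: strip an empty or root-only path before putting the URI
// into the set of NameNodes to balance. Hypothetical helper, not real
// balancer code.
class NNUriNormalizer {
    static URI normalize(URI uri) {
        String path = uri.getPath();
        if (path == null || path.isEmpty() || "/".equals(path)) {
            return URI.create(uri.getScheme() + "://" + uri.getAuthority());
        }
        return uri;
    }
}
```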
[jira] [Resolved] (HDFS-3430) Start-all.sh Error
[ https://issues.apache.org/jira/browse/HDFS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3430. -- Resolution: Invalid Fix Version/s: (was: 1.0.1) Hi Hiten, you're attempting to use an invalid hostname for your NameNode. Hostnames are not permitted to contain the underscore (_) character. Start-all.sh Error --- Key: HDFS-3430 URL: https://issues.apache.org/jira/browse/HDFS-3430 Project: Hadoop HDFS Issue Type: Test Components: data-node, hdfs client, name-node Affects Versions: 1.0.1 Environment: Linux Reporter: Hiten Tathe Labels: hadoop Attachments: Screenshot.png Original Estimate: 5h Remaining Estimate: 5h Hi, I'm new to Hadoop and trying to run Hadoop on a standalone Linux machine, but I'm facing an error. Please help me. The error is as follows: 2012-05-16 13:03:11,155 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = sra_hadoop.com/192.168.1.62 STARTUP_MSG: args = [] STARTUP_MSG: version = 1.0.2-SNAPSHOT STARTUP_MSG: build = -r ; compiled by 'root' on Wed May 16 12:30:17 IST 2012 / 2012-05-16 13:03:11,363 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2012-05-16 13:03:11,379 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 2012-05-16 13:03:11,380 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-05-16 13:03:11,381 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started 2012-05-16 13:03:11,390 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: hdfs://sra_hadoop:9000 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:198) at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:262) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:496) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288) 2012-05-16 13:03:11,391 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at sra_hadoop.com/192.168.1.62
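The diagnosis above can be illustrated with a simple validity check (a sketch following RFC 952/1123-style hostname rules, not Hadoop code): hostname labels may contain letters, digits, and hyphens, but not underscores, which is why hdfs://sra_hadoop:9000 is rejected.

```java
// Minimal hostname validity check illustrating why "sra_hadoop.com" fails:
// each dot-separated label must start and end with a letter or digit and
// may contain hyphens in between -- underscores are not permitted.
// Sketch only; Hadoop's actual parsing lives in NetUtils.createSocketAddr.
class HostnameCheck {
    static boolean isValidHostname(String host) {
        return host.matches(
            "(?i)[a-z0-9]([a-z0-9-]*[a-z0-9])?(\\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*");
    }
}
```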
[jira] [Created] (HDFS-3433) GetImageServlet should allow administrative requestors when security is enabled
Aaron T. Myers created HDFS-3433: Summary: GetImageServlet should allow administrative requestors when security is enabled Key: HDFS-3433 URL: https://issues.apache.org/jira/browse/HDFS-3433 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Currently the GetImageServlet only allows the NN and checkpointing nodes to connect. Since we now have the fetchImage command in DFSAdmin, we should allow administrative requests as well.
[jira] [Created] (HDFS-3400) DNs should be able to start with jsvc even if security is disabled
Aaron T. Myers created HDFS-3400: Summary: DNs should be able to start with jsvc even if security is disabled Key: HDFS-3400 URL: https://issues.apache.org/jira/browse/HDFS-3400 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, scripts Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Currently, if one tries to start a DN with security disabled (via hadoop.security.authentication = simple in the configs) but with jsvc correctly configured, the DN will refuse to start.
[jira] [Created] (HDFS-3404) Make putImage in GetImageServlet infer remote address to fetch from
Aaron T. Myers created HDFS-3404: Summary: Make putImage in GetImageServlet infer remote address to fetch from Key: HDFS-3404 URL: https://issues.apache.org/jira/browse/HDFS-3404 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers As it stands, daemons which perform checkpointing must determine their own address on which they can be reached, so that the NN which they checkpoint against knows what address to fetch a merged fsimage from. This causes problems if, for example, the daemon performing checkpointing binds to 0.0.0.0, and thus can't be sure of what address the NN can reach it at.
[jira] [Created] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
Aaron T. Myers created HDFS-3405: Summary: Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages Key: HDFS-3405 URL: https://issues.apache.org/jira/browse/HDFS-3405 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 1.0.0 Reporter: Aaron T. Myers As Todd points out in [this comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986], the current scheme for a checkpointing daemon to upload a merged fsimage file to an NN is to issue an HTTP GET request to tell the target NN to issue another GET request back to the checkpointing daemon to retrieve the merged fsimage file. There's no fundamental reason the checkpointing daemon can't just use an HTTP POST or PUT to send back the merged fsimage file, rather than the double-GET scheme. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3395) NN doesn't start with HA+security enabled and HTTP address set to 0.0.0.0
Aaron T. Myers created HDFS-3395: Summary: NN doesn't start with HA+security enabled and HTTP address set to 0.0.0.0 Key: HDFS-3395 URL: https://issues.apache.org/jira/browse/HDFS-3395 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers DFSUtil#substituteForWildcardAddress subs in a default hostname if the given hostname is 0.0.0.0. However, this function throws an exception if the given hostname is set to 0.0.0.0 and security is enabled, regardless of whether the default hostname is also 0.0.0.0. This function shouldn't throw an exception unless both addresses are set to 0.0.0.0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
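A minimal sketch of the corrected substitution logic described above, assuming a host:port address string. The class and method names are illustrative, not the actual DFSUtil implementation:

```java
public class WildcardAddressSubstitution {
    static final String WILDCARD = "0.0.0.0";

    /**
     * Replaces a wildcard host in configuredAddress (host:port) with
     * defaultHost. With security enabled, throws only when both the
     * configured host and the default host are the wildcard, i.e. when
     * no usable hostname exists at all.
     */
    static String substituteForWildcard(String configuredAddress,
                                        String defaultHost,
                                        boolean securityEnabled) {
        String host = configuredAddress.split(":")[0];
        String port = configuredAddress.split(":")[1];
        if (!host.equals(WILDCARD)) {
            return configuredAddress; // nothing to substitute
        }
        if (defaultHost.equals(WILDCARD) && securityEnabled) {
            // Only this case is genuinely unresolvable: the buggy
            // behavior threw here even when defaultHost was concrete.
            throw new IllegalArgumentException(
                "Cannot substitute a concrete hostname for " + WILDCARD);
        }
        return defaultHost + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(
            substituteForWildcard("0.0.0.0:8020", "nn1.example.com", true));
        System.out.println(
            substituteForWildcard("nn2.example.com:8020", "0.0.0.0", true));
    }
}
```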
[jira] [Created] (HDFS-3390) DFSAdmin should print full stack traces of errors when DEBUG logging is enabled
Aaron T. Myers created HDFS-3390: Summary: DFSAdmin should print full stack traces of errors when DEBUG logging is enabled Key: HDFS-3390 URL: https://issues.apache.org/jira/browse/HDFS-3390 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs client Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Priority: Minor If an error is encountered when running an `hdfs dfsadmin ...' command, only the exception's message is output. It would be handy for debugging if the full stack trace of the exception were output when DEBUG logging is enabled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
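A minimal sketch of the requested behavior, assuming a plain boolean stands in for the log4j DEBUG-level check. The class and method names are illustrative, not the actual DFSAdmin code:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class DfsAdminErrorReporting {
    /**
     * Returns only the exception's message by default, but the full
     * stack trace when DEBUG logging is enabled.
     */
    static String formatError(Exception e, boolean debugEnabled) {
        if (!debugEnabled) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
        // Render the full trace into a string for logging.
        StringWriter sw = new StringWriter();
        e.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("Safe mode is ON");
        System.out.println(formatError(e, false)); // message only
        System.out.println(formatError(e, true));  // full trace
    }
}
```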
[jira] [Resolved] (HDFS-3345) Primary and secondary NameNode principals must be the same
[ https://issues.apache.org/jira/browse/HDFS-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3345. -- Resolution: Duplicate Closing this as a dupe. Please feel free to reopen it if you disagree, or if I've misinterpreted your report of the issue. Primary and secondary NameNode principals must be the same -- Key: HDFS-3345 URL: https://issues.apache.org/jira/browse/HDFS-3345 Project: Hadoop HDFS Issue Type: Bug Reporter: Owen O'Malley The NameNode and SecondaryNameNode have two different configuration knobs (dfs.namenode.kerberos.principal and dfs.secondary.namenode.kerberos.principal), but the secondary namenode fails authorization unless it is the same user. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3349) DFSAdmin fetchImage command should initialize security credentials
Aaron T. Myers created HDFS-3349: Summary: DFSAdmin fetchImage command should initialize security credentials Key: HDFS-3349 URL: https://issues.apache.org/jira/browse/HDFS-3349 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers The `hdfs dfsadmin -fetchImage' command should fetch the fsimage using the appropriate credentials if security is enabled. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3351) NameNode#initializeGenericKeys should always set fs.defaultFS regardless of whether HA or Federation is enabled
Aaron T. Myers created HDFS-3351: Summary: NameNode#initializeGenericKeys should always set fs.defaultFS regardless of whether HA or Federation is enabled Key: HDFS-3351 URL: https://issues.apache.org/jira/browse/HDFS-3351 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers NameNode#initializeGenericKeys exits early if neither a nameservice ID nor an NN ID is passed. However, this method is also responsible for setting fs.defaultFS in the configuration object stored by the NN to the NN RPC address, after the generic keys have been configured. This should be done in all cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
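A minimal sketch of the fix described above, using a plain Map as a stand-in for Hadoop's Configuration object. The key names match the real HDFS config keys, but the method body is illustrative, not the actual NameNode code:

```java
import java.util.HashMap;
import java.util.Map;

public class InitializeGenericKeysSketch {
    static void initializeGenericKeys(Map<String, String> conf,
                                      String nameserviceId, String nnId) {
        if (nameserviceId != null || nnId != null) {
            // ... copy the suffixed (per-nameservice / per-NN) keys into
            // their generic equivalents here ...
        }
        // The buggy version returned early above when both IDs were null;
        // the fix is to always fall through and derive fs.defaultFS from
        // the NN's RPC address.
        String rpcAddress = conf.get("dfs.namenode.rpc-address");
        if (rpcAddress != null) {
            conf.put("fs.defaultFS", "hdfs://" + rpcAddress);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dfs.namenode.rpc-address", "nn.example.com:8020");
        // Neither a nameservice ID nor an NN ID: fs.defaultFS is still set.
        initializeGenericKeys(conf, null, null);
        System.out.println(conf.get("fs.defaultFS"));
    }
}
```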
[jira] [Resolved] (HDFS-2999) DN metrics should include per-disk utilization
[ https://issues.apache.org/jira/browse/HDFS-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-2999. -- Resolution: Won't Fix Target Version/s: 2.0.0, 3.0.0 (was: 3.0.0, 2.0.0) Operators can monitor this using more direct means. DN metrics should include per-disk utilization -- Key: HDFS-2999 URL: https://issues.apache.org/jira/browse/HDFS-2999 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 0.23.1 Reporter: Aaron T. Myers We should have per-dfs.data.dir metrics in the DN's metrics report. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2350) Secure DN doesn't print output to console when started interactively
Secure DN doesn't print output to console when started interactively Key: HDFS-2350 URL: https://issues.apache.org/jira/browse/HDFS-2350 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.24.0 Reporter: Aaron T. Myers Fix For: 0.24.0 If one starts a secure DN (using jsvc) interactively, the output is not printed to the console, but instead ends up in {{$HADOOP_LOG_DIR/jsvc.err}} and {{$HADOOP_LOG_DIR/jsvc.out}}. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira