[jira] [Resolved] (HDFS-3660) TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3660.
--
  Resolution: Cannot Reproduce
Target Version/s:   (was: )

This is an ancient/stale flaky test JIRA. Resolving.

 TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out   
 

 Key: HDFS-3660
 URL: https://issues.apache.org/jira/browse/HDFS-3660
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Priority: Minor

 Saw this on a recent jenkins run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-4001) TestSafeMode#testInitializeReplQueuesEarly may time out

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4001.
--
Resolution: Fixed

Haven't seen this fail in a very long time. Closing this out. Feel free to 
reopen if you disagree.

 TestSafeMode#testInitializeReplQueuesEarly may time out
 ---

 Key: HDFS-4001
 URL: https://issues.apache.org/jira/browse/HDFS-4001
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
 Attachments: timeout.txt.gz


 Saw this failure on a recent branch-2 Jenkins run; it has also been seen on 
 trunk.
 {noformat}
 java.util.concurrent.TimeoutException: Timed out waiting for condition
   at 
 org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:107)
   at 
 org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:191)
 {noformat}
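For context, GenericTestUtils.waitFor in the trace above polls a condition until it holds or a time budget runs out, and throws TimeoutException otherwise. A minimal stdlib-only sketch of that polling pattern, assuming a BooleanSupplier condition and millisecond intervals; the class name and signature are illustrative and are not Hadoop's actual API:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical poll-until-true helper, modeled on the pattern behind
// GenericTestUtils.waitFor; this is not Hadoop's actual implementation.
class WaitForSketch {
  static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    // Re-evaluate the condition until it holds or the deadline passes.
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Timed out waiting for condition");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    // Becomes true after roughly 50 ms, well inside the 1 s budget.
    waitFor(() -> System.currentTimeMillis() - start >= 50, 10, 1000);
    System.out.println("condition met");
  }
}
```

A TimeoutException from such a helper only says the condition never held within the budget; it says nothing about why.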





[jira] [Resolved] (HDFS-3811) TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3811.
--
Resolution: Cannot Reproduce

I don't think I've seen this fail in a very long time. Going to resolve this. 
Please reopen if you disagree.

 TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky
 -

 Key: HDFS-3811
 URL: https://issues.apache.org/jira/browse/HDFS-3811
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Andrew Wang
Assignee: Todd Lipcon
 Attachments: stacktrace, testfail-editlog.log, testfail.log, 
 testpersistblocks.txt


 This test failed on a recent Jenkins build, but passes for me locally. Seems 
 flaky.
 See:
 https://builds.apache.org/job/PreCommit-HDFS-Build/3021//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/





[jira] [Resolved] (HDFS-3532) TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1 times out

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3532.
--
Resolution: Cannot Reproduce

This is an ancient/stale flaky test JIRA. Resolving.

 TestDatanodeBlockScanner.testBlockCorruptionRecoveryPolicy1 times out
 -

 Key: HDFS-3532
 URL: https://issues.apache.org/jira/browse/HDFS-3532
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins

 I've seen this test time out on recent trunk jenkins test patch runs even 
 though HDFS-3266 was put in a couple weeks ago.





[jira] [Resolved] (HDFS-2433) TestFileAppend4 fails intermittently

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-2433.
--
Resolution: Cannot Reproduce

I don't think I've seen this fail in a long, long time. Going to close this 
out. Please reopen if you disagree.

 TestFileAppend4 fails intermittently
 

 Key: HDFS-2433
 URL: https://issues.apache.org/jira/browse/HDFS-2433
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode, test
Affects Versions: 0.20.205.0, 1.0.0
Reporter: Robert Joseph Evans
Priority: Critical
 Attachments: failed.tar.bz2


 A Jenkins build we have running failed twice in a row with issues from 
 TestFileAppend4.testAppendSyncReplication1. In an attempt to reproduce the 
 error, I ran TestFileAppend4 in a loop overnight, saving the results. 
 (No clean was done between test runs.)
 When TestFileAppend4 is run in a loop, the testAppendSyncReplication[012] 
 tests fail about 10% of the time (14 times out of 130 tries). They all fail 
 with something like the following. Often only one of the tests fails, but I 
 have seen as many as two fail in one run.
 {noformat}
 Testcase: testAppendSyncReplication2 took 32.198 sec
 FAILED
 Should have 2 replicas for that block, not 1
 junit.framework.AssertionFailedError: Should have 2 replicas for that block, 
 not 1
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.replicationTest(TestFileAppend4.java:477)
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncReplication2(TestFileAppend4.java:425)
 {noformat}
 I also saw several other tests that are part of TestFileAppend4 fail during 
 this experiment. They may all be related to one another, so I am filing them 
 in the same JIRA. If it turns out that they are not related, they can be 
 split up later.
 testAppendSyncBlockPlusBbw failed 6 out of the 130 times, or about 5% of the 
 time
 {noformat}
 Testcase: testAppendSyncBlockPlusBbw took 1.633 sec
 FAILED
 unexpected file size! received=0 , expected=1024
 junit.framework.AssertionFailedError: unexpected file size! received=0 , 
 expected=1024
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.assertFileSize(TestFileAppend4.java:136)
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncBlockPlusBbw(TestFileAppend4.java:401)
 {noformat}
 testAppendSyncChecksum[012] failed 2 out of the 130 times or about 1.5% of 
 the time
 {noformat}
 Testcase: testAppendSyncChecksum1 took 32.385 sec
 FAILED
 Should have 1 replica for that block, not 2
 junit.framework.AssertionFailedError: Should have 1 replica for that block, 
 not 2
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.checksumTest(TestFileAppend4.java:556)
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncChecksum1(TestFileAppend4.java:500)
 {noformat}
 I will attach logs for all of the failures. Be aware that I did change some 
 of the logging messages in this test so I could better see when 
 testAppendSyncReplication started and ended. Other than that, the code is 
 stock 0.20.205 RC2.





[jira] [Created] (HDFS-8194) Add administrative tool to be able to examine the NN's view of DN storages

2015-04-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-8194:


 Summary: Add administrative tool to be able to examine the NN's 
view of DN storages
 Key: HDFS-8194
 URL: https://issues.apache.org/jira/browse/HDFS-8194
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe


The NN has long had facilities to list all of the DNs that are registered 
with it. It would be great if there were an administrative tool able to list 
all of the individual storages that the NN is tracking.





[jira] [Resolved] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task

2015-01-23 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-7421.
--
Resolution: Duplicate

 Move processing of postponed over-replicated blocks to a background task
 

 Key: HDFS-7421
 URL: https://issues.apache.org/jira/browse/HDFS-7421
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 2.6.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers

 In an HA environment, we postpone sending block invalidates to DNs until all 
 DNs holding a given block have done at least one block report to the NN after 
 it became active. When that first block report after becoming active does 
 occur, we attempt to reprocess all postponed misreplicated blocks inline with 
 the block report RPC. In the case where there are many postponed 
 misreplicated blocks, this can cause block report RPCs to take an 
 inordinately long time to complete, sometimes on the order of minutes, which 
 has the potential to tie up RPC handlers, block incoming RPCs, etc. There's 
 no need to hurriedly process all postponed misreplicated blocks so that we 
 can quickly send invalidate commands back to DNs, so let's move this 
 processing outside of the RPC handler context and into a background thread.
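The proposal amounts to decoupling the RPC handler from the rescan: the handler only records which blocks are postponed, and a background thread works through them in bounded batches. A minimal sketch of that shape, assuming blocks are identified by long IDs and that a per-block rescan callback exists (PostponedBlockRescanner and its methods are hypothetical, not NameNode code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Hypothetical sketch: the block report handler only enqueues postponed block
// IDs; a background task re-checks them in bounded batches.
class PostponedBlockRescanner {
  private final ConcurrentLinkedQueue<Long> postponed = new ConcurrentLinkedQueue<>();
  private final LongConsumer rescan;  // re-evaluates replication for one block
  private final int batchSize;

  PostponedBlockRescanner(LongConsumer rescan, int batchSize) {
    this.rescan = rescan;
    this.batchSize = batchSize;
  }

  // Cheap enough to call from an RPC handler: no inline reprocessing.
  void postpone(long blockId) {
    postponed.add(blockId);
  }

  // Re-checks at most batchSize blocks and returns how many were handled.
  int processBatch() {
    int handled = 0;
    Long blockId;
    while (handled < batchSize && (blockId = postponed.poll()) != null) {
      rescan.accept(blockId);
      handled++;
    }
    return handled;
  }

  // Runs processBatch periodically on a single background thread.
  ScheduledExecutorService startBackground(long periodMillis) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    ses.scheduleWithFixedDelay(this::processBatch, 0, periodMillis, TimeUnit.MILLISECONDS);
    return ses;
  }

  public static void main(String[] args) throws Exception {
    List<Long> rescanned = new ArrayList<>();
    PostponedBlockRescanner r = new PostponedBlockRescanner(id -> {
      synchronized (rescanned) {
        rescanned.add(id);
      }
    }, 100);
    for (long b = 1; b <= 1000; b++) {
      r.postpone(b);
    }
    ScheduledExecutorService ses = r.startBackground(5);
    Thread.sleep(200);
    ses.shutdown();
    ses.awaitTermination(1, TimeUnit.SECONDS);
    synchronized (rescanned) {
      System.out.println("rescanned " + rescanned.size() + " blocks");
    }
  }
}
```

Bounding the batch size keeps each pass short; how the real NameNode would coordinate such a thread with its namespace lock is outside the scope of this sketch.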





[jira] [Created] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task

2014-11-21 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-7421:


 Summary: Move processing of postponed over-replicated blocks to a 
background task
 Key: HDFS-7421
 URL: https://issues.apache.org/jira/browse/HDFS-7421
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 2.6.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In an HA environment, we postpone sending block invalidates to DNs until all 
DNs holding a given block have done at least one block report to the NN after 
it became active. When that first block report after becoming active does 
occur, we attempt to reprocess all postponed misreplicated blocks inline with 
the block report RPC. In the case where there are many postponed misreplicated 
blocks, this can cause block report RPCs to take an inordinately long time to 
complete, sometimes on the order of minutes, which has the potential to tie up 
RPC handlers, block incoming RPCs, etc. There's no need to hurriedly process 
all postponed misreplicated blocks so that we can quickly send invalidate 
commands back to DNs, so let's move this processing outside of the RPC handler 
context and into a background thread.





[jira] [Created] (HDFS-6647) Edit log corruption when pipeline recovery occurs for deleted file present in snapshot

2014-07-09 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6647:


 Summary: Edit log corruption when pipeline recovery occurs for 
deleted file present in snapshot
 Key: HDFS-6647
 URL: https://issues.apache.org/jira/browse/HDFS-6647
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Affects Versions: 2.4.1
Reporter: Aaron T. Myers


I've encountered a situation wherein an OP_UPDATE_BLOCKS can appear in the edit 
log for a file after an OP_DELETE has previously been logged for that file. 
Such an edit log sequence cannot then be successfully read by the NameNode.

More details in the first comment.
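Why such a sequence is unreadable can be seen with a toy replay that tracks which paths exist in the current namespace: once OP_DELETE has removed a path, a later OP_UPDATE_BLOCKS for it has nothing to apply to. The sketch below ignores snapshots entirely and is only a model, not the NameNode's edit log loader; the op-code names come from the description above, everything else is hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of edit-log replay: tracks which paths exist in the current
// namespace and reports the first op that refers to a missing path.
// Snapshots are deliberately ignored.
class EditLogReplayCheck {
  enum OpCode { OP_ADD, OP_DELETE, OP_UPDATE_BLOCKS }

  static final class Op {
    final OpCode code;
    final String path;

    Op(OpCode code, String path) {
      this.code = code;
      this.path = path;
    }
  }

  // Returns the index of the first op that cannot be applied, or -1 if all apply.
  static int firstUnreplayableOp(List<Op> ops) {
    Set<String> live = new HashSet<>();
    for (int i = 0; i < ops.size(); i++) {
      Op op = ops.get(i);
      switch (op.code) {
        case OP_ADD:
          live.add(op.path);
          break;
        case OP_DELETE:
        case OP_UPDATE_BLOCKS:
          // Both ops must resolve the path in the current namespace.
          if (!live.contains(op.path)) {
            return i;
          }
          if (op.code == OpCode.OP_DELETE) {
            live.remove(op.path);
          }
          break;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<Op> bad = Arrays.asList(
        new Op(OpCode.OP_ADD, "/dir/f"),
        new Op(OpCode.OP_DELETE, "/dir/f"),
        new Op(OpCode.OP_UPDATE_BLOCKS, "/dir/f"));
    System.out.println(firstUnreplayableOp(bad));  // 2
  }
}
```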



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6563) NameNode cannot save fsimage in certain circumstances when snapshots are in use

2014-06-18 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6563:


 Summary: NameNode cannot save fsimage in certain circumstances 
when snapshots are in use
 Key: HDFS-6563
 URL: https://issues.apache.org/jira/browse/HDFS-6563
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, snapshots
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Critical


Checkpoints will start to fail, and it will not be possible to manually run 
saveNamespace on the NameNode, if the following set of steps occurs:

# A zero-length file appears in a snapshot
# That file is later lengthened to include at least one block
# That file is subsequently deleted from the present file system but remains in 
the snapshot

More details in the first comment.





[jira] [Created] (HDFS-6463) Incorrect permission can be created after setting ACLs

2014-05-28 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6463:


 Summary: Incorrect permission can be created after setting ACLs
 Key: HDFS-6463
 URL: https://issues.apache.org/jira/browse/HDFS-6463
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Aaron T. Myers


When setting ACLs for a file or directory, it's possible for the resulting 
FsPermission object's group entry to be set incorrectly; in particular, it will 
be set to the mask entry. More details in the first comment of this JIRA.

Thanks to Szehon Ho for identifying this issue.





[jira] [Created] (HDFS-6435) Add support for specifying a static uid/gid mapping for the NFS gateway

2014-05-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6435:


 Summary: Add support for specifying a static uid/gid mapping for 
the NFS gateway
 Key: HDFS-6435
 URL: https://issues.apache.org/jira/browse/HDFS-6435
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: nfs
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


It's quite reasonable that folks will want to access the HDFS NFS Gateway from 
client machines where the UIDs/GIDs do not line up with those on the NFS 
Gateway itself. We should provide a way to map these UIDs/GIDs between the 
systems.
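One possible shape for such a mapping is a small table from client-side IDs to gateway-side IDs, with unmapped IDs passing through unchanged. A minimal sketch, assuming a hypothetical text format of one "uid <remote> <local>" or "gid <remote> <local>" entry per line, with # comments; StaticIdMapping is illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical static mapping table for the NFS gateway. The line format
// "uid <remote> <local>" / "gid <remote> <local>" is an assumption.
class StaticIdMapping {
  private final Map<Integer, Integer> uids = new HashMap<>();
  private final Map<Integer, Integer> gids = new HashMap<>();

  static StaticIdMapping parse(String text) {
    StaticIdMapping m = new StaticIdMapping();
    for (String raw : text.split("\n")) {
      String line = raw.replaceFirst("#.*", "").trim();  // drop comments
      if (line.isEmpty()) {
        continue;
      }
      String[] f = line.split("\\s+");
      if (f.length != 3) {
        throw new IllegalArgumentException("Malformed mapping line: " + raw);
      }
      int remote = Integer.parseInt(f[1]);
      int local = Integer.parseInt(f[2]);
      switch (f[0]) {
        case "uid": m.uids.put(remote, local); break;
        case "gid": m.gids.put(remote, local); break;
        default: throw new IllegalArgumentException("Unknown id type: " + f[0]);
      }
    }
    return m;
  }

  // IDs without an explicit entry pass through unchanged.
  int mapUid(int remoteUid) { return uids.getOrDefault(remoteUid, remoteUid); }

  int mapGid(int remoteGid) { return gids.getOrDefault(remoteGid, remoteGid); }

  public static void main(String[] args) {
    StaticIdMapping m = parse("uid 10 100  # client uid 10 is gateway uid 100\ngid 11 101\n");
    System.out.println(m.mapUid(10) + " " + m.mapGid(11) + " " + m.mapUid(42));  // 100 101 42
  }
}
```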





[jira] [Created] (HDFS-6289) HA failover can fail if there are pending DN messages for DNs which no longer exist

2014-04-25 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6289:


 Summary: HA failover can fail if there are pending DN messages for 
DNs which no longer exist
 Key: HDFS-6289
 URL: https://issues.apache.org/jira/browse/HDFS-6289
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Critical


In an HA setup, the standby NN may receive messages from DNs for blocks which 
the standby NN is not yet aware of. It queues up these messages and replays 
them when it next reads from the edit log or fails over. On a failover, all of 
these pending DN messages must be processed successfully in order for the 
failover to succeed. If one of these pending DN messages refers to a DN 
storageId that no longer exists (because the DN with that transfer address has 
been reformatted and has re-registered with the same transfer address) then on 
transition to active the NN will not be able to process this DN message and 
will suicide with an error like the following:

{noformat}
2014-04-25 14:23:17,922 FATAL namenode.NameNode 
(NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN 
shutdown. Shutting down immediately.
java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) 
as corrupt because datanode 127.0.0.1:33324 does not exist
{noformat}
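One way to keep the transition to active from failing on a stale message is to set aside messages addressed to storages that are no longer registered, rather than treating them as fatal. The following is a minimal model of that idea only, not necessarily the fix that was adopted; the message type and storage IDs are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the standby's queue of pending DataNode messages.
class PendingDnMessages {
  static final class Msg {
    final String storageId;
    final long blockId;

    Msg(String storageId, long blockId) {
      this.storageId = storageId;
      this.blockId = blockId;
    }
  }

  private final Deque<Msg> pending = new ArrayDeque<>();

  void enqueue(Msg m) {
    pending.add(m);
  }

  // Replays every queued message. Messages addressed to storages that are no
  // longer registered are set aside and returned instead of aborting the replay.
  List<Msg> replay(Set<String> registeredStorages, List<Msg> applied) {
    List<Msg> skipped = new ArrayList<>();
    Msg m;
    while ((m = pending.poll()) != null) {
      if (registeredStorages.contains(m.storageId)) {
        applied.add(m);
      } else {
        skipped.add(m);
      }
    }
    return skipped;
  }

  public static void main(String[] args) {
    PendingDnMessages q = new PendingDnMessages();
    q.enqueue(new Msg("DS-current", 1073741825L));
    q.enqueue(new Msg("DS-before-reformat", 1073741826L));
    List<Msg> applied = new ArrayList<>();
    Set<String> live = new HashSet<>(Arrays.asList("DS-current"));
    List<Msg> skipped = q.replay(live, applied);
    System.out.println(applied.size() + " applied, " + skipped.size() + " skipped");
  }
}
```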





[jira] [Resolved] (HDFS-6280) Provide option to

2014-04-24 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-6280.
--

Resolution: Invalid

Accidentally hit create too soon. :)

 Provide option to 
 --

 Key: HDFS-6280
 URL: https://issues.apache.org/jira/browse/HDFS-6280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Aaron T. Myers







[jira] [Created] (HDFS-6280) Provide option to

2014-04-24 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6280:


 Summary: Provide option to 
 Key: HDFS-6280
 URL: https://issues.apache.org/jira/browse/HDFS-6280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Aaron T. Myers








[jira] [Created] (HDFS-6281) Provide option to use the NFS Gateway without having to use the Hadoop portmapper

2014-04-24 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6281:


 Summary: Provide option to use the NFS Gateway without having to 
use the Hadoop portmapper
 Key: HDFS-6281
 URL: https://issues.apache.org/jira/browse/HDFS-6281
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: nfs
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In order to use the NFS Gateway on operating systems with the rpcbind 
privileged registration bug, we currently require users to shut down and 
discontinue use of the system-provided portmap daemon, and instead use the 
portmap daemon provided by Hadoop. Alternatively, we can work around this bug if 
we tweak the NFS Gateway to perform its port registration from a privileged 
port, which would let users keep using the system portmap daemon.





[jira] [Created] (HDFS-6112) NFS Gateway docs are incorrect for allowed hosts configuration

2014-03-17 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6112:


 Summary: NFS Gateway docs are incorrect for allowed hosts 
configuration
 Key: HDFS-6112
 URL: https://issues.apache.org/jira/browse/HDFS-6112
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


The NFS gateway export configuration docs say that the machine name 
configuration can contain wildcards, and they give the example 
{{host*.example.com}}. The term "wildcard" and this example suggest typical 
globbing semantics, but what is actually supported is Java regular expressions. 
I think we should change the docs to make this clearer.
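The difference is easy to demonstrate with java.util.regex. A small sketch, assuming the gateway matches the whole host name against the configured string as a Java regex (for this particular example, Matcher.find() gives the same results); globToRegex is a hypothetical helper included only for comparison:

```java
import java.util.regex.Pattern;

// Shows how a glob-looking pattern behaves when interpreted as a Java regex,
// and a hypothetical glob-to-regex conversion for comparison.
class GlobVsRegex {
  // Translates '*' and '?' to their regex equivalents and escapes metacharacters.
  static String globToRegex(String glob) {
    StringBuilder sb = new StringBuilder();
    for (char c : glob.toCharArray()) {
      switch (c) {
        case '*': sb.append(".*"); break;
        case '?': sb.append('.'); break;
        default:
          if ("\\.[]{}()+-^$|".indexOf(c) >= 0) {
            sb.append('\\');
          }
          sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String pattern = "host*.example.com";
    // As a regex, "t*" means "zero or more t" and "." means "any character".
    System.out.println(Pattern.matches(pattern, "host1.example.com"));  // false
    System.out.println(Pattern.matches(pattern, "hosXexample.com"));    // true
    // Read as a glob, the same pattern matches what an admin probably meant.
    System.out.println(Pattern.matches(globToRegex(pattern), "host1.example.com"));  // true
  }
}
```

In regex terms, the documented example accepts hosXexample.com but rejects host1.example.com, which is close to the opposite of what the word "wildcard" suggests.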





[jira] [Resolved] (HDFS-6048) DFSClient fails if native library doesn't exist

2014-03-03 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-6048.
--

Resolution: Duplicate

Hi Akira, I think this will be addressed by HDFS-6040, which should be 
committed shortly.

 DFSClient fails if native library doesn't exist
 ---

 Key: HDFS-6048
 URL: https://issues.apache.org/jira/browse/HDFS-6048
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Akira AJISAKA
Priority: Blocker

 When I executed FSShell commands (such as hdfs dfs -ls, -mkdir, -cat) in 
 trunk, {{UnsupportedOperationException}} occurred in 
 {{o.a.h.net.unix.DomainSocketWatcher}} and the commands failed.





[jira] [Created] (HDFS-6033) PBImageXmlWriter incorrectly handles processing cache directives

2014-02-27 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-6033:


 Summary: PBImageXmlWriter incorrectly handles processing cache 
directives
 Key: HDFS-6033
 URL: https://issues.apache.org/jira/browse/HDFS-6033
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


When attempting to process cache directives in 
{{PBImageXmlWriter#dumpCacheManagerSection}}, we incorrectly loop over the number 
of cache _pools_ rather than the number of directives.
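The shape of the bug, in a toy model where pools and directives are stored as separate lists, each with its own count (the method name and output format below are hypothetical, not PBImageXmlWriter's):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of dumping the cache manager section: pools and
// directives are separate lists, each with its own count.
class CacheSectionDump {
  // Emits one element per directive; loopBound controls how many are read.
  // In a real sequential stream, a wrong bound would also misalign whatever
  // is read next; the size check here just keeps the toy example safe.
  static List<String> dumpDirectives(int loopBound, List<String> directiveIds) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < loopBound && i < directiveIds.size(); i++) {
      out.add("<directive><id>" + directiveIds.get(i) + "</id></directive>");
    }
    return out;
  }

  public static void main(String[] args) {
    int numPools = 1;
    List<String> directiveIds = Arrays.asList("1", "2", "3");
    // Buggy: bounded by the pool count, so only one directive is emitted.
    System.out.println(dumpDirectives(numPools, directiveIds).size());             // 1
    // Fixed: bounded by the directive count.
    System.out.println(dumpDirectives(directiveIds.size(), directiveIds).size());  // 3
  }
}
```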



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5921) Cannot browse file system via NN web UI if any directory has the sticky bit set

2014-02-10 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5921:


 Summary: Cannot browse file system via NN web UI if any directory 
has the sticky bit set
 Key: HDFS-5921
 URL: https://issues.apache.org/jira/browse/HDFS-5921
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.3.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Critical


You'll see an error like this in the JS console if any directory has the sticky 
bit set:

{noformat}
'helper_to_permission': function(chunk, ctx, bodies, params) {

var exec = ((parms.perm % 10) & 1) == 1;
Uncaught ReferenceError: parms is not defined
{noformat}
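The ReferenceError itself comes from the misspelled identifier: the function's parameter is params, but the body reads parms. For reference, the helper's job is to render a numeric permission as an rwx string, where a set sticky bit shows up as t (or T when the others-execute bit is clear) in the last position. A Java sketch of that decoding, assuming the permission arrives as an int holding the octal value (the JS helper itself appears to work on decimal digits via perm % 10); PermissionString is hypothetical:

```java
// Hypothetical Java rendering of a numeric permission as an rwx string.
class PermissionString {
  // Renders a permission such as 0755 or 01777 the way "ls -l" does,
  // using 't'/'T' in the last position when the sticky bit (01000) is set.
  static String toSymbolic(int perm) {
    StringBuilder sb = new StringBuilder();
    for (int shift = 6; shift >= 0; shift -= 3) {  // owner, group, other
      int bits = (perm >> shift) & 7;
      sb.append((bits & 4) != 0 ? 'r' : '-');
      sb.append((bits & 2) != 0 ? 'w' : '-');
      sb.append((bits & 1) != 0 ? 'x' : '-');
    }
    if ((perm & 01000) != 0) {
      boolean otherExec = (perm & 1) != 0;
      sb.setCharAt(8, otherExec ? 't' : 'T');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toSymbolic(0755));   // rwxr-xr-x
    System.out.println(toSymbolic(01777));  // rwxrwxrwt
    System.out.println(toSymbolic(01770));  // rwxrwx--T
  }
}
```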





[jira] [Created] (HDFS-5922) DN heartbeat thread can get stuck in tight loop

2014-02-10 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5922:


 Summary: DN heartbeat thread can get stuck in tight loop
 Key: HDFS-5922
 URL: https://issues.apache.org/jira/browse/HDFS-5922
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.3.0
Reporter: Aaron T. Myers


We saw an issue recently on a test cluster where one of the DN threads was 
consuming 100% of a single CPU. Running jstack indicated that it was the DN 
heartbeat thread. I believe I've tracked down the cause to a bug in the 
accounting around the value of {{pendingReceivedRequests}}.

More details in the first comment.





[jira] [Created] (HDFS-5517) Lower the default maximum number of blocks per file

2013-11-14 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5517:


 Summary: Lower the default maximum number of blocks per file
 Key: HDFS-5517
 URL: https://issues.apache.org/jira/browse/HDFS-5517
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


We introduced the maximum number of blocks per file in HDFS-4305, but we set 
the default to 1MM. In practice this limit is so high as to never be hit, 
whereas we know that an individual file with 10s of thousands of blocks can 
cause problems. We should lower the default value, in my opinion to 10k.
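To put the two defaults in perspective, here is the largest file each limit allows, assuming every block is full, a 128 MB block size (the Hadoop 2 default for dfs.blocksize, which is configurable), and reading 1MM as one million. MaxBlocksArithmetic is just an illustration:

```java
// Back-of-the-envelope bound: the largest file a per-file block limit allows,
// assuming every block is full.
class MaxBlocksArithmetic {
  static long maxFileBytes(long maxBlocks, long blockSizeBytes) {
    return Math.multiplyExact(maxBlocks, blockSizeBytes);  // throws on overflow
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024;  // 128 MB, assumed dfs.blocksize
    double tib = 1L << 40;
    System.out.println(maxFileBytes(1_000_000, blockSize) / tib);  // 122.0703125
    System.out.println(maxFileBytes(10_000, blockSize) / tib);     // 1.220703125
  }
}
```

So even a 10k limit still permits files of roughly 1.2 TiB at that block size.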



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5433) When reloading fsimage during checkpointing, we should clear existing snapshottable directories

2013-10-26 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5433:


 Summary: When reloading fsimage during checkpointing, we should 
clear existing snapshottable directories
 Key: HDFS-5433
 URL: https://issues.apache.org/jira/browse/HDFS-5433
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.2.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Critical


The complete set of snapshottable directories is referenced both via the file 
system tree and in the SnapshotManager class. It's possible that when the 2NN 
performs a checkpoint, it will reload its in-memory state based on a new 
fsimage from the NN, but will not clear the set of snapshottable directories 
referenced by the SnapshotManager. In this case, the 2NN will write out an 
fsimage that cannot be loaded, since the integer written to the fsimage 
indicating the number of snapshottable directories will be out of sync with the 
actual number of snapshottable directories serialized to the fsimage.

This is basically the same as HDFS-3835, but for snapshottable directories 
instead of delegation tokens.
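The failure mode is a length-prefixed section whose count and entries come from two different sources. A toy model, using a simple count-then-entries layout rather than the real fsimage format (SnapshottableDirSection is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy length-prefixed section (not the real fsimage layout): a count followed
// by that many paths. The count and the entries may come from different sources.
class SnapshottableDirSection {
  static byte[] write(int count, List<String> paths) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(count);  // e.g. taken from a stale in-memory registry
      for (String p : paths) {
        out.writeUTF(p);    // e.g. taken from the current namespace tree
      }
    }
    return bytes.toByteArray();
  }

  static List<String> read(byte[] image) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(image));
    int count = in.readInt();
    List<String> paths = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      paths.add(in.readUTF());  // trusts the count written above
    }
    return paths;
  }

  public static void main(String[] args) throws IOException {
    List<String> tree = Arrays.asList("/a", "/b");
    System.out.println(read(write(tree.size(), tree)));  // [/a, /b]
    try {
      read(write(tree.size() + 1, tree));  // registry was never cleared
    } catch (EOFException e) {
      System.out.println("image cannot be loaded: " + e);
    }
  }
}
```

Here a too-high count surfaces as an EOFException; in a real image, the loader would more likely misread the bytes belonging to whatever follows the section.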





[jira] [Created] (HDFS-5403) WebHdfs client cannot communicate with older WebHdfs servers post HDFS-5306

2013-10-22 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5403:


 Summary: WebHdfs client cannot communicate with older WebHdfs 
servers post HDFS-5306
 Key: HDFS-5403
 URL: https://issues.apache.org/jira/browse/HDFS-5403
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.2.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


HDFS-5306 introduced the field infoSecurePort to the DatanodeIDProto PB 
definition and made it optional for compatibility purposes. However, we don't 
correctly handle the case where this field is absent when deserializing the 
response from a WebHdfs request. This results in an NPE at the client when this 
occurs.
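WebHDFS responses are JSON, so on the client the general remedy is to treat the field as optional when reading the decoded object. A minimal sketch, assuming the response has already been parsed into a Map (as a JSON library would produce); the helper name and the default value of 0 are assumptions, and 50475 is just an example port:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reader for an optional numeric field in a decoded JSON object.
class OptionalJsonField {
  // The NPE-prone shape is ((Number) m.get(key)).intValue(), which fails when
  // an older server omits the key. This version falls back to a default.
  static int getInt(Map<String, ?> m, String key, int defaultValue) {
    Object v = m.get(key);
    return v == null ? defaultValue : ((Number) v).intValue();
  }

  public static void main(String[] args) {
    Map<String, Object> fromNewServer = new HashMap<>();
    fromNewServer.put("infoSecurePort", 50475);
    Map<String, Object> fromOldServer = new HashMap<>();  // field not present
    System.out.println(getInt(fromNewServer, "infoSecurePort", 0));  // 50475
    System.out.println(getInt(fromOldServer, "infoSecurePort", 0));  // 0
  }
}
```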





[jira] [Created] (HDFS-5289) Race condition in TestRetryCacheWithHA#testCreateSymlink causes spurious test failure

2013-10-02 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5289:


 Summary: Race condition in TestRetryCacheWithHA#testCreateSymlink 
causes spurious test failure
 Key: HDFS-5289
 URL: https://issues.apache.org/jira/browse/HDFS-5289
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


The code to check if the operation has been completed on the active NN can 
potentially execute before the thread actually doing the operation has run. In 
this case the checking code will retry the check if the result of the check is 
null. However, the test operation does not in fact return null, instead 
throwing an exception if the file doesn't exist yet. We need to catch the 
exception and retry.
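The retry logic described above can be sketched as a check that treats both a null result and a "not found" exception as "not done yet". This assumes the exception in question is a FileNotFoundException; checkWithRetry is a hypothetical helper, not the test's actual code:

```java
import java.io.FileNotFoundException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical "has the operation completed yet?" helper that treats both a
// null result and FileNotFoundException as "not yet" and retries.
class RetryUntilVisible {
  static <T> T checkWithRetry(Callable<T> check, int maxAttempts, long sleepMillis)
      throws Exception {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        T result = check.call();
        if (result != null) {
          return result;
        }
      } catch (FileNotFoundException e) {
        // The operation under test has not created the path yet; retry.
      }
      Thread.sleep(sleepMillis);
    }
    return null;  // still not visible after maxAttempts
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger calls = new AtomicInteger();
    String status = checkWithRetry(() -> {
      if (calls.incrementAndGet() < 3) {
        throw new FileNotFoundException("link not created yet");
      }
      return "done";
    }, 10, 5);
    System.out.println(status + " after " + calls.get() + " calls");  // done after 3 calls
  }
}
```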





[jira] [Resolved] (HDFS-3133) Add support for DFS upgrade with HA enabled

2013-10-02 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3133.
--

Resolution: Duplicate

Resolving as a duplicate of HDFS-5138, which has a lot more discussion about 
how best to do this.

 Add support for DFS upgrade with HA enabled
 ---

 Key: HDFS-3133
 URL: https://issues.apache.org/jira/browse/HDFS-3133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers

 For the first implementation of HA NameNode, we punted on allowing DFS 
 upgrade with HA enabled, which makes doing a DFS upgrade on an HA-enabled 
 cluster quite cumbersome and error-prone. We should add better support for 
 this.





[jira] [Resolved] (HDFS-3958) Integrate upgrade/finalize/rollback with external journals

2013-10-02 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3958.
--

Resolution: Duplicate

Resolving as a duplicate of HDFS-5138, which has a lot more discussion about 
how best to do this.

 Integrate upgrade/finalize/rollback with external journals
 --

 Key: HDFS-3958
 URL: https://issues.apache.org/jira/browse/HDFS-3958
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Todd Lipcon

 Currently the NameNode upgrade/rollback/finalize framework only supports 
 local storage. With edits being stored in pluggable Journals, this could 
 create certain difficulties; in particular, rollback wouldn't actually roll 
 back the external storage to the old state.
 We should look at how to expose the right hooks to the external journal 
 storage to snapshot/rollback/finalize.





[jira] [Created] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2013-09-18 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5223:


 Summary: Allow edit log/fsimage format changes without changing 
layout version
 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers


Currently all HDFS on-disk formats are versioned by a single layout version. 
This means that even for changes which might be backward compatible, like the 
addition of a new edit log op code, we must go through the full 
{{namenode -upgrade}} process, which requires coordination with DNs, etc. HDFS 
should support a lighter-weight alternative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4299) WebHDFS Should Support HA Configuration

2013-09-11 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4299.
--

Resolution: Duplicate
  Assignee: (was: Haohui Mai)

Thanks, Daisuke. Closing this one out.

 WebHDFS Should Support HA Configuration
 ---

 Key: HDFS-4299
 URL: https://issues.apache.org/jira/browse/HDFS-4299
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Daisuke Kobayashi

 WebHDFS clients connect directly to NameNodes rather than going through a Hadoop 
 client, so there is no failover capability. Although a workaround is available 
 (using HttpFS with an HA client), WebHDFS should also support HA configuration.
 Please see also: https://issues.cloudera.org/browse/DISTRO-403



[jira] [Created] (HDFS-5159) Secondary NameNode fails to checkpoint if error occurs downloading edits on first checkpoint

2013-09-03 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5159:


 Summary: Secondary NameNode fails to checkpoint if error occurs 
downloading edits on first checkpoint
 Key: HDFS-5159
 URL: https://issues.apache.org/jira/browse/HDFS-5159
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.0-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


The 2NN will avoid downloading/loading a new fsimage if its local copy of 
fsimage is the same as the version on the NN. However, the decision to *load* 
the fsimage from disk into memory is based only on the on-disk fsimage version. 
If an error occurs between downloading and loading the fsimage on the first 
checkpoint attempt, the 2NN will never load the fsimage, and then on subsequent 
checkpoint attempts it will not load the on-disk fsimage and thus will never 
checkpoint successfully.
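The bookkeeping issue can be modeled by tracking the on-disk image and the in-memory image separately and loading whenever they differ. A minimal sketch; the field and method names are hypothetical, and image IDs stand in for whatever version the 2NN actually compares:

```java
// Hypothetical sketch of the 2NN's checkpoint bookkeeping: what is on disk and
// what is loaded in memory are tracked separately.
class CheckpointImageState {
  static final long NONE = -1;
  long onDiskImageId = NONE;    // image most recently downloaded to local storage
  long inMemoryImageId = NONE;  // image actually loaded into the namespace

  boolean needsDownload(long nnImageId) {
    return onDiskImageId != nnImageId;
  }

  // Deciding this from the on-disk image alone is the failure mode described
  // above; comparing against what is loaded in memory lets a failed load retry.
  boolean needsLoad() {
    return inMemoryImageId != onDiskImageId;
  }

  void downloaded(long imageId) { onDiskImageId = imageId; }

  void loaded() { inMemoryImageId = onDiskImageId; }

  public static void main(String[] args) {
    CheckpointImageState s = new CheckpointImageState();
    if (s.needsDownload(100)) {
      s.downloaded(100);
    }
    // Suppose an error occurs here, before loaded() is called.
    // On the next checkpoint attempt:
    System.out.println(s.needsDownload(100) + " " + s.needsLoad());  // false true
  }
}
```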

Example error message in the first comment of this ticket.



[jira] [Created] (HDFS-5102) Snapshot names should not be allowed to contain slash characters

2013-08-15 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5102:


 Summary: Snapshot names should not be allowed to contain slash 
characters
 Key: HDFS-5102
 URL: https://issues.apache.org/jira/browse/HDFS-5102
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.1.0-beta
Reporter: Aaron T. Myers


Snapshots of a snapshottable directory are allowed to have arbitrary names. 
Presently, if you create a snapshot with a snapshot name that begins with a / 
character, this will be allowed, but later attempts to access this snapshot 
will fail because of the way the {{Path}} class deals with consecutive / 
characters. I suggest we disallow / from appearing in snapshot names.

An example of this is in the first comment on this JIRA.
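A sketch of the suggested check, plus a simple slash-collapsing function that only roughly mimics what path normalization does to a name beginning with /. Both helpers are hypothetical and are not the Path class's implementation:

```java
// Hypothetical validation sketch for snapshot names.
class SnapshotNameCheck {
  static void validate(String name) {
    if (name.indexOf('/') >= 0) {
      throw new IllegalArgumentException("Snapshot name must not contain '/': " + name);
    }
  }

  // Collapses runs of '/' into one, similar in spirit to path normalization.
  static String collapseSlashes(String path) {
    return path.replaceAll("/+", "/");
  }

  public static void main(String[] args) {
    String name = "/snap1";
    String snapshotPath = "/dir/.snapshot/" + name;  // "/dir/.snapshot//snap1"
    // After normalization the name component is "snap1", not "/snap1".
    System.out.println(collapseSlashes(snapshotPath));  // /dir/.snapshot/snap1
    try {
      validate(name);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```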



[jira] [Created] (HDFS-5060) NN should proactively perform a saveNamespace if it has a huge number of outstanding uncheckpointed transactions

2013-08-02 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5060:


 Summary: NN should proactively perform a saveNamespace if it has a 
huge number of outstanding uncheckpointed transactions
 Key: HDFS-5060
 URL: https://issues.apache.org/jira/browse/HDFS-5060
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.0-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In a properly-functioning HDFS system, checkpoints will be triggered regularly 
by either the secondary NN or the standby NN, by default every hour or every 
1 million outstanding edits transactions, whichever comes first. However, in 
cases where this second node is down for an extended period of time, the number 
of outstanding transactions can grow so large as to cause a restart to take an 
inordinately long time.

This JIRA proposes to make the active NN monitor its number of outstanding 
transactions and perform a proactive local saveNamespace if it grows beyond a 
configurable threshold. I'm envisioning something like 10x the configured 
number of transactions that would trigger a checkpoint from the second NN in a 
properly-functioning cluster. Though this would be disruptive to clients while 
it's taking place, likely for a few minutes, this seems better than the 
alternative of a subsequent multi-hour restart, and it should never actually 
occur in a properly-functioning cluster.
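A minimal sketch of the proposed monitor, assuming the NN tracks the transaction ID of its most recent checkpoint. The class, the multiplier parameter, and the method names are hypothetical; only the one-million-transaction checkpoint period and the 10x idea come from the description above.

```java
/** Hypothetical check for the proposed proactive saveNamespace; not Hadoop code. */
final class UncheckpointedTxnMonitor {
    private final long checkpointTxns;  // txns that normally trigger a 2NN/SbNN checkpoint
    private final long multiplier;      // how far past that period before acting locally

    UncheckpointedTxnMonitor(long checkpointTxns, long multiplier) {
        this.checkpointTxns = checkpointTxns;
        this.multiplier = multiplier;
    }

    long threshold() {
        return checkpointTxns * multiplier;
    }

    /** True when the active NN should run a local saveNamespace. */
    boolean shouldSaveNamespace(long lastWrittenTxId, long lastCheckpointTxId) {
        long outstanding = lastWrittenTxId - lastCheckpointTxId;
        return outstanding > threshold();
    }
}
```

With a 1,000,000-transaction period and a multiplier of 10, the threshold is 10,000,000 outstanding transactions, so a healthy cluster that checkpoints every million transactions would never reach it.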



[jira] [Created] (HDFS-5064) Standby checkpoints should not block concurrent readers

2013-08-02 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5064:


 Summary: Standby checkpoints should not block concurrent readers
 Key: HDFS-5064
 URL: https://issues.apache.org/jira/browse/HDFS-5064
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


We've observed an issue which causes fetches of the {{/jmx}} page of the NN to 
take a long time to load when the standby is in the process of creating a 
checkpoint.

Both creating the checkpoint and gathering the statistics for {{/jmx}} take 
only the FSNS read lock. However, the FSNS uses a _fair_ RW lock, so once a 
single writer is waiting for the lock, every subsequent thread requesting the 
read lock queues behind that writer, and the writer itself cannot proceed until 
the checkpoint releases its read lock. As a result, {{/jmx}}, and really any 
thread only attempting to get the read lock, blocks for the duration of the 
checkpoint, even though it should be able to proceed concurrently with the 
checkpointing thread.
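The effect can be reproduced with a fair `java.util.concurrent.locks.ReentrantReadWriteLock`, which stands in for the FSNS lock here (an assumption; this is not the FSNamesystem code). One thread holds the read lock as the "checkpoint", a writer queues behind it, and a third thread's read attempt is refused. The sketch uses `tryLock(0, TimeUnit.SECONDS)` because the no-argument `tryLock()` barges past queued threads even on a fair lock and would hide the problem.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Demonstrates new readers queuing behind a waiting writer on a fair RW lock. */
final class FairLockDemo {
    /** Whether a new reader gets the read lock while a writer waits behind a held read lock. */
    static boolean readerProceedsWhileWriterQueued() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);  // fair mode
        lock.readLock().lock();  // the "checkpoint" holds the read lock throughout

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();  // parks until the checkpoint releases its read lock
            lock.writeLock().unlock();
        });
        writer.start();
        try {
            while (!lock.hasQueuedThread(writer)) {
                Thread.sleep(1);  // wait until the writer is actually queued
            }
            AtomicBoolean acquired = new AtomicBoolean(false);
            Thread reader = new Thread(() -> {  // e.g. a /jmx request
                try {
                    // Unlike tryLock(), a timed tryLock honors the fairness policy.
                    if (lock.readLock().tryLock(0, TimeUnit.SECONDS)) {
                        acquired.set(true);
                        lock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            reader.start();
            reader.join();
            return acquired.get();
        } finally {
            lock.readLock().unlock();  // checkpoint done: the queued writer can proceed
            writer.join();
        }
    }
}
```

The method returns false: the reader is turned away even though only another reader holds the lock, which matches the {{/jmx}} symptom.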



[jira] [Created] (HDFS-5027) On startup, DN should scan volumes in parallel

2013-07-23 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-5027:


 Summary: On startup, DN should scan volumes in parallel
 Key: HDFS-5027
 URL: https://issues.apache.org/jira/browse/HDFS-5027
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


On startup the DN must scan all replicas on all configured volumes before the 
initial block report to the NN. This is currently done serially, but can be 
done in parallel to improve startup time of the DN.
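A minimal sketch of a parallel scan using a `java.util.concurrent` thread pool with one task per volume. The generic `scanner` function is a hypothetical stand-in for the per-volume replica scan; the DN's actual code is not reproduced here. Results come back in volume order, and a failure on any volume is rethrown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: runs one scan task per volume concurrently. */
final class ParallelVolumeScan {
    static <V, R> List<R> scanAll(List<V> volumes, Function<V, R> scanner)
            throws InterruptedException, ExecutionException {
        if (volumes.isEmpty()) {
            return new ArrayList<>();  // newFixedThreadPool(0) would throw
        }
        // One thread per volume: the scans are disk-bound and independent.
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (V volume : volumes) {
                futures.add(pool.submit(() -> scanner.apply(volume)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get());  // waits; a scan failure surfaces as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Startup time then tracks the slowest volume rather than the sum of all volumes, which is the improvement the description is after.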



[jira] [Created] (HDFS-4906) HDFS Output streams should not accept writes after being closed

2013-06-14 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4906:


 Summary: HDFS Output streams should not accept writes after being 
closed
 Key: HDFS-4906
 URL: https://issues.apache.org/jira/browse/HDFS-4906
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Currently if one closes an OutputStream obtained from FileSystem#create and 
then calls write(...) on that closed stream, the write will appear to succeed 
without error though no data will be written to HDFS. A subsequent call to 
close will also silently appear to succeed. We should make it so that attempts 
to write to closed streams fail fast.
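A minimal sketch of the fail-fast behavior as a `java.io.FilterOutputStream` wrapper. The class name is hypothetical and this is not the HDFS client's output stream; throwing `java.nio.channels.ClosedChannelException` (an `IOException` subclass) is an assumed choice of exception. Repeated `close()` calls stay no-ops, which `java.io.Closeable` permits.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.ClosedChannelException;

/** Hypothetical wrapper that rejects writes after close(); not the HDFS client code. */
final class FailFastOutputStream extends FilterOutputStream {
    private boolean closed = false;

    FailFastOutputStream(OutputStream out) {
        super(out);
    }

    private void checkOpen() throws IOException {
        if (closed) {
            throw new ClosedChannelException();  // fail fast instead of silently dropping data
        }
    }

    @Override
    public void write(int b) throws IOException {
        checkOpen();
        out.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        checkOpen();
        out.write(b, off, len);  // bulk write, avoiding FilterOutputStream's per-byte loop
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;  // a second close() has no effect
        }
        try {
            out.flush();
        } finally {
            closed = true;
            out.close();
        }
    }
}
```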



[jira] [Created] (HDFS-4830) Typo in config settings for AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml

2013-05-16 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4830:


 Summary: Typo in config settings for 
AvailableSpaceVolumeChoosingPolicy in hdfs-default.xml
 Key: HDFS-4830
 URL: https://issues.apache.org/jira/browse/HDFS-4830
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.5-beta
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


In hdfs-default.xml we have these two settings:

{noformat}
dfs.datanode.fsdataset.volume.choosing.balanced-space-threshold
dfs.datanode.fsdataset.volume.choosing.balanced-space-preference-percent
{noformat}

But in fact they should be these, from DFSConfigKeys.java:

{noformat}
dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold
dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-percent
{noformat}

This won't actually affect any functionality, since the code falls back to its 
own default values for these settings anyway, but it makes the documentation 
generated from hdfs-default.xml inaccurate.



[jira] [Resolved] (HDFS-4352) Encapsulate arguments to BlockReaderFactory in a class

2013-05-10 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4352.
--

Resolution: Won't Fix

Seems some folks don't think this is the best idea.

 Encapsulate arguments to BlockReaderFactory in a class
 --

 Key: HDFS-4352
 URL: https://issues.apache.org/jira/browse/HDFS-4352
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.0.3-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: 01b.patch, 01.patch


 Encapsulate the arguments to BlockReaderFactory in a class to avoid having to 
 pass around 10+ arguments to a few different functions.



[jira] [Created] (HDFS-4747) Convert snapshot user guide to APT from XDOC

2013-04-24 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4747:


 Summary: Convert snapshot user guide to APT from XDOC
 Key: HDFS-4747
 URL: https://issues.apache.org/jira/browse/HDFS-4747
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation
Affects Versions: Snapshot (HDFS-2802)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


To be consistent with the rest of the HDFS docs, the snapshots user guide 
should use APT instead of XDOC.



[jira] [Created] (HDFS-4655) DNA_FINALIZE is logged as being an unknown command by the DN when received from the standby NN

2013-04-01 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4655:


 Summary: DNA_FINALIZE is logged as being an unknown command by the 
DN when received from the standby NN
 Key: HDFS-4655
 URL: https://issues.apache.org/jira/browse/HDFS-4655
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


This is harmless, since the correct behavior is simply to log the command as 
being ignored, but this bug results in a somewhat concerning error message 
appearing in the logs.



[jira] [Created] (HDFS-4656) DN heartbeat loop can be briefly tight

2013-04-01 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4656:


 Summary: DN heartbeat loop can be briefly tight
 Key: HDFS-4656
 URL: https://issues.apache.org/jira/browse/HDFS-4656
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


The DN heartbeat loop looks roughly like this:

{code}
if (now - timeOfLastHeartbeat > configuredHeartbeatInterval) {
  // do heartbeat
}
timeToWait = configuredHeartbeatInterval - (now - timeOfLastHeartbeat);
sleep(timeToWait);
{code}

The trouble is that since we sleep for exactly the heartbeat interval, and then 
check to see if we have waited _more_ than that heartbeat interval, we will 
very often have waited exactly the heartbeat interval (in millis), and not more 
than it. In this case we will skip actually performing the heartbeat and will 
calculate timeToWait as being 0ms. The DN heartbeat loop will then loop 
tightly for 1ms. The solution is just to change the {{>}} in the code above 
to {{>=}}.
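The off-by-one can be checked with a small deterministic simulation. This is a model, not the DN code: `HeartbeatLoopSim` is hypothetical, `sleep(t)` is modeled as advancing a fake millisecond clock by exactly `t`, and a zero-length wait is modeled as spinning until the clock ticks over by 1 ms. It counts how often the loop spins before sending a given number of heartbeats.

```java
/** Hypothetical model of the DN heartbeat loop using a fake millisecond clock. */
final class HeartbeatLoopSim {
    /**
     * Runs the loop until {@code heartbeats} heartbeats are sent and returns how many
     * times it computed a zero wait (i.e. spun instead of sleeping).
     * {@code inclusive} selects {@code >=} (the fix) instead of {@code >}.
     */
    static int spinEpisodes(long interval, int heartbeats, boolean inclusive) {
        long now = interval;  // exactly one interval after the previous heartbeat
        long timeOfLastHeartbeat = 0;
        int sent = 0;
        int spins = 0;
        while (sent < heartbeats) {
            long elapsed = now - timeOfLastHeartbeat;
            boolean due = inclusive ? elapsed >= interval : elapsed > interval;
            if (due) {
                timeOfLastHeartbeat = now;  // "do heartbeat"
                sent++;
            }
            long timeToWait = interval - (now - timeOfLastHeartbeat);
            if (timeToWait <= 0) {
                spins++;
                now += 1;  // tight loop until the millisecond clock advances
            } else {
                now += timeToWait;  // sleep(timeToWait) lands exactly on the interval
            }
        }
        return spins;
    }
}
```

With {{>}}, `spinEpisodes(3000, 5, false)` returns 5 (one spin before every heartbeat); with {{>=}}, `spinEpisodes(3000, 5, true)` returns 0.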



[jira] [Created] (HDFS-4657) If incremental BR is received before first full BR NN will log a line for every block on a DN

2013-04-01 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4657:


 Summary: If incremental BR is received before first full BR NN 
will log a line for every block on a DN
 Key: HDFS-4657
 URL: https://issues.apache.org/jira/browse/HDFS-4657
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Logging a line per block can impact restart times pretty substantially if the 
DNs have a lot of blocks, and since the FSNS write lock is held while processing 
the block report, clients will not make any progress in the meantime.



[jira] [Created] (HDFS-4658) Standby NN will log that it has received a block report after becoming active

2013-04-01 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4658:


 Summary: Standby NN will log that it has received a block report 
after becoming active
 Key: HDFS-4658
 URL: https://issues.apache.org/jira/browse/HDFS-4658
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Trivial


Even when the NN is in the standby state, the following line will sometimes be logged:

{noformat}
INFO blockmanagement.BlockManager: BLOCK* processReport: Received first block 
report from 172.21.3.106:50010 after becoming active. Its block contents are no 
longer considered stale
{noformat}



[jira] [Created] (HDFS-4614) FSNamesystem#getContentSummary should use getPermissionChecker helper method

2013-03-19 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4614:


 Summary: FSNamesystem#getContentSummary should use 
getPermissionChecker helper method
 Key: HDFS-4614
 URL: https://issues.apache.org/jira/browse/HDFS-4614
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Trivial


HDFS-4222 added this helper method and called it in most places, but missed one.



[jira] [Created] (HDFS-4592) Default values for access time precision are out of sync between hdfs-default.xml and the code

2013-03-12 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4592:


 Summary: Default values for access time precision are out of sync 
between hdfs-default.xml and the code
 Key: HDFS-4592
 URL: https://issues.apache.org/jira/browse/HDFS-4592
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In {{hdfs-default.xml}} we have:
{code}
<property>
  <name>dfs.namenode.accesstime.precision</name>
  <value>360</value>
  <description>The access time for HDFS file is precise upto this value.
               The default value is 1 hour. Setting a value of 0 disables
               access times for HDFS.
  </description>
</property>
{code}

But in {{FSNamesystem}} we have:
{code}
this.accessTimePrecision = conf.getLong(DFS_NAMENODE_ACCESSTIME_PRECISION_KEY, 0);
{code}

We properly define {{DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT}} in 
DFSConfigKeys.java, but it's not actually referenced anywhere in the code.
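A minimal sketch of the fix pattern: read the key with the named default constant rather than a literal `0`. To keep it self-contained, a `java.util.Map` stands in for Hadoop's `Configuration`, and the default value below is an assumption (one hour in milliseconds, per the `hdfs-default.xml` description), not copied from `DFSConfigKeys.java`.

```java
import java.util.Map;

/** Sketch of reading a long setting with a shared default; Map stands in for Configuration. */
final class AccessTimeConfig {
    static final String DFS_NAMENODE_ACCESSTIME_PRECISION_KEY = "dfs.namenode.accesstime.precision";
    // Assumption: one hour in milliseconds, matching the hdfs-default.xml description.
    static final long DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT = 60L * 60L * 1000L;

    static long getLong(Map<String, String> conf, String key, long defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Long.parseLong(value.trim());
    }

    /** Falls back to the named default instead of a hard-coded 0. */
    static long accessTimePrecision(Map<String, String> conf) {
        return getLong(conf, DFS_NAMENODE_ACCESSTIME_PRECISION_KEY,
            DFS_NAMENODE_ACCESSTIME_PRECISION_DEFAULT);
    }
}
```

Referencing the constant keeps the code, `DFSConfigKeys.java`, and the generated documentation from drifting apart again.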



[jira] [Created] (HDFS-4591) HA clients can fail to fail over while Standby NN is performing long checkpoint

2013-03-11 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4591:


 Summary: HA clients can fail to fail over while Standby NN is 
performing long checkpoint
 Key: HDFS-4591
 URL: https://issues.apache.org/jira/browse/HDFS-4591
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.0.4-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Clients know to fail over to the Active NN when they perform an RPC to the 
Standby NN and it throws a StandbyException. However, most places in the code 
that check whether the NN is in the standby state do so inside the FSNS fsLock. 
Since this lock is held for the duration of the saveNamespace during a 
checkpoint, RPCs to the Standby NN block on the lock instead of promptly 
throwing a StandbyException, so clients do not fail over during this time.
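One way to let clients fail over promptly is to check the HA state before taking the lock, as in the sketch below. It is an assumption, not the committed fix: the class, the `HAState` enum, and the local `StandbyException` (standing in for Hadoop's `org.apache.hadoop.ipc.StandbyException`) are hypothetical. The state field is `volatile` so the unlocked check sees transitions, and the check is repeated under the lock in case the state changed while waiting.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of checking HA state outside the namesystem lock; not FSNamesystem. */
final class StandbyCheckSketch {
    /** Local stand-in for org.apache.hadoop.ipc.StandbyException. */
    static final class StandbyException extends Exception {
        StandbyException(String msg) {
            super(msg);
        }
    }

    enum HAState { ACTIVE, STANDBY }

    final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private volatile HAState state;  // volatile: read without holding fsLock

    StandbyCheckSketch(HAState initial) {
        this.state = initial;
    }

    void setState(HAState newState) {
        state = newState;
    }

    private void checkOperation() throws StandbyException {
        if (state == HAState.STANDBY) {
            throw new StandbyException("operation not supported in standby state");
        }
    }

    /** A read-only RPC handler. */
    String getFileInfo(String path) throws StandbyException {
        checkOperation();  // fail fast, even while a checkpoint holds fsLock
        fsLock.readLock().lock();
        try {
            checkOperation();  // re-check: the state may have changed while waiting
            return "info:" + path;  // placeholder for the real namespace lookup
        } finally {
            fsLock.readLock().unlock();
        }
    }
}
```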



[jira] [Resolved] (HDFS-4476) HDFS-347: style cleanups

2013-02-15 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4476.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

I've just committed this to the HDFS-347 branch. Thanks a lot for the 
contribution, Colin.

 HDFS-347: style cleanups
 

 Key: HDFS-4476
 URL: https://issues.apache.org/jira/browse/HDFS-4476
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4476.001.patch, HDFS-4476.002.patch


 Clean up some code style issues in HDFS-347.
 DomainSocket.java
   do not use AtomicInteger for status, add a new class
   rename fdRef(), fdUnref(boolean), jfds, jbuf, SND_BUF_SIZE, etc.
   do not override finalize().
   remove some dead code.



[jira] [Resolved] (HDFS-4453) Make a simple doc to describe the usage and design of the shortcircuit read feature

2013-02-11 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4453.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

I've just committed this to the HDFS-347 branch.

Thanks a lot for the contribution, Colin.

 Make a simple doc to describe the usage and design of the shortcircuit read 
 feature
 ---

 Key: HDFS-4453
 URL: https://issues.apache.org/jira/browse/HDFS-4453
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client
Reporter: Brandon Li
Assignee: Colin Patrick McCabe
 Attachments: HDFS-4453.001.patch, HDFS-4453.002.patch, 
 HDFS-4453.003.patch, HDFS-4453.004.patch


 It would be nice to have a document describing the configuration and design 
 of this feature, as well as its relationship with the previous short-circuit 
 read implementation (HDFS-2246): for example, whether the two can co-exist, 
 whether this one is planned to replace HDFS-2246, or whether it can fall back 
 on HDFS-2246 when UNIX domain sockets are not supported.



[jira] [Resolved] (HDFS-4485) HDFS-347: DN should chmod socket path a+w

2013-02-08 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4485.
--

Resolution: Fixed

I've just committed this to the HDFS-347 branch.

Thanks a lot for the contribution, Colin.

 HDFS-347: DN should chmod socket path a+w
 -

 Key: HDFS-4485
 URL: https://issues.apache.org/jira/browse/HDFS-4485
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Todd Lipcon
Assignee: Colin Patrick McCabe
Priority: Critical
 Attachments: HDFS-4485.001.patch, HDFS-4485.003.patch


 In cluster-testing HDFS-347, we found that in clusters where the MR job 
 doesn't run as the same user as HDFS, clients wouldn't use short circuit read 
 because of a 'permission denied' error connecting to the socket. It turns out 
 that, in order to connect to a socket, clients need write permissions on the 
 socket file.
 The DN should set these permissions automatically after it creates the socket.
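For illustration, the permission change itself can be expressed with `java.nio.file` as below. This is a sketch under assumptions: it works on any path on a POSIX file system, the class name is hypothetical, and it says nothing about how the DN actually creates or binds its socket.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper equivalent to `chmod a+w path` on a POSIX file system. */
final class SocketPathPerms {
    static void chmodAllWritable(Path path) throws IOException {
        // Start from the current mode so read/execute bits are preserved.
        Set<PosixFilePermission> perms = EnumSet.noneOf(PosixFilePermission.class);
        perms.addAll(Files.getPosixFilePermissions(path));
        perms.add(PosixFilePermission.OWNER_WRITE);
        perms.add(PosixFilePermission.GROUP_WRITE);
        perms.add(PosixFilePermission.OTHERS_WRITE);
        Files.setPosixFilePermissions(path, perms);
    }
}
```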



[jira] [Resolved] (HDFS-4473) don't create domain socket unless we need it

2013-02-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4473.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

I've just committed this to the HDFS-347 branch.

Thanks a lot for the contribution, Andy.

 don't create domain socket unless we need it
 

 Key: HDFS-4473
 URL: https://issues.apache.org/jira/browse/HDFS-4473
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4473.001.patch


 If {{dfs.domain.socket.path}} is set, but we don't have anything enabled 
 which would need it (like {{dfs.client.read.shortcircuit}}), don't create the 
 socket.



[jira] [Created] (HDFS-4462) 2NN will fail to checkpoint after an HDFS upgrade from a pre-federation version of HDFS

2013-01-31 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4462:


 Summary: 2NN will fail to checkpoint after an HDFS upgrade from a 
pre-federation version of HDFS
 Key: HDFS-4462
 URL: https://issues.apache.org/jira/browse/HDFS-4462
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.2-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


The 2NN currently has logic to detect when its on-disk FS metadata needs an 
upgrade with respect to the NN's metadata (i.e. the layout versions are 
different). In this case it will proceed with the checkpoint despite the storage 
signatures not matching precisely, as long as the BP ID and Cluster ID match 
exactly. However, when upgrading from versions of HDFS prior to federation, 
which had no BP IDs or Cluster IDs, checkpoints will always fail with an error 
like the following:
{noformat}
13/01/31 17:02:25 ERROR namenode.SecondaryNameNode: checkpoint: Inconsistent 
checkpoint fields.
LV = -40 namespaceID = 403832480 cTime = 1359680537192 ; clusterId = 
CID-0df6ff22-1165-4c7d-9630-429972a7737c ; blockpoolId = 
BP-1520616013-172.21.3.106-1359680537136.
Expecting respectively: -19; 403832480; 0; ; .
{noformat}
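The relaxed comparison might look like the sketch below. It is an assumption about one possible fix, not the committed patch: `StorageSignature` and its field names are hypothetical, modeled on the fields printed in the error above. When the layout versions differ, the sketch accepts either matching BP and cluster IDs (the existing rule) or, for a pre-federation 2NN, empty IDs together with a matching namespace ID.

```java
/** Hypothetical model of the fields compared in the 2NN's checkpoint signature check. */
final class StorageSignature {
    final int layoutVersion;
    final int namespaceId;
    final String clusterId;
    final String blockpoolId;

    StorageSignature(int layoutVersion, int namespaceId, String clusterId, String blockpoolId) {
        this.layoutVersion = layoutVersion;
        this.namespaceId = namespaceId;
        this.clusterId = clusterId;
        this.blockpoolId = blockpoolId;
    }

    /** Can a 2NN with {@code local} storage checkpoint an upgraded NN with signature {@code nn}? */
    static boolean compatibleForUpgrade(StorageSignature local, StorageSignature nn) {
        if (local.layoutVersion == nn.layoutVersion) {
            return false;  // not an upgrade; the exact-match check applies instead
        }
        boolean idsMatch = local.clusterId.equals(nn.clusterId)
            && local.blockpoolId.equals(nn.blockpoolId);
        // Pre-federation storage has no cluster or block pool IDs at all.
        boolean preFederation = local.clusterId.isEmpty() && local.blockpoolId.isEmpty()
            && local.namespaceId == nn.namespaceId;
        return idsMatch || preFederation;
    }
}
```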



[jira] [Created] (HDFS-4448) HA NN does not start with wildcard address configured for other NN when security is enabled

2013-01-28 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4448:


 Summary: HA NN does not start with wildcard address configured for 
other NN when security is enabled
 Key: HDFS-4448
 URL: https://issues.apache.org/jira/browse/HDFS-4448
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode, security
Affects Versions: 2.0.3-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Currently if one tries to configure HA NNs to use the wildcard HTTP address when 
security is enabled, the NN will fail to start with an error like the following:
{code}
java.lang.IllegalArgumentException: java.io.IOException: Cannot use a wildcard 
address with security. Must explicitly set bind address for Kerberos
{code}
This is the case even if one configures an actual address for the other NN's 
HTTP address. There's no good reason for this, since we now check for the local 
address being set to 0.0.0.0 and determine the canonical hostname for Kerberos 
purposes using {{InetAddress.getLocalHost().getCanonicalHostName()}}, so we 
should remove the restriction.
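The hostname logic described above can be sketched as follows. `KerberosHost.forAddress` is a hypothetical helper; `InetAddress.isAnyLocalAddress()` and `InetAddress.getLocalHost().getCanonicalHostName()` are standard JDK calls, the latter being the one the description mentions.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: picks the hostname to use for a Kerberos principal. */
final class KerberosHost {
    static String forAddress(InetSocketAddress bindAddr) throws UnknownHostException {
        InetAddress addr = bindAddr.getAddress();
        if (addr != null && addr.isAnyLocalAddress()) {
            // Wildcard bind (0.0.0.0): use this machine's canonical hostname instead.
            return InetAddress.getLocalHost().getCanonicalHostName();
        }
        return bindAddr.getHostName();
    }
}
```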



[jira] [Resolved] (HDFS-4438) TestDomainSocket fails when system umask is set to 0002

2013-01-24 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4438.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

I've just committed this to the HDFS-347 branch. Thanks a lot for the 
contribution, Colin.

 TestDomainSocket fails when system umask is set to 0002
 ---

 Key: HDFS-4438
 URL: https://issues.apache.org/jira/browse/HDFS-4438
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-4438.001.patch


 {{TestDomainSocket#testFdPassingPathSecurity}} fails when the system umask is 
 set to 0002.



[jira] [Resolved] (HDFS-4440) avoid annoying log message when dfs.domain.socket.path is not set

2013-01-24 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4440.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

I've just committed this to the HDFS-347 branch. Thanks a lot for the 
contribution, Colin.

 avoid annoying log message when dfs.domain.socket.path is not set
 -

 Key: HDFS-4440
 URL: https://issues.apache.org/jira/browse/HDFS-4440
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Trivial
 Attachments: HDFS-4440.001.patch


 When dfs.domain.socket.path is not set, it gets set to the empty string.  We 
 should check if this conf key is the empty string in {{DomainSocketFactory}}, 
 rather than checking against null as we currently do.  Otherwise, we get 
 annoying log messages about failing to connect to the UNIX domain socket at 
 ''.
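A minimal sketch of the proposed check. `DomainSocketPathCheck.isConfigured` is a hypothetical helper, not the actual `DomainSocketFactory` code; it treats both a missing key and the empty string as "not set".

```java
/** Hypothetical check for whether dfs.domain.socket.path is actually set. */
final class DomainSocketPathCheck {
    static boolean isConfigured(String socketPath) {
        // An unset key may surface as "" rather than null, so test for both.
        return socketPath != null && !socketPath.isEmpty();
    }
}
```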



[jira] [Resolved] (HDFS-4333) Using right default value for creating files in HDFS

2013-01-10 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4333.
--

Resolution: Duplicate

Sounds good. Closing this issue as a duplicate then.

 Using right default value for creating files in HDFS
 

 Key: HDFS-4333
 URL: https://issues.apache.org/jira/browse/HDFS-4333
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Minor

 The default permission for creating files should be 0666 rather than 0777. 
 HADOOP-9155 added a default permission for files and changed 
 localfilesystem.create to use this default value; this JIRA makes the same 
 change for HDFS.
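The difference shows up once the umask is applied, since the effective mode is the requested mode with the umask bits cleared. A small worked example, assuming a typical umask of 022 (the class name is hypothetical):

```java
/** Applies a umask to a requested permission, as in POSIX file creation. */
final class UmaskExample {
    static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask;  // clear every bit that is set in the umask
    }

    public static void main(String[] args) {
        // Java octal literals: 0666 = rw-rw-rw-, 0777 = rwxrwxrwx, 022 = group/other write.
        System.out.println(Integer.toOctalString(applyUmask(0666, 022)));  // 644
        System.out.println(Integer.toOctalString(applyUmask(0777, 022)));  // 755
    }
}
```

So with 0777 every new file would come out executable (rwxr-xr-x), whereas 0666 yields the conventional rw-r--r--.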



[jira] [Created] (HDFS-4376) Intermittent timeout of TestBalancerWithNodeGroup

2013-01-09 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4376:


 Summary: Intermittent timeout of TestBalancerWithNodeGroup
 Key: HDFS-4376
 URL: https://issues.apache.org/jira/browse/HDFS-4376
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer, test
Affects Versions: 2.0.3-alpha
Reporter: Aaron T. Myers
Priority: Minor
 Attachments: test-balancer-with-node-group-timeout.txt

HDFS-4261 fixed several issues with the balancer and balancer tests, and 
reduced the frequency with which TestBalancerWithNodeGroup times out. Despite 
this, occasional timeouts still occur in this test. This JIRA is to track and 
fix this problem.



[jira] [Created] (HDFS-4315) DNs with multiple BPs can have BPOfferServices fail to start due to unsynchronized map access

2012-12-14 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-4315:


 Summary: DNs with multiple BPs can have BPOfferServices fail to 
start due to unsynchronized map access
 Key: HDFS-4315
 URL: https://issues.apache.org/jira/browse/HDFS-4315
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.2-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In some nightly test runs we've seen pretty frequent failures of 
TestWebHdfsWithMultipleNameNodes. I've traced the root cause to an 
unsynchronized map access in the DataStorage class.

More details in the first comment.



[jira] [Resolved] (HDFS-3926) QJM: Add user documentation for QJM

2012-09-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3926.
--

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

I've just committed this to the QJM branch.

 QJM: Add user documentation for QJM
 ---

 Key: HDFS-3926
 URL: https://issues.apache.org/jira/browse/HDFS-3926
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: HDFS-3926.patch, HDFS-3926.patch, HDFS-3926.patch, 
 qjm-ha-doc.diff, qjm-ha-doc.diff, regular-ha-doc.diff, regular-ha-doc.diff


 We should add user-facing documentation for how to configure/use the QJM.



[jira] [Created] (HDFS-3926) QJM: Add user documentation for QJM

2012-09-11 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3926:


 Summary: QJM: Add user documentation for QJM
 Key: HDFS-3926
 URL: https://issues.apache.org/jira/browse/HDFS-3926
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


We should add user-facing documentation for how to configure/use the QJM.



[jira] [Resolved] (HDFS-3893) QJM: Make QJM work with security enabled

2012-09-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3893.
--

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Thanks a lot for the review, Todd. I've just committed this to the HDFS-3077 
branch.

 QJM: Make QJM work with security enabled
 

 Key: HDFS-3893
 URL: https://issues.apache.org/jira/browse/HDFS-3893
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node, security
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: HDFS-3893.patch, HDFS-3893.patch


 Currently the QJM does not work when security is enabled. The quorum cannot 
 be formatted, the NN and SBN cannot communicate with the JNs, and JNs cannot 
 synchronize edit logs with each other.



[jira] [Created] (HDFS-3897) QJM: TestBlockToken fails after HDFS-3893

2012-09-06 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3897:


 Summary: QJM: TestBlockToken fails after HDFS-3893
 Key: HDFS-3897
 URL: https://issues.apache.org/jira/browse/HDFS-3897
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


HDFS-3893 caused the NN to log in using its configured Kerberos credentials 
when formatting the NN. This caused 
TestBlockToken#testBlockTokenInLastLocatedBlock to begin failing, since the 
test enables Kerberos but doesn't configure the NN principal or keytab.



[jira] [Resolved] (HDFS-3897) QJM: TestBlockToken fails after HDFS-3893

2012-09-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3897.
--

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Thanks a lot for the quick review, Todd. I've just committed this to the 
HDFS-3077 branch.

 QJM: TestBlockToken fails after HDFS-3893
 -

 Key: HDFS-3897
 URL: https://issues.apache.org/jira/browse/HDFS-3897
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: HDFS-3897.patch


 HDFS-3893 caused the NN to log in using its configured Kerberos credentials 
 when formatting the NN. This caused 
 TestBlockToken#testBlockTokenInLastLocatedBlock to begin failing, since the 
 test enables Kerberos but doesn't configure the NN principal or keytab.



[jira] [Created] (HDFS-3893) QJM: Make QJM work with security enabled

2012-09-05 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3893:


 Summary: QJM: Make QJM work with security enabled
 Key: HDFS-3893
 URL: https://issues.apache.org/jira/browse/HDFS-3893
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node, security
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Currently the QJM does not work when security is enabled. The quorum cannot be 
formatted, the NN and SBN cannot communicate with the JNs, and JNs cannot 
synchronize edit logs with each other.



[jira] [Created] (HDFS-3878) Add an administrative command to download finalized edit log segments from the NN

2012-08-31 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3878:


 Summary: Add an administrative command to download finalized edit 
log segments from the NN
 Key: HDFS-3878
 URL: https://issues.apache.org/jira/browse/HDFS-3878
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client, name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers


Similar to the `hdfs dfsadmin -fetchImage' command added in HDFS-2941, it 
would be nice to have an admin command capable of fetching edit log segments. 
This could be useful, for example, for use in a script designed to back up the 
NN on-disk metadata.



[jira] [Created] (HDFS-3864) NN does not update internal file mtime for OP_CLOSE when reading from the edit log

2012-08-28 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3864:


 Summary: NN does not update internal file mtime for OP_CLOSE when 
reading from the edit log
 Key: HDFS-3864
 URL: https://issues.apache.org/jira/browse/HDFS-3864
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


When logging an OP_CLOSE to the edit log, the NN writes out an updated file 
mtime and atime. However, when reading in an OP_CLOSE from the edit log, the NN 
does not apply these values to the in-memory FS data structure. Because of 
this, a file's mtime or atime may appear to go back in time after an NN 
restart, or an HA failover.

Most of the time this will be harmless and folks won't notice, but in the event 
one of these files is being used in the distributed cache of an MR job when an 
HA failover occurs, the job might notice that the mtime of a cache file has 
changed, which in MR2 will cause the job to fail with an exception like the 
following:

{noformat}
java.io.IOException: Resource 
hdfs://ha-nn-uri/user/jenkins/.staging/job_1341364439849_0513/libjars/snappy-java-1.0.3.2.jar
 changed on src filesystem (expected 1342137814599, was 1342137814473
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat}
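To make the replay gap concrete, here is a heavily simplified, hypothetical model of applying an OP_CLOSE record during edit log replay. CloseOpReplay, FileTimes, and CloseOp are illustrative names, not Hadoop's FSEditLogLoader or INode classes; the sketch only shows that the mtime and atime carried by the logged op must be written back to the in-memory file state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily simplified model of edit log replay.
// None of these names correspond to real Hadoop classes.
final class CloseOpReplay {
    static final class FileTimes {
        long mtime;
        long atime;
        FileTimes(long mtime, long atime) { this.mtime = mtime; this.atime = atime; }
    }

    // Minimal stand-in for a logged OP_CLOSE record carrying updated times.
    static final class CloseOp {
        final String path;
        final long mtime;
        final long atime;
        CloseOp(String path, long mtime, long atime) {
            this.path = path;
            this.mtime = mtime;
            this.atime = atime;
        }
    }

    // In-memory file state: path -> times.
    final Map<String, FileTimes> files = new HashMap<>();

    // Replaying OP_CLOSE applies the logged times to the in-memory file.
    // Omitting these two assignments is the kind of gap described above.
    void applyClose(CloseOp op) {
        FileTimes t = files.get(op.path);
        if (t == null) {
            throw new IllegalStateException("unknown file " + op.path);
        }
        t.mtime = op.mtime;
        t.atime = op.atime;
    }

    public static void main(String[] args) {
        CloseOpReplay replay = new CloseOpReplay();
        replay.files.put("/user/jenkins/lib.jar", new FileTimes(1000L, 1000L)); // state before close
        replay.applyClose(new CloseOp("/user/jenkins/lib.jar", 2000L, 1500L));
        System.out.println(replay.files.get("/user/jenkins/lib.jar").mtime); // 2000
    }
}
```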

Credit to Sujay Rau for discovering this issue.



[jira] [Created] (HDFS-3835) Long-lived 2NN cannot perform a checkpoint if security is enabled and the NN restarts without outstanding delegation tokens

2012-08-21 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3835:


 Summary: Long-lived 2NN cannot perform a checkpoint if security is 
enabled and the NN restarts without outstanding delegation tokens
 Key: HDFS-3835
 URL: https://issues.apache.org/jira/browse/HDFS-3835
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node, security
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


When the 2NN wants to perform a checkpoint, it figures out the highest 
transaction ID of the fsimage files on the NN, and if the 2NN has a copy of 
that fsimage file (because it created that merged fsimage file the last time it 
did a checkpoint) then the 2NN won't download the fsimage file from the NN, and 
instead only gets the new edits files from the NN. In this case, the 2NN also 
doesn't even bother reloading the fsimage file it has from disk, since it has 
all of the namespace state in-memory. This all works just fine.

When the 2NN _doesn't_ have a copy of the relevant fsimage file (for example, 
if the NN had restarted since the last checkpoint) then the 2NN blows away its 
in-memory namespace state, downloads the fsimage file from the NN, and loads 
the newly-downloaded fsimage file from disk. The bug is that when the 2NN 
clears its in-memory state, it only resets the namespace, but not the 
delegation token map.

The fix is pretty simple - just make the delegation token map get cleared as 
well as the namespace state when a running 2NN needs to load a new fsimage from 
disk.
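A simplified Java model of the proposed fix. The CheckpointerState class and its two maps are hypothetical stand-ins for the 2NN's namespace and delegation token state; they do not correspond to real Hadoop classes, and the key/value types are chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the 2NN's in-memory state; not real Hadoop classes.
final class CheckpointerState {
    // path -> inode id (simplified stand-in for the namespace)
    final Map<String, Long> namespace = new HashMap<>();
    // token sequence number -> expiry time (simplified stand-in for the token map)
    final Map<Integer, Long> delegationTokens = new HashMap<>();

    // Called when the 2NN does not have the NN's latest fsimage and must
    // load a freshly downloaded one from disk.
    void resetBeforeImageReload() {
        namespace.clear();
        // Without this line, delegation tokens from the previous image
        // would remain in memory after the namespace is reset.
        delegationTokens.clear();
    }

    public static void main(String[] args) {
        CheckpointerState s = new CheckpointerState();
        s.namespace.put("/user/a", 1001L);
        s.delegationTokens.put(7, 1345000000000L);
        s.resetBeforeImageReload();
        System.out.println(s.namespace.isEmpty() + " " + s.delegationTokens.isEmpty()); // true true
    }
}
```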

Credit to Stephen Chu for identifying this issue.





[jira] [Created] (HDFS-3820) QJM: Add a few missing imports in TestEditLog and TestFSEditLogLoader

2012-08-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3820:


 Summary: QJM: Add a few missing imports in TestEditLog and 
TestFSEditLogLoader
 Key: HDFS-3820
 URL: https://issues.apache.org/jira/browse/HDFS-3820
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Both of these test files are missing a few imports required to make them compile.





[jira] [Resolved] (HDFS-3820) QJM: Add a few missing imports in TestEditLog and TestFSEditLogLoader

2012-08-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3820.
--

Resolution: Invalid

Never mind. Was looking at the wrong branch. Sorry for the noise.

 QJM: Add a few missing imports in TestEditLog and TestFSEditLogLoader
 -

 Key: HDFS-3820
 URL: https://issues.apache.org/jira/browse/HDFS-3820
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers

 Both of these test files are missing a few imports required to make them 
 compile.





[jira] [Created] (HDFS-3823) QJM: TestQJMWithFaults fails occasionally because of missed setting of HTTP port

2012-08-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3823:


 Summary: QJM: TestQJMWithFaults fails occasionally because of 
missed setting of HTTP port
 Key: HDFS-3823
 URL: https://issues.apache.org/jira/browse/HDFS-3823
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


TestQJMWithFaults#testRandomized will fail if the IPCLoggerChannel.httpPort 
instance variable isn't set before a call to IPCLoggerChannel#prepareRecovery 
is made, since this is necessary to build URLs to the returned logs.

Credit to Todd Lipcon for discovering this issue.





[jira] [Resolved] (HDFS-3823) QJM: TestQJMWithFaults fails occasionally because of missed setting of HTTP port

2012-08-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3823.
--

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Thanks a lot for the quick review, Eli. I've just committed this to the branch.

 QJM: TestQJMWithFaults fails occasionally because of missed setting of HTTP 
 port
 

 Key: HDFS-3823
 URL: https://issues.apache.org/jira/browse/HDFS-3823
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: HDFS-3823.patch


 TestQJMWithFaults#testRandomized will fail if the IPCLoggerChannel.httpPort 
 instance variable isn't set before a call to IPCLoggerChannel#prepareRecovery 
 is made, since this is necessary to build URLs to the returned logs.
 Credit to Todd Lipcon for discovering this issue.





[jira] [Created] (HDFS-3826) QJM: Some trivial logging / exception text improvements

2012-08-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3826:


 Summary: QJM: Some trivial logging / exception text improvements
 Key: HDFS-3826
 URL: https://issues.apache.org/jira/browse/HDFS-3826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node, test
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


Some of the log text in QuorumException and QuorumJournalManager could stand to 
be improved.





[jira] [Resolved] (HDFS-3826) QJM: Some trivial logging / exception text improvements

2012-08-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3826.
--

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Thanks a lot for the quick review, Eli. I've just committed this to the 
HDFS-3077 branch.

 QJM: Some trivial logging / exception text improvements
 ---

 Key: HDFS-3826
 URL: https://issues.apache.org/jira/browse/HDFS-3826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node, test
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: HDFS-3826.patch


 Some of the log text in QuorumException and QuorumJournalManager could stand 
 to be improved.
 Credit to Todd Lipcon for noticing this.





[jira] [Reopened] (HDFS-2998) OfflineImageViewer and ImageVisitor should be annotated public

2012-08-17 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reopened HDFS-2998:
--


My apologies that this fell off my radar. I still believe that this is a useful 
interface for external developers, and thus that we should mark it 
public/unstable or public/evolving.

I agree that doing so would require marking the referenced classes similarly 
public/(evolving|unstable).

 OfflineImageViewer and ImageVisitor should be annotated public
 --

 Key: HDFS-2998
 URL: https://issues.apache.org/jira/browse/HDFS-2998
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 0.23.1
Reporter: Aaron T. Myers

 The OfflineImageViewer is currently annotated as InterfaceAudience.Private. 
 It's intended for subclassing, so it should be annotated as the public API 
 that it is.
 The ImageVisitor class should similarly be annotated public (evolving is 
 fine). Note that it should also be changed to be public; it's currently 
 package-private, which means that users have to cheat with their subclass 
 package name.





[jira] [Reopened] (HDFS-3719) Re-enable append-related tests in TestFileConcurrentReader

2012-08-13 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reopened HDFS-3719:
--


 Re-enable append-related tests in TestFileConcurrentReader
 --

 Key: HDFS-3719
 URL: https://issues.apache.org/jira/browse/HDFS-3719
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha
Reporter: Andrew Wang
Assignee: Andrew Wang
 Fix For: 2.2.0-alpha

 Attachments: hdfs-3719-1.patch


 Both of these tests are disabled. We should figure out what append 
 functionality we need to make the tests work again, and re-enable them.
 {code}
   // fails due to issue w/append, disable 
   @Ignore
   @Test
   public void _testUnfinishedBlockCRCErrorTransferToAppend()
 throws IOException {
 runTestUnfinishedBlockCRCError(true, SyncType.APPEND, DEFAULT_WRITE_SIZE);
   }
   // fails due to issue w/append, disable 
   @Ignore
   @Test
   public void _testUnfinishedBlockCRCErrorNormalTransferAppend()
 throws IOException {
 runTestUnfinishedBlockCRCError(false, SyncType.APPEND, 
 DEFAULT_WRITE_SIZE);
   }
 {code}





[jira] [Created] (HDFS-3773) TestNNWithQJM fails after HDFS-3741

2012-08-08 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3773:


 Summary: TestNNWithQJM fails after HDFS-3741
 Key: HDFS-3773
 URL: https://issues.apache.org/jira/browse/HDFS-3773
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Looks like the change of visibility of one of the QuorumJournalManager 
constructors fouls up the instantiation via reflection.





[jira] [Created] (HDFS-3745) fsck prints that it's using KSSL even when it's in fact using SPNEGO for authentication

2012-07-31 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3745:


 Summary: fsck prints that it's using KSSL even when it's in fact 
using SPNEGO for authentication
 Key: HDFS-3745
 URL: https://issues.apache.org/jira/browse/HDFS-3745
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, security
Affects Versions: 2.0.0-alpha, 1.2.0
Reporter: Aaron T. Myers
Priority: Trivial


In branch-2 (which exclusively uses SPNEGO for HTTP authentication) and in 
branch-1 (which can optionally use SPNEGO for HTTP authentication), running 
fsck will print the following, which isn't quite right:

{quote}
FSCK started by hdfs (auth:KERBEROS_SSL) from...
{quote}





[jira] [Created] (HDFS-3738) TestDFSClientRetries#testFailuresArePerOperation sets incorrect timeout config

2012-07-30 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3738:


 Summary: TestDFSClientRetries#testFailuresArePerOperation sets 
incorrect timeout config
 Key: HDFS-3738
 URL: https://issues.apache.org/jira/browse/HDFS-3738
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


TestDFSClientRetries#testFailuresArePerOperation involves testing retries by 
making use of expected timeouts. However, this test sets the wrong config to 
lower the timeout, and thus takes far longer than it should.





[jira] [Created] (HDFS-3727) When using SPNEGO, NN should not try to log in using KSSL principal

2012-07-25 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3727:


 Summary: When using SPNEGO, NN should not try to log in using KSSL 
principal
 Key: HDFS-3727
 URL: https://issues.apache.org/jira/browse/HDFS-3727
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 1.2.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


When performing a checkpoint with security enabled, the NN will attempt to 
relogin from its keytab before making an HTTP request back to the 2NN to fetch 
the newly-merged image. However, it always attempts to log in using the KSSL 
principal, even if SPNEGO is configured to be used.

This issue was discovered by Stephen Chu.





[jira] [Created] (HDFS-3698) TestHftpFileSystem is failing in branch-1 due to changed default secure port

2012-07-21 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3698:


 Summary: TestHftpFileSystem is failing in branch-1 due to changed 
default secure port
 Key: HDFS-3698
 URL: https://issues.apache.org/jira/browse/HDFS-3698
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 1.2.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


This test is failing since the default secure port changed to the HTTP port 
upon the commit of HDFS-2617.





[jira] [Reopened] (HDFS-3654) TestJspHelper#testGetUgi fails with NPE

2012-07-17 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reopened HDFS-3654:
--


 TestJspHelper#testGetUgi fails with NPE
 ---

 Key: HDFS-3654
 URL: https://issues.apache.org/jira/browse/HDFS-3654
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.1.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 1.2.0, 2.1.0-alpha

 Attachments: hdfs-3654-b1.txt, hdfs-3654.txt, hdfs-3654.txt


 Looks like my recent change in HDFS-3639 can occasionally cause this test to 
 fail. 





[jira] [Reopened] (HDFS-3639) JspHelper#getUGI should always verify the token if security is enabled

2012-07-17 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reopened HDFS-3639:
--


 JspHelper#getUGI should always verify the token if security is enabled
 --

 Key: HDFS-3639
 URL: https://issues.apache.org/jira/browse/HDFS-3639
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 1.0.0, 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
 Fix For: 1.2.0, 2.1.0-alpha

 Attachments: hdfs-3639-b1.txt, hdfs-3639.txt


 JspHelper#getUGI only verifies the given token if the context and NN are set 
 (added in HDFS-2416). We should unconditionally verify the token; i.e., a 
 bug where name.node is not set in the context object should not result in 
 the token going unverified. In practice this shouldn't be an issue, as per 
 HDFS-3434 the context and NN should never be null.





[jira] [Resolved] (HDFS-3654) TestJspHelper#testGetUgi fails with NPE

2012-07-17 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3654.
--

Resolution: Invalid

Seems like this JIRA is now irrelevant since the change in HDFS-3639 has been 
reverted and will be done differently.

 TestJspHelper#testGetUgi fails with NPE
 ---

 Key: HDFS-3654
 URL: https://issues.apache.org/jira/browse/HDFS-3654
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.1.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 1.2.0, 2.1.0-alpha

 Attachments: hdfs-3654-b1.txt, hdfs-3654.txt, hdfs-3654.txt


 Looks like my recent change in HDFS-3639 can occasionally cause this test to 
 fail. 





[jira] [Created] (HDFS-3637) Add support for encrypting the DataTransferProtocol

2012-07-11 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3637:


 Summary: Add support for encrypting the DataTransferProtocol
 Key: HDFS-3637
 URL: https://issues.apache.org/jira/browse/HDFS-3637
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node, hdfs client, security
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Currently all HDFS RPCs performed by NNs/DNs/clients can be optionally 
encrypted. However, actual data read or written between DNs and clients (or DNs 
to DNs) is sent in the clear. When processing sensitive data on a shared 
cluster, confidentiality of the data read/written from/to HDFS may be desired.





[jira] [Created] (HDFS-3501) Checkpointing with security + HA enabled will stop working after ticket lifetime expires

2012-06-04 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3501:


 Summary: Checkpointing with security + HA enabled will stop 
working after ticket lifetime expires
 Key: HDFS-3501
 URL: https://issues.apache.org/jira/browse/HDFS-3501
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, name-node
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
 Attachments: HDFS-3501.patch

The StandbyCheckpointer currently does the right thing in renewing its krb5 
creds before attempting to perform a checkpoint to the active NN, but the 
active NN makes no attempt to renew its own krb5 creds before connecting to the 
standby NN to fetch the new merged fsimage file.





[jira] [Created] (HDFS-3484) hdfs fsck doesn't work if NN HTTP address is set to 0.0.0.0 even if NN RPC address is configured

2012-05-31 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3484:


 Summary: hdfs fsck doesn't work if NN HTTP address is set to 
0.0.0.0 even if NN RPC address is configured
 Key: HDFS-3484
 URL: https://issues.apache.org/jira/browse/HDFS-3484
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


The default NN HTTP address is 0.0.0.0. Clients which need to connect to the 
HTTP address (e.g. fsck and fetchImage) need an address which is actually 
resolvable, however. If the configured NN HTTP address is set to 0.0.0.0, these 
clients should fall back on using the hostname configured for the RPC address, 
with the port configured for the HTTP address.
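A minimal sketch of the proposed fallback, assuming the host/port pairs have already been parsed into InetSocketAddress objects. HttpAddressFallback and connectableHttpAddress are hypothetical names, not Hadoop APIs, and the hostnames and ports in the example are illustrative.

```java
import java.net.InetSocketAddress;

// Hypothetical helper sketching the fallback described above; not Hadoop code.
final class HttpAddressFallback {
    static final String WILDCARD = "0.0.0.0";

    // Returns an address an HTTP client can actually connect to. If the
    // configured HTTP host is the wildcard address, reuse the RPC host
    // but keep the configured HTTP port.
    static InetSocketAddress connectableHttpAddress(InetSocketAddress httpAddr,
                                                    InetSocketAddress rpcAddr) {
        if (WILDCARD.equals(httpAddr.getHostString())) {
            return InetSocketAddress.createUnresolved(rpcAddr.getHostString(), httpAddr.getPort());
        }
        return httpAddr;
    }

    public static void main(String[] args) {
        InetSocketAddress http = InetSocketAddress.createUnresolved("0.0.0.0", 50070);
        InetSocketAddress rpc = InetSocketAddress.createUnresolved("nn1.example.com", 8020);
        InetSocketAddress target = connectableHttpAddress(http, rpc);
        System.out.println(target.getHostString() + ":" + target.getPort()); // nn1.example.com:50070
    }
}
```

Unresolved addresses are used here only to keep the example free of DNS lookups; a real client would resolve the resulting host before connecting.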





[jira] [Resolved] (HDFS-3463) DFSTestUtil.waitCorruptReplicas() should not use file reading time as a timeout measure.

2012-05-30 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3463.
--

Resolution: Duplicate

 DFSTestUtil.waitCorruptReplicas() should not use file reading time as a 
 timeout measure.
 

 Key: HDFS-3463
 URL: https://issues.apache.org/jira/browse/HDFS-3463
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
 Attachments: testBlockCorruptionRecoveryPolicy1.log.htm


 Tests fail because DFSTestUtil.waitCorruptReplicas() does not wait long 
 enough.





[jira] [Resolved] (HDFS-3427) TestEditLogFileOutputStream#PREALLOCATION_LENGTH is dead code

2012-05-23 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3427.
--

Resolution: Duplicate

Looks like this got fixed with the commit of HDFS-2982.

 TestEditLogFileOutputStream#PREALLOCATION_LENGTH is dead code
 -

 Key: HDFS-3427
 URL: https://issues.apache.org/jira/browse/HDFS-3427
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Affects Versions: 3.0.0
Reporter: Aaron T. Myers
Priority: Trivial
  Labels: newbie

 The constant PREALLOCATION_LENGTH in TestEditLogFileOutputStream is no longer 
 referenced anywhere.





[jira] [Created] (HDFS-3444) hdfs groups command doesn't work with security enabled

2012-05-18 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3444:


 Summary: hdfs groups command doesn't work with security enabled
 Key: HDFS-3444
 URL: https://issues.apache.org/jira/browse/HDFS-3444
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


When one runs `hdfs groups' with security enabled, it fails with an error 
like the following:

{noformat}
java.io.IOException: Failed to specify server's Kerberos principal name;
{noformat}





[jira] [Created] (HDFS-3439) Balancer exits if fs.defaultFS is set to a different, but semantically identical, URI from dfs.namenode.rpc-address

2012-05-17 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3439:


 Summary: Balancer exits if fs.defaultFS is set to a different, but 
semantically identical, URI from dfs.namenode.rpc-address
 Key: HDFS-3439
 URL: https://issues.apache.org/jira/browse/HDFS-3439
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.0.0
Reporter: Aaron T. Myers


The balancer determines the set of NN URIs to balance by looking at 
fs.defaultFS and all possible dfs.namenode.(service)rpc-address settings. If 
fs.defaultFS is, for example, set to hdfs://foo.example.com:8020/ (note the 
trailing /) and the rpc-address is set to hdfs://foo.example.com:8020 
(without a /), then the balancer will conclude that there are two NNs and try 
to balance both. However, since both of these URIs refer to the same actual FS 
instance, the balancer will exit with java.io.IOException: Another balancer is 
running.  Exiting ...
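The mismatch is easy to reproduce with `java.net.URI`. The two URIs differ only in their path (`/` versus empty), so `equals` treats them as different. Below is a minimal Java sketch that compares NN URIs by scheme, host and port instead. The class and method names are hypothetical, and this is not necessarily how the balancer was fixed.

```java
import java.net.URI;

// Hypothetical sketch: decide whether two URIs refer to the same NN by comparing
// scheme, host and port, ignoring any path such as a trailing "/".
final class NameNodeUriCompare {
    private NameNodeUriCompare() {}

    static boolean sameNameNode(URI a, URI b) {
        if (a.getScheme() == null || b.getScheme() == null
                || a.getHost() == null || b.getHost() == null) {
            return false; // not a usable scheme://host:port URI
        }
        return a.getScheme().equalsIgnoreCase(b.getScheme())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && a.getPort() == b.getPort();
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://foo.example.com:8020/");
        URI rpcAddress = URI.create("hdfs://foo.example.com:8020");
        System.out.println(defaultFs.equals(rpcAddress));        // false: paths "/" and "" differ
        System.out.println(sameNameNode(defaultFs, rpcAddress)); // true
    }
}
```

This sketch does not resolve aliases, such as a hostname versus its IP address. It also treats an omitted port (`-1`) as different from an explicit 8020.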





[jira] [Resolved] (HDFS-3430) Start-all.sh Error

2012-05-16 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3430.
--

   Resolution: Invalid
Fix Version/s: (was: 1.0.1)

Hi Hiten, you're attempting to use an invalid hostname for your NameNode. 
Hostnames are not permitted to contain the underscore (_) character.

 Start-all.sh Error 
 ---

 Key: HDFS-3430
 URL: https://issues.apache.org/jira/browse/HDFS-3430
 Project: Hadoop HDFS
  Issue Type: Test
  Components: data-node, hdfs client, name-node
Affects Versions: 1.0.1
 Environment: Linux
Reporter: Hiten Tathe
  Labels: hadoop
 Attachments: Screenshot.png

   Original Estimate: 5h
  Remaining Estimate: 5h

 Hi,
 I'm new to Hadoop and am trying to run Hadoop on a standalone Linux machine, 
 but I'm facing an error. Please help me. The error is as follows:
 {noformat}
 2012-05-16 13:03:11,155 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = sra_hadoop.com/192.168.1.62
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 1.0.2-SNAPSHOT
 STARTUP_MSG:   build =  -r ; compiled by 'root' on Wed May 16 12:30:17 IST 2012
 /
 2012-05-16 13:03:11,363 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
 2012-05-16 13:03:11,379 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
 2012-05-16 13:03:11,380 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
 2012-05-16 13:03:11,381 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
 2012-05-16 13:03:11,390 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: hdfs://sra_hadoop:9000
   at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:198)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:262)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:496)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
   at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
 2012-05-16 13:03:11,391 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down NameNode at sra_hadoop.com/192.168.1.62
 {noformat}
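The `Does not contain a valid host:port authority` error is consistent with how `java.net.URI` handles an underscore in a hostname. The URI is not rejected outright. Instead, its authority is kept as a registry-based authority with no host and no port. Below is a minimal Java illustration. The class and method names are hypothetical, `sra-hadoop` is a hypothetical corrected hostname, and this is not Hadoop's NetUtils code.

```java
import java.net.URI;

// Illustration only: how java.net.URI parses an authority whose hostname
// contains an underscore. Hadoop's own address-parsing code is not shown.
final class HostPortCheck {
    private HostPortCheck() {}

    // True if the URI's authority parsed as a server-based host:port pair.
    static boolean hasValidHostPort(String uri) {
        URI parsed = URI.create(uri);
        return parsed.getHost() != null && parsed.getPort() != -1;
    }

    public static void main(String[] args) {
        URI bad = URI.create("hdfs://sra_hadoop:9000");
        System.out.println(bad.getAuthority()); // sra_hadoop:9000 (registry-based authority)
        System.out.println(bad.getHost());      // null
        System.out.println(bad.getPort());      // -1
        System.out.println(hasValidHostPort("hdfs://sra_hadoop:9000")); // false
        System.out.println(hasValidHostPort("hdfs://sra-hadoop:9000")); // true
    }
}
```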





[jira] [Created] (HDFS-3433) GetImageServlet should allow administrative requestors when security is enabled

2012-05-16 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3433:


 Summary: GetImageServlet should allow administrative requestors 
when security is enabled
 Key: HDFS-3433
 URL: https://issues.apache.org/jira/browse/HDFS-3433
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Currently the GetImageServlet only allows the NN and checkpointing nodes to 
connect. Since we now have the fetchImage command in DFSAdmin, we should allow 
administrative requests as well.





[jira] [Created] (HDFS-3400) DNs should be able start with jsvc even if security is disabled

2012-05-10 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3400:


 Summary: DNs should be able start with jsvc even if security is 
disabled
 Key: HDFS-3400
 URL: https://issues.apache.org/jira/browse/HDFS-3400
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, scripts
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


Currently, if one tries to start a DN with security disabled (via 
hadoop.security.authentication = simple in the configs) but with jsvc 
correctly configured, the DN will refuse to start.





[jira] [Created] (HDFS-3404) Make putImage in GetImageServlet infer remote address to fetch from

2012-05-10 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3404:


 Summary: Make putImage in GetImageServlet infer remote address to 
fetch from
 Key: HDFS-3404
 URL: https://issues.apache.org/jira/browse/HDFS-3404
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


As it stands, daemons that perform checkpointing must determine an address at 
which they can be reached, so that the NN they checkpoint against knows which 
address to fetch the merged fsimage from. This causes problems if, for example, 
the checkpointing daemon binds to 0.0.0.0 and thus can't be sure of the address 
at which the NN can reach it.





[jira] [Created] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages

2012-05-10 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3405:


 Summary: Checkpointing should use HTTP POST or PUT instead of 
GET-GET to send merged fsimages
 Key: HDFS-3405
 URL: https://issues.apache.org/jira/browse/HDFS-3405
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Aaron T. Myers


As Todd points out in [this 
comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986], 
the current scheme for a checkpointing daemon to upload a merged fsimage file 
to an NN is to issue an HTTP GET request telling the target NN to issue another 
GET request back to the checkpointing daemon to retrieve the merged fsimage 
file. There's no fundamental reason the checkpointing daemon can't just use an 
HTTP POST or PUT to send the merged fsimage file, rather than the double-GET 
scheme.





[jira] [Created] (HDFS-3395) NN doesn't start with HA+security enabled and HTTP address set to 0.0.0.0

2012-05-09 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3395:


 Summary: NN doesn't start with HA+security enabled and HTTP 
address set to 0.0.0.0
 Key: HDFS-3395
 URL: https://issues.apache.org/jira/browse/HDFS-3395
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


DFSUtil#substituteForWildcardAddress subs in a default hostname if the given 
hostname is 0.0.0.0. However, this function throws an exception if the given 
hostname is set to 0.0.0.0 and security is enabled, regardless of whether the 
default hostname is also 0.0.0.0. This function shouldn't throw an exception 
unless both addresses are set to 0.0.0.0.
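Below is a minimal Java sketch of the rule proposed above. The class, method name and parameters are hypothetical, and DFSUtil's real signature and exception type may differ. The hostname `nn1.example.com` is a placeholder.

```java
// Hypothetical sketch of the proposed rule; not DFSUtil's actual code or signature.
final class WildcardSubstitution {
    static final String WILDCARD = "0.0.0.0";

    private WildcardSubstitution() {}

    static String substituteForWildcard(String configuredHost, String defaultHost,
                                        boolean securityEnabled) {
        if (!WILDCARD.equals(configuredHost)) {
            return configuredHost; // already a concrete address; nothing to substitute
        }
        if (securityEnabled && WILDCARD.equals(defaultHost)) {
            // Fail only when neither setting provides a concrete hostname.
            throw new IllegalArgumentException(
                    "Cannot use a wildcard address with security enabled: no concrete hostname configured");
        }
        return defaultHost;
    }

    public static void main(String[] args) {
        System.out.println(substituteForWildcard("0.0.0.0", "nn1.example.com", true)); // nn1.example.com
    }
}
```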





[jira] [Created] (HDFS-3390) DFSAdmin should print full stack traces of errors when DEBUG logging is enabled

2012-05-08 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3390:


 Summary: DFSAdmin should print full stack traces of errors when 
DEBUG logging is enabled
 Key: HDFS-3390
 URL: https://issues.apache.org/jira/browse/HDFS-3390
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


If an error is encountered when running an `hdfs dfsadmin ...' command, only 
the exception's message is output. It would be handy for debugging if the full 
stack trace of the exception were output when DEBUG logging is enabled.
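Below is a minimal Java sketch of the proposed behavior. The boolean parameter stands in for a logger's `isDebugEnabled()` check. `ErrorReporter` and `describe` are hypothetical names, and this is not DFSAdmin's actual code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch: print only the message normally, and the full stack
// trace when debug output is enabled.
final class ErrorReporter {
    private ErrorReporter() {}

    static String describe(Exception e, boolean debugEnabled) {
        if (!debugEnabled) {
            return e.getLocalizedMessage(); // message only
        }
        StringWriter buffer = new StringWriter();
        e.printStackTrace(new PrintWriter(buffer)); // full trace, including "at ..." frames
        return buffer.toString();
    }

    public static void main(String[] args) {
        Exception e = new IllegalStateException("boom");
        System.out.println(describe(e, false));
        System.out.println(describe(e, true));
    }
}
```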





[jira] [Resolved] (HDFS-3345) Primary and secondary NameNode principals must be the same

2012-05-04 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3345.
--

Resolution: Duplicate

Closing this as a dupe. Please feel free to reopen it if you disagree, or if 
I've misinterpreted your report of the issue.

 Primary and secondary NameNode principals must be the same
 --

 Key: HDFS-3345
 URL: https://issues.apache.org/jira/browse/HDFS-3345
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Owen O'Malley

 The NameNode and SecondaryNameNode have two different configuration knobs 
 (dfs.namenode.kerberos.principal and 
 dfs.secondary.namenode.kerberos.principal), but the SecondaryNameNode fails 
 authorization unless both principals are the same user.





[jira] [Created] (HDFS-3349) DFSAdmin fetchImage command should initialize security credentials

2012-05-02 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3349:


 Summary: DFSAdmin fetchImage command should initialize security 
credentials
 Key: HDFS-3349
 URL: https://issues.apache.org/jira/browse/HDFS-3349
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


The `hdfs dfsadmin -fetchImage' command should fetch the fsimage using the 
appropriate credentials if security is enabled.





[jira] [Created] (HDFS-3351) NameNode#initializeGenericKeys should always set fs.defaultFS regardless of whether HA or Federation is enabled

2012-05-02 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-3351:


 Summary: NameNode#initializeGenericKeys should always set 
fs.defaultFS regardless of whether HA or Federation is enabled
 Key: HDFS-3351
 URL: https://issues.apache.org/jira/browse/HDFS-3351
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


NameNode#initializeGenericKeys exits early if neither a nameservice nor NN ID 
is passed. However, this method also serves to set fs.defaultFS in the 
configuration object stored by the NN to the NN RPC address after generic keys 
have been configured. This should be done in all cases.





[jira] [Resolved] (HDFS-2999) DN metrics should include per-disk utilization

2012-04-24 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-2999.
--

  Resolution: Won't Fix
Target Version/s: 2.0.0, 3.0.0  (was: 3.0.0, 2.0.0)

Operators can monitor this using more direct means.

 DN metrics should include per-disk utilization
 --

 Key: HDFS-2999
 URL: https://issues.apache.org/jira/browse/HDFS-2999
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.23.1
Reporter: Aaron T. Myers

 We should have per-dfs.data.dir metrics in the DN's metrics report.





[jira] [Created] (HDFS-2350) Secure DN doesn't print output to console when started interactively

2011-09-19 Thread Aaron T. Myers (JIRA)
Secure DN doesn't print output to console when started interactively


 Key: HDFS-2350
 URL: https://issues.apache.org/jira/browse/HDFS-2350
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.24.0
Reporter: Aaron T. Myers
 Fix For: 0.24.0


If one starts a secure DN (using jsvc) interactively, the output is not printed 
to the console, but instead ends up in {{$HADOOP_LOG_DIR/jsvc.err}} and 
{{$HADOOP_LOG_DIR/jsvc.out}}.




