[jira] [Resolved] (HDFS-14535) The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is causing lots of heap allocation in HBase when using short-circuit read

2019-06-04 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-14535.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0

> The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is 
> causing lots of heap allocation in HBase when using short-circuit read
> --
>
> Key: HDFS-14535
> URL: https://issues.apache.org/jira/browse/HDFS-14535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14535.patch
>
>
> Our HBase team is trying to read blocks from HDFS into pooled off-heap 
> ByteBuffers directly (HBASE-21879), and in a recent benchmark we found that 
> almost 45% of heap allocation came from the DFS client. The heap 
> allocation flame graph can be seen here: 
> https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg
> After checking the code path, we found that when requesting file descriptors 
> from a DomainPeer, we allocate a huge 8KB buffer for the BufferedOutputStream, 
> even though the protocol content is quite small, just a few bytes.
> This creates heavy GC pressure for HBase when cacheHitRatio < 60%, which 
> increases the HBase P999 latency. We can instead pre-allocate a small 
> buffer for the BufferedOutputStream, such as 512 bytes, which is enough for 
> the short-circuit fd protocol content. We've created a patch that does this, 
> and the allocation flame graph shows that after the patch, the heap 
> allocation from the DFS client dropped from 45% to 27%, which is a very 
> good result. See: 
> https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg
> We hope the attached patch can be merged into HDFS trunk and also 
> Hadoop-2.8.x; HBase will benefit a lot from it. 
> Thanks. 
> For more details, see: 
> https://issues.apache.org/jira/browse/HBASE-22387?focusedCommentId=16851639&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16851639
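A minimal sketch of the fix described above (class and constant names are illustrative, not the exact patch): size the BufferedOutputStream to fit the small fd-request message instead of accepting the 8KB default.
{code}
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SmallBufferSketch {
  // The fd-request protocol content is only a few bytes, so a 512-byte
  // buffer (instead of BufferedOutputStream's 8192-byte default) avoids
  // most of the per-request heap allocation.
  private static final int FD_REQUEST_BUFFER_SIZE = 512;

  public static void main(String[] args) throws IOException {
    // A ByteArrayOutputStream stands in for the DomainPeer's output stream.
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(sink, FD_REQUEST_BUFFER_SIZE));
    out.writeShort(28); // illustrative: protocol version field
    out.writeByte(87);  // illustrative: request-file-descriptors opcode
    out.flush();
  }
}
{code}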






[jira] [Resolved] (HDFS-14482) Crash when using libhdfs with bad classpath

2019-05-14 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-14482.

Resolution: Fixed

> Crash when using libhdfs with bad classpath
> ---
>
> Key: HDFS-14482
> URL: https://issues.apache.org/jira/browse/HDFS-14482
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.3.0
>
>
> HDFS-14304 added a call to initCachedClasses in getJNIEnv after creating the 
> env but before checking whether it's null. In the case that getJNIEnv() fails 
> to create an env, it returns NULL, and then we crash when calling 
> initCachedClasses() on line 555
> {code}
> 551 state->env = getGlobalJNIEnv();
> 552 mutexUnlock();
> 553 
> 554 jthrowable jthr = NULL;
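> /* Bug described above: state->env may be NULL here if getGlobalJNIEnv()
> failed; it must be checked before the initCachedClasses call on line 555. */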
> 555 jthr = initCachedClasses(state->env);
> 556 if (jthr) {
> 557   printExceptionAndFree(state->env, jthr, PRINT_EXC_ALL,
> 558 "initCachedClasses failed");
> 559   goto fail;
> {code}






[jira] [Created] (HDFS-14482) Crash when using libhdfs with bad classpath

2019-05-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-14482:
--

 Summary: Crash when using libhdfs with bad classpath
 Key: HDFS-14482
 URL: https://issues.apache.org/jira/browse/HDFS-14482
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Todd Lipcon
Assignee: Sahil Takiar


HDFS-14304 added a call to initCachedClasses in getJNIEnv after creating the 
env but before checking whether it's null. In the case that getJNIEnv() fails 
to create an env, it returns NULL, and then we crash when calling 
initCachedClasses() on line 555
{code}
551 state->env = getGlobalJNIEnv();
552 mutexUnlock();
553 
554 jthrowable jthr = NULL;
555 jthr = initCachedClasses(state->env);
556 if (jthr) {
557   printExceptionAndFree(state->env, jthr, PRINT_EXC_ALL,
558 "initCachedClasses failed");
559   goto fail;
{code}






[jira] [Created] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2018-11-28 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-14111:
--

 Summary: hdfsOpenFile on HDFS causes unnecessary IO from file 
offset 0
 Key: HDFS-14111
 URL: https://issues.apache.org/jira/browse/HDFS-14111
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, libhdfs
Affects Versions: 3.2.0
Reporter: Todd Lipcon


hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
the read(0) isn't short circuited, and results in the DFSClient opening a block 
reader. In the case of a remote block, the block reader will actually issue a 
read of the whole block, causing the datanode to perform unnecessary IO and 
network transfers in order to fill up the client's TCP buffers. This causes 
performance degradation.
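A minimal sketch of one way to avoid the IO (not necessarily the fix that landed): short-circuit zero-length reads before any block reader is opened.
{code}
import java.io.IOException;
import java.io.InputStream;

// Illustrative stand-in for DFSInputStream: a zero-length read returns
// immediately instead of setting up a block reader and issuing datanode IO.
abstract class ZeroLengthGuardedStream extends InputStream {
  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    if (len == 0) {
      return 0; // nothing requested: no block reader, no network transfer
    }
    return readInternal(buf, off, len); // hypothetical heavy-weight read path
  }

  protected abstract int readInternal(byte[] buf, int off, int len)
      throws IOException;

  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    return read(one, 0, 1) == 1 ? (one[0] & 0xff) : -1;
  }
}
{code}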






[jira] [Resolved] (HDFS-10369) hdfsread crash when reading data reaches 128M

2018-11-28 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-10369.

Resolution: Invalid

You're mallocing a buffer of only 5 bytes here; it seems your C code is just broken.

> hdfsread crash when reading data reaches 128M
> 
>
> Key: HDFS-10369
> URL: https://issues.apache.org/jira/browse/HDFS-10369
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Reporter: vince zhang
>Priority: Major
>
> See the code below; it crashes after 
> printf("hdfsGetDefaultBlockSize2:%d, ret:%d\n", hdfsGetDefaultBlockSize(fs), ret);
> {code}
> hdfsFile read_file = hdfsOpenFile(fs, "/testpath", O_RDONLY, 0, 0, 1);
> int total = hdfsAvailable(fs, read_file);
> printf("Total:%d\n", total);
> /* Note: sizeof(size+1) is the size of the integer type of (size+1), not
> size+1 bytes, so this allocates only a few bytes regardless of size. */
> char* buffer = (char*)malloc(sizeof(size+1) * sizeof(char));
> int ret = -1;
> int len = 0;
> ret = hdfsSeek(fs, read_file, 134152192);
> printf("hdfsGetDefaultBlockSize1:%d, ret:%d\n", hdfsGetDefaultBlockSize(fs), ret);
> ret = hdfsRead(fs, read_file, (void*)buffer, size);
> printf("hdfsGetDefaultBlockSize2:%d, ret:%d\n", hdfsGetDefaultBlockSize(fs), ret);
> ret = hdfsRead(fs, read_file, (void*)buffer, size);
> printf("hdfsGetDefaultBlockSize3:%d, ret:%d\n", hdfsGetDefaultBlockSize(fs), ret);
> return 0;
> {code}






[jira] [Created] (HDFS-13826) Add a hidden configuration for NameNode to generate fake block locations

2018-08-14 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13826:
--

 Summary: Add a hidden configuration for NameNode to generate fake 
block locations
 Key: HDFS-13826
 URL: https://issues.apache.org/jira/browse/HDFS-13826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Todd Lipcon
Assignee: Todd Lipcon


In doing testing and benchmarking of the NameNode and dependent systems, it's 
often useful to be able to use an fsimage provided by some production system in 
a controlled environment without actually having access to any of the data. For 
example, while doing some recent work on Apache Impala I was trying to optimize 
the transmission and storage of block locations and tokens and measure the 
results based on metadata from a production user. In order to achieve this, it 
would be useful for the NN to expose a developer-only (undocumented) 
configuration to generate fake block locations and return them to callers. The 
"fake" locations should be randomly distributed across a fixed set of fake 
datanodes.






[jira] [Created] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator

2018-07-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13747:
--

 Summary: Statistic for list_located_status is incremented 
incorrectly by listStatusIterator
 Key: HDFS-13747
 URL: https://issues.apache.org/jira/browse/HDFS-13747
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 3.0.3
Reporter: Todd Lipcon









[jira] [Created] (HDFS-13703) Avoid allocation of CorruptedBlocks hashmap when no corrupted blocks are hit

2018-06-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13703:
--

 Summary: Avoid allocation of CorruptedBlocks hashmap when no 
corrupted blocks are hit
 Key: HDFS-13703
 URL: https://issues.apache.org/jira/browse/HDFS-13703
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: performance
Reporter: Todd Lipcon
Assignee: Todd Lipcon


The DFSClient creates a CorruptedBlocks object, which contains a HashMap, on 
every read call. In most cases, a read will not hit any corrupted blocks, and 
this hashmap is not used. It seems the JIT isn't smart enough to eliminate this 
allocation. We would be better off avoiding it and only allocating in the rare 
case when a corrupt block is hit.

Removing this allocation reduced CPU usage of a TeraValidate job by about 10%.
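A minimal sketch of the lazy-allocation idea (types simplified; the real structure maps blocks to the datanodes that served them):
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative: defer allocating the map until a corrupt block is actually
// recorded, so the common (no corruption) read path allocates nothing.
class CorruptedBlocksSketch {
  private Map<String, String> corruptionMap; // stays null on the common path

  void addCorruptedBlock(String blockId, String datanode) {
    if (corruptionMap == null) {
      corruptionMap = new HashMap<>(); // rare path: allocate on first use
    }
    corruptionMap.put(blockId, datanode);
  }

  Map<String, String> getCorruptionMap() {
    return corruptionMap; // may be null; callers treat null as "no corruption"
  }
}
{code}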






[jira] [Created] (HDFS-13702) HTrace hooks taking 10-15% CPU in DFS client when disabled

2018-06-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13702:
--

 Summary: HTrace hooks taking 10-15% CPU in DFS client when disabled
 Key: HDFS-13702
 URL: https://issues.apache.org/jira/browse/HDFS-13702
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: performance
Affects Versions: 3.0.0
Reporter: Todd Lipcon


I am seeing DFSClient.newReaderTraceScope take ~15% CPU in a teravalidate 
workload even when HTrace is disabled. This is because it stringifies several 
integers. We should avoid all allocation and stringification when htrace is 
disabled.
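A minimal, self-contained sketch of the idea (the tracer type is a stand-in, not the real HTrace API): check the enabled flag before building any scope description, so the disabled path neither allocates nor stringifies.
{code}
// Illustrative stand-in for the tracing hook; not the HTrace API.
class TracerSketch {
  private final boolean enabled;

  TracerSketch(boolean enabled) { this.enabled = enabled; }

  AutoCloseable newReaderScope(long blockId, long offset) {
    if (!enabled) {
      return null; // fast path: no allocation, no integer stringification
    }
    // Only built when tracing is actually on.
    String desc = "read: blk=" + blockId + " offset=" + offset;
    return () -> System.out.println("span closed: " + desc); // stand-in reporter
  }
}
{code}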






[jira] [Created] (HDFS-13701) Removal of logging guards regressed performance

2018-06-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-13701:
--

 Summary: Removal of logging guards regressed performance
 Key: HDFS-13701
 URL: https://issues.apache.org/jira/browse/HDFS-13701
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: performance
Affects Versions: 3.0.0
Reporter: Todd Lipcon


HDFS-8971 removed various logging guards from hot methods in the DFS client. In 
theory using a format string with {} placeholders is equivalent, but in fact 
it's not equivalent when one or more of the variable arguments are primitives. 
To be passed as part of the varargs array, the primitives need to be boxed. I 
am seeing Integer.valueOf() inside BlockReaderLocal.read taking ~3% of CPU.
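A minimal sketch of the difference (SLF4J-style API assumed): the guard keeps the hot path allocation-free when DEBUG is off, because parameterized logging still has to box primitive arguments.
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LoggingGuardSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingGuardSketch.class);

  void readChunk(int offset, int len) {
    // Without the guard, offset and len are boxed to Integers on every call
    // just to build the argument list, even when DEBUG is disabled.
    if (LOG.isDebugEnabled()) {
      LOG.debug("read chunk offset={} len={}", offset, len);
    }
  }
}
{code}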






[jira] [Resolved] (HDFS-3653) 1.x: Add a retention period for purged edit logs

2018-04-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3653.
---
Resolution: Won't Fix

> 1.x: Add a retention period for purged edit logs
> 
>
> Key: HDFS-3653
> URL: https://issues.apache.org/jira/browse/HDFS-3653
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 1.1.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>
> Occasionally we have a bug which causes something to go wrong with edits 
> files. Even more occasionally the bug is such that the namenode mistakenly 
> deletes an {{edits}} file without merging it into {{fsimage}} properly -- e.g. 
> if the bug mistakenly writes an OP_INVALID at the top of the log.
> In trunk/2.0 we retain many edit log segments going back in time to be more 
> robust to this kind of error. I'd like to implement something similar (but 
> much simpler) in 1.x, which would be used only by HDFS developers in 
> root-causing or recovering from these rare scenarios: the NN should never 
> directly delete an edit log file. Instead, it should rename the file into 
> some kind of "trash" directory inside the name dir, and associate it with a 
> timestamp. Then, periodically a separate thread should scan the trash dirs 
> and delete any logs older than a configurable time.
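A minimal sketch of the proposed scheme (paths, naming, and the retention window are illustrative, not from a patch): rename instead of delete, and purge on a timer.
{code}
import java.io.File;
import java.io.IOException;

// Illustrative: never delete an edits file directly; move it into a
// per-name-dir trash with its deletion time encoded in the name, and
// purge old entries from a separate thread.
class EditsTrashSketch {
  private static final long RETENTION_MILLIS = 24L * 60 * 60 * 1000; // e.g. one day

  static void moveToTrash(File editsFile, File nameDir) throws IOException {
    File trashDir = new File(nameDir, "edits.trash");
    if (!trashDir.isDirectory() && !trashDir.mkdirs()) {
      throw new IOException("cannot create " + trashDir);
    }
    File target = new File(trashDir,
        editsFile.getName() + "." + System.currentTimeMillis());
    if (!editsFile.renameTo(target)) {
      throw new IOException("cannot rename " + editsFile + " to " + target);
    }
  }

  // Run periodically to delete anything older than the retention window.
  static void purgeOld(File trashDir) {
    File[] files = trashDir.listFiles();
    if (files == null) {
      return;
    }
    long cutoff = System.currentTimeMillis() - RETENTION_MILLIS;
    for (File f : files) {
      String name = f.getName();
      long trashedAt = Long.parseLong(name.substring(name.lastIndexOf('.') + 1));
      if (trashedAt < cutoff) {
        f.delete(); // older than the configured retention window
      }
    }
  }
}
{code}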






[jira] [Resolved] (HDFS-3069) If an edits file has more edits in it than expected by its name, should trigger an error

2018-04-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3069.
---
  Resolution: Won't Fix
Target Version/s:   (was: )

> If an edits file has more edits in it than expected by its name, should 
> trigger an error
> 
>
> Key: HDFS-3069
> URL: https://issues.apache.org/jira/browse/HDFS-3069
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.0, 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>
> In testing what happens in HA split brain scenarios, I ended up with an edits 
> log that was named edits_47-47 but actually had two edits in it (#47 and 
> #48). The edits loading process should detect this situation and barf. 
> Otherwise, the problem shows up later during loading or even on the next 
> restart, and is tough to fix.






[jira] [Resolved] (HDFS-957) FSImage layout version should be written only once the file is complete

2015-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-957.
--
Resolution: Won't Fix

 FSImage layout version should be written only once the file is complete
 ---

 Key: HDFS-957
 URL: https://issues.apache.org/jira/browse/HDFS-957
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-957.txt


 Right now, the FSImage save code writes the LAYOUT_VERSION at the head of the 
 file, along with some other headers, and then dumps the directory into the 
 file. Instead, it should write a special IMAGE_IN_PROGRESS entry for the 
 layout version, dump all of the data, then seek back to the head of the file 
 to write the proper LAYOUT_VERSION. This would make it very easy to detect 
 the case where the FSImage save got interrupted.
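A minimal sketch of the proposed write ordering (the constants are illustrative, not the real layout-version values):
{code}
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative: the header says "in progress" until the dump completes, then
// the real layout version is patched in, marking the image as valid.
class ImageSaveSketch {
  static final int IMAGE_IN_PROGRESS = -1; // illustrative placeholder value
  static final int LAYOUT_VERSION = -41;   // illustrative real version

  static void save(RandomAccessFile out) throws IOException {
    out.writeInt(IMAGE_IN_PROGRESS); // placeholder: image not yet complete
    // ... dump the namespace and the rest of the image here ...
    out.seek(0);                     // back to the head of the file
    out.writeInt(LAYOUT_VERSION);    // only now is the image marked complete
  }
}
{code}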





[jira] [Resolved] (HDFS-3528) Use native CRC32 in DFS write path

2014-08-28 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3528.
---

   Resolution: Fixed
Fix Version/s: 2.6.0

 Use native CRC32 in DFS write path
 --

 Key: HDFS-3528
 URL: https://issues.apache.org/jira/browse/HDFS-3528
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs-client, performance
Affects Versions: 2.0.0-alpha
Reporter: Todd Lipcon
Assignee: James Thomas
 Fix For: 2.6.0


 HDFS-2080 improved the CPU efficiency of the read path by using native 
 SSE-enabled code for CRC verification. Benchmarks of the write path show that 
 it's often CPU bound by checksums as well, so we should make the same 
 improvement there.





[jira] [Resolved] (HDFS-3278) Umbrella Jira for HDFS-HA Phase 2

2014-05-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3278.
---

   Resolution: Fixed
Fix Version/s: 2.1.0-beta
 Assignee: Todd Lipcon  (was: Sanjay Radia)

These subtasks were completed quite a while back.

 Umbrella Jira for HDFS-HA Phase 2
 -

 Key: HDFS-3278
 URL: https://issues.apache.org/jira/browse/HDFS-3278
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Sanjay Radia
Assignee: Todd Lipcon
 Fix For: 2.1.0-beta


 HDFS-1623 gives a high-level architecture and design for hot automatic 
 failover of the NN. Branch HDFS-1623 was merged into trunk for tactical 
 reasons even though the work for HA was not complete. Branch HDFS-1623 
 contained mechanisms for keeping a standby hot (i.e. reading from a shared 
 journal), dual block reports, fencing of DNs, a ZooKeeper library for leader 
 election, etc. This umbrella jira covers the remaining work for HA and will 
 link all the jiras for the remaining work. Unlike HDFS-1623, no single branch 
 will be created - work will proceed in parallel branches.





[jira] [Created] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery

2014-01-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5790:
-

 Summary: LeaseManager.findPath is very slow when many leases need 
recovery
 Key: HDFS-5790
 URL: https://issues.apache.org/jira/browse/HDFS-5790
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode, performance
Affects Versions: 2.4.0
Reporter: Todd Lipcon


We recently saw an issue where the NN restarted while tens of thousands of 
files were open. The NN then ended up spending multiple seconds for each 
commitBlockSynchronization() call, spending most of its time inside 
LeaseManager.findPath(). findPath currently works by looping over all files 
held for a given writer, and traversing the filesystem for each one. This takes 
way too long when tens of thousands of files are open by a single writer.





[jira] [Created] (HDFS-5287) JN need not validate finalized log segments in newEpoch

2013-10-01 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5287:
-

 Summary: JN need not validate finalized log segments in newEpoch
 Key: HDFS-5287
 URL: https://issues.apache.org/jira/browse/HDFS-5287
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: qjm
Affects Versions: 2.1.1-beta
Reporter: Todd Lipcon
Priority: Minor


In {{scanStorageForLatestEdits}}, the JN will call {{validateLog}} on the last 
log segment, regardless of whether it is finalized. If it's finalized, then 
this is a needless pass over the logs which can adversely affect failover time 
for a graceful failover.





[jira] [Resolved] (HDFS-3656) ZKFC may write a null breadcrumb znode

2013-08-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3656.
---

  Resolution: Duplicate
Target Version/s:   (was: )

Yep, I think you're right. Thanks.

 ZKFC may write a null breadcrumb znode
 

 Key: HDFS-3656
 URL: https://issues.apache.org/jira/browse/HDFS-3656
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: auto-failover
Affects Versions: 2.0.0-alpha
Reporter: Todd Lipcon

 A user [reported|https://issues.cloudera.org/browse/DISTRO-412] an NPE trying 
 to read the breadcrumb znode in the failover controller. This happened 
 repeatedly, implying that an earlier process set the znode to null - probably 
 some race, though I don't see anything obvious in the code.



[jira] [Created] (HDFS-5074) Allow starting up from an fsimage checkpoint in the middle of a segment

2013-08-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5074:
-

 Summary: Allow starting up from an fsimage checkpoint in the 
middle of a segment
 Key: HDFS-5074
 URL: https://issues.apache.org/jira/browse/HDFS-5074
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Todd Lipcon


We've seen the following behavior a couple times:
- SBN is running and somehow encounters an error in the middle of replaying an 
edit log in the tailer (eg the JN it's reading from crashes)
- SBN successfully has processed half of the edits in the segment it was 
reading.
- SBN saves a checkpoint, which now falls in the middle of a segment, and then 
restarts

Upon restart, the SBN will load this checkpoint which falls in the middle of a 
segment. {{selectInputStreams}} then fails when the SBN requests a mid-segment 
txid.

We should handle this case by downloading the right segment and fast-forwarding 
to the correct txid.



[jira] [Created] (HDFS-5058) QJM should validate startLogSegment() more strictly

2013-08-02 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5058:
-

 Summary: QJM should validate startLogSegment() more strictly
 Key: HDFS-5058
 URL: https://issues.apache.org/jira/browse/HDFS-5058
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: qjm
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Todd Lipcon
Assignee: Todd Lipcon


We've seen a small handful of times a case where one of the NNs in an HA 
cluster ends up with an fsimage checkpoint that falls in the middle of an edit 
segment. We're not sure yet how this happens, but one issue can happen as a 
result:
- Node has fsimage_500. Cluster has edits_1-1000, edits_1001_inprogress
- Node restarts, loads fsimage_500
- Node wants to become active. It calls selectInputStreams(500). Currently, 
this API logs a WARN that 500 falls in the middle of the 1-1000 segment, but 
continues and returns no results.
- Node calls startLogSegment(501).

Currently, the QJM will accept this (incorrectly). The node then crashes when 
it first tries to journal a real transaction, but it ends up leaving the 
edits_501_inprogress lying around, potentially causing more issues later.



[jira] [Created] (HDFS-5037) Active NN should trigger its own edit log rolls

2013-07-26 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-5037:
-

 Summary: Active NN should trigger its own edit log rolls
 Key: HDFS-5037
 URL: https://issues.apache.org/jira/browse/HDFS-5037
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Todd Lipcon


We've seen cases where the SBN/2NN went down, and then users accumulated very 
very large edit log segments. This causes a slow startup time because the last 
edit log segment must be read fully to recover it before the NN can start up 
again. Additionally, in the case of QJM, it can trigger timeouts on recovery or 
edit log syncing because the very-large segment has to get processed within a 
certain time bound.

We could easily improve this by having the NN trigger its own edit log rolls on 
a configurable size (eg every 256MB)



[jira] [Created] (HDFS-4982) JournalNode should relogin from keytab before fetching logs from other JNs

2013-07-11 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4982:
-

 Summary: JournalNode should relogin from keytab before fetching 
logs from other JNs
 Key: HDFS-4982
 URL: https://issues.apache.org/jira/browse/HDFS-4982
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node, security
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Todd Lipcon
Assignee: Todd Lipcon


We've seen an issue in a secure cluster where, after a failover, the new NN 
isn't able to properly coordinate QJM recovery. The JNs fail to fetch logs from 
each other due to apparently not having a Kerberos TGT. It seems that we need 
to add the {{checkTGTAndReloginFromKeytab}} call prior to making the HTTP 
connection, since the Java HTTP stack doesn't do an automatic relogin.
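A minimal sketch of the proposed call site ({{checkTGTAndReloginFromKeytab}} is the UGI method named above; the surrounding names are illustrative):
{code}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative: refresh the TGT before the HTTP fetch, since the Java HTTP
// stack will not relogin on our behalf.
class FetchLogsSketch {
  static HttpURLConnection openLogStream(URL journalNodeUrl) throws IOException {
    UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
    return (HttpURLConnection) journalNodeUrl.openConnection();
  }
}
{code}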



[jira] [Created] (HDFS-4915) Add config to ZKFC to disable fencing

2013-06-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4915:
-

 Summary: Add config to ZKFC to disable fencing
 Key: HDFS-4915
 URL: https://issues.apache.org/jira/browse/HDFS-4915
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 3.0.0
Reporter: Todd Lipcon


With QuorumJournalManager, it's not important for the ZKFCs to perform any 
fencing. We currently work around this by setting the fencer to /bin/true, but 
the ZKFC still does things like create breadcrumb znodes, etc. It would be 
simpler to add a config to disable fencing entirely, which would also simplify 
the ZKFC's job.



[jira] [Created] (HDFS-4879) Add blocked ArrayList collection to avoid CMS full GCs

2013-06-04 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4879:
-

 Summary: Add blocked ArrayList collection to avoid CMS full GCs
 Key: HDFS-4879
 URL: https://issues.apache.org/jira/browse/HDFS-4879
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.4-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


We recently saw an issue where a large deletion was issued which caused 25M 
blocks to be collected during {{deleteInternal}}. Currently, the list of 
collected blocks is an ArrayList, meaning that we had to allocate a contiguous 
25M-entry array (~400MB). After a NN has been running for a long amount of 
time, the old generation may become fragmented such that it's hard to find a 
400MB contiguous chunk of heap.

In general, we should try to design the NN such that the only large objects are 
long-lived and created at startup time. We can improve this particular case 
(and perhaps some others) by introducing a new List implementation which is 
made of a linked list of arrays, each of which is size-limited (eg to 1MB).
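A minimal sketch of the core idea (chunk size and names are illustrative; only the add path is shown): a linked list of size-capped chunks means no single allocation ever needs a huge contiguous region of heap.
{code}
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative "blocked" list: appends go into the last chunk, and a fresh
// chunk is started whenever the cap is reached, keeping allocations small.
class BlockedListSketch<T> {
  private static final int MAX_CHUNK_SIZE = 64 * 1024; // entries per chunk

  private final List<List<T>> chunks = new LinkedList<>();
  private List<T> lastChunk = null;

  void add(T item) {
    if (lastChunk == null || lastChunk.size() >= MAX_CHUNK_SIZE) {
      lastChunk = new ArrayList<>(); // small allocation, never one giant array
      chunks.add(lastChunk);
    }
    lastChunk.add(item);
  }
}
{code}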



[jira] [Resolved] (HDFS-4184) Add the ability for Client to provide more hint information for DataNode to manage the OS buffer cache more accurately

2013-05-28 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4184.
---

Resolution: Duplicate

 Add the ability for Client to provide more hint information for DataNode to 
 manage the OS buffer cache more accurately
 

 Key: HDFS-4184
 URL: https://issues.apache.org/jira/browse/HDFS-4184
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: binlijin

 HDFS now has the ability to use posix_fadvise and sync_data_range syscalls to 
 manage the OS buffer cache.
 {code}
 When HBase reads the HLog, we can set dfs.datanode.drop.cache.behind.reads 
 to true to drop data out of the buffer cache when performing sequential reads.
 When HBase writes the HLog, we can set dfs.datanode.drop.cache.behind.writes 
 to true to drop data out of the buffer cache after writing.
 When HBase reads HFiles during compaction, we can set 
 dfs.datanode.readahead.bytes to a non-zero value to trigger readahead for 
 sequential reads, and also set dfs.datanode.drop.cache.behind.reads to true 
 to drop data out of the buffer cache when performing sequential reads.
 and so on... 
 {code}
 Currently we can only set these features globally on the DataNode; we should 
 be able to set them per session.



[jira] [Created] (HDFS-4828) Make QJM epoch-related errors more understandable

2013-05-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4828:
-

 Summary: Make QJM epoch-related errors more understandable
 Key: HDFS-4828
 URL: https://issues.apache.org/jira/browse/HDFS-4828
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: qjm
Affects Versions: 3.0.0, 2.0.5-beta
Reporter: Todd Lipcon


Since we started running QJM on production clusters, we've found that users are 
very confused by some of the error messages that it produces. For example, when 
a failover occurs and an old NN gets fenced out, it sees errors about its epoch 
being out of date. We should amend these errors to add text like "This is 
likely because another NameNode took over as Active." Potentially we can even 
include the other NN's hostname, the timestamp it became active, etc.



[jira] [Created] (HDFS-4833) Corrupt blocks are not invalidated when first processing repl queues

2013-05-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4833:
-

 Summary: Corrupt blocks are not invalidated when first processing 
repl queues
 Key: HDFS-4833
 URL: https://issues.apache.org/jira/browse/HDFS-4833
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Todd Lipcon


When the NN processes misreplicated blocks in {{processMisReplicatedBlock}} (eg 
during initial startup when first processing repl queues), it does not 
invalidate corrupt replicas unless the block is also over-replicated. This can 
result in replicas stuck in corrupt state forever if they were that way when 
the cluster booted.



[jira] [Created] (HDFS-4799) Corrupt replica can be prematurely removed from corruptReplicas map

2013-05-05 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4799:
-

 Summary: Corrupt replica can be prematurely removed from 
corruptReplicas map
 Key: HDFS-4799
 URL: https://issues.apache.org/jira/browse/HDFS-4799
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker


We saw the following sequence of events in a cluster result in losing the most 
recent genstamp of a block:
- client is writing to a pipeline of 3
- the pipeline had nodes fail over some period of time, such that it left 3 
old-genstamp replicas on the original three nodes, having recruited 3 new 
replicas with a later genstamp.
-- so, we have 6 total replicas in the cluster, three with old genstamps on 
downed nodes, and 3 with the latest genstamp
- cluster reboots, and the nodes with old genstamps blockReport first. The 
replicas are correctly added to the corrupt replicas map since they have a 
too-old genstamp
- the nodes with the new genstamp block report. When the latest one block 
reports, chooseExcessReplicates is called and incorrectly decides to remove the 
three good replicas, leaving only the old-genstamp replicas.



[jira] [Created] (HDFS-4643) Fix flakiness in TestQuorumJournalManager

2013-03-27 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4643:
-

 Summary: Fix flakiness in TestQuorumJournalManager
 Key: HDFS-4643
 URL: https://issues.apache.org/jira/browse/HDFS-4643
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: qjm, test
Affects Versions: 2.0.3-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Trivial


TestQuorumJournalManager can occasionally fail if two consecutive test cases 
pick the same port number for the JournalNodes. In this case, sometimes an IPC 
client can be cached from a previous test case, and then fail when it tries to 
make an IPC call over that now-broken cached connection. We need to more 
carefully call close() on all the QJMs to prevent this.



[jira] [Resolved] (HDFS-4538) allow use of legacy blockreader

2013-03-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4538.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to HDFS-347 branch.

 allow use of legacy blockreader
 ---

 Key: HDFS-4538
 URL: https://issues.apache.org/jira/browse/HDFS-4538
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-4538.001.patch, HDFS-4538.002.patch, 
 HDFS-4538.003.patch, HDFS-4538.004.patch


 Some users might want to use the legacy block reader, because it is available 
 on Windows, whereas the secure solution has not yet been implemented there.  
 As described in the mailing list discussion, let's enable this.



[jira] [Resolved] (HDFS-4617) warning while purging logs with QJM enabled

2013-03-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4617.
---

Resolution: Duplicate

ATM points out that I already found this bug 3 months ago... resolving as a 
duplicate of HDFS-4298.

 warning while purging logs with QJM enabled
 ---

 Key: HDFS-4617
 URL: https://issues.apache.org/jira/browse/HDFS-4617
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode, qjm
Affects Versions: 2.0.3-alpha
Reporter: Todd Lipcon

 HDFS-2946 changed the way that edit log purging is calculated, such that it 
 calls selectInputStreams() with an arbitrary transaction ID calculated 
 relative to the current one. The JournalNodes will reject such a request if 
 that transaction ID falls in the middle of a segment (which it usually will). 
 This means that selectInputStreams gets an exception, and the QJM journal 
 manager is not included in this calculation. Additionally, a warning will be 
 logged.
 Purging itself still happens, because the detailed information on remote logs 
 is not necessary to calculate a retention interval, but the feature from 
 HDFS-2946 may not work as intended.



[jira] [Created] (HDFS-4618) default for checkpoint txn interval is too low

2013-03-20 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4618:
-

 Summary: default for checkpoint txn interval is too low
 Key: HDFS-4618
 URL: https://issues.apache.org/jira/browse/HDFS-4618
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon


The default checkpoint interval is currently set to 40k transactions. That's 
way too low (I don't know what idiot set it to that.. oh wait, it was me...)

The old default in 1.0 is 64MB. Assuming an average of 100 bytes per txn, we 
should have the txn-count based interval default to at least 640,000. I'd like 
to change to 1M as a nice round number.



[jira] [Created] (HDFS-4621) additional logging to help diagnose slow QJM logSync

2013-03-20 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4621:
-

 Summary: additional logging to help diagnose slow QJM logSync
 Key: HDFS-4621
 URL: https://issues.apache.org/jira/browse/HDFS-4621
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.0.3-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


I've been working on diagnosing an issue with a cluster which is seeing slow 
logSync calls occasionally to QJM. Adding a few more pieces of logging would 
help this:
- in the warning messages on the client side leading up to a timeout, include 
which nodes have responded and which ones are still pending
- on the server side, when we actually call FileChannel.force, log a warning if 
the sync takes longer than 1 second
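A minimal sketch of the second item (threshold and logging are illustrative): time the force() call and warn when it is slow.
{code}
import java.io.IOException;
import java.nio.channels.FileChannel;

class SyncTimingSketch {
  private static final long WARN_SYNC_MILLIS = 1000; // warn past one second

  static void syncWithWarning(FileChannel channel) throws IOException {
    long start = System.nanoTime();
    channel.force(false); // flush file contents (not metadata) to disk
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    if (elapsedMillis > WARN_SYNC_MILLIS) {
      System.err.println("WARN: edit log sync took " + elapsedMillis + " ms");
    }
  }
}
{code}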



[jira] [Created] (HDFS-4617) warning while purging logs with QJM enabled

2013-03-19 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4617:
-

 Summary: warning while purging logs with QJM enabled
 Key: HDFS-4617
 URL: https://issues.apache.org/jira/browse/HDFS-4617
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.0.3-alpha
Reporter: Todd Lipcon


HDFS-2946 changed the way that edit log purging is calculated, such that it 
calls selectInputStreams() with an arbitrary transaction ID calculated relative 
to the current one. The JournalNodes will reject such a request if that 
transaction ID falls in the middle of a segment (which it usually will). This 
means that selectInputStreams gets an exception, and the QJM journal manager is 
not included in this calculation. Additionally, a warning will be logged.

Purging itself still happens, because the detailed information on remote logs 
is not necessary to calculate a retention interval, but the feature from 
HDFS-2946 may not work as intended.



[jira] [Resolved] (HDFS-4496) DFSClient: don't create a domain socket unless we need it

2013-02-12 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4496.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

 DFSClient: don't create a domain socket unless we need it
 -

 Key: HDFS-4496
 URL: https://issues.apache.org/jira/browse/HDFS-4496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4496.001.patch


 If we don't have conf.domainSocketDataTraffic or conf.shortCircuitLocalReads 
 set, the client shouldn't create a domain socket because we couldn't use it.  
 This is only an issue if you misconfigure things, but it's still good to fix.



[jira] [Created] (HDFS-4485) HDFS-347: DN should chmod socket path a+w

2013-02-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4485:
-

 Summary: HDFS-347: DN should chmod socket path a+w
 Key: HDFS-4485
 URL: https://issues.apache.org/jira/browse/HDFS-4485
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Todd Lipcon
Assignee: Colin Patrick McCabe
Priority: Critical


In cluster-testing HDFS-347, we found that in clusters where the MR job doesn't 
run as the same user as HDFS, clients wouldn't use short circuit read because 
of a 'permission denied' error connecting to the socket. It turns out that, in 
order to connect to a socket, clients need write permissions on the socket file.

The DN should set these permissions automatically after it creates the socket.



[jira] [Created] (HDFS-4486) Add log category for long-running DFSClient notices

2013-02-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4486:
-

 Summary: Add log category for long-running DFSClient notices
 Key: HDFS-4486
 URL: https://issues.apache.org/jira/browse/HDFS-4486
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Todd Lipcon
Priority: Minor


There are a number of features in the DFS client which are transparent but can 
make a fairly big difference for performance -- two in particular are short 
circuit reads and native checksumming. Because we don't want log spew for 
clients like {{hadoop fs -cat}}, we currently log only at DEBUG level when these 
features are disabled. This makes it difficult to troubleshoot/verify for 
long-running perf-sensitive clients like HBase.

One simple solution is to add a new log category - eg 
o.a.h.h.DFSClient.PerformanceAdvisory - which long-running clients could enable 
at DEBUG level without getting the full debug spew.
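A minimal sketch of how such a category could be used (SLF4J-style API assumed; the logger name follows the proposal above):
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative: a dedicated, narrowly-scoped logger name that a long-running
// client can enable at DEBUG without turning on all DFSClient debug output.
class PerformanceAdvisorySketch {
  static final Logger PERF_LOG = LoggerFactory.getLogger(
      "org.apache.hadoop.hdfs.DFSClient.PerformanceAdvisory");

  static void shortCircuitDisabled(String reason) {
    PERF_LOG.debug("Short-circuit local reads disabled: {}", reason);
  }
}
{code}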



[jira] [Resolved] (HDFS-4433) make TestPeerCache not flaky

2013-01-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4433.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

 make TestPeerCache not flaky
 

 Key: HDFS-4433
 URL: https://issues.apache.org/jira/browse/HDFS-4433
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4433.001.patch


 TestPeerCache is flaky now because it relies on using the same global cache 
 for every test function.  So the cache timeout can't be set to something 
 different for each test.
 Also, we should implement equals and hashCode for {{FakePeer}}, since 
 otherwise {{testMultiplePeersWithSameDnId}} is not really testing what 
 happens when multiple equal peers are inserted into the cache.  (The default 
 equals is object equality).



[jira] [Resolved] (HDFS-4416) change dfs.datanode.domain.socket.path to dfs.domain.socket.path

2013-01-21 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4416.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch. Thanks, Colin.

 change dfs.datanode.domain.socket.path to dfs.domain.socket.path
 

 Key: HDFS-4416
 URL: https://issues.apache.org/jira/browse/HDFS-4416
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4416.001.patch, HDFS-4416.002.patch, 
 HDFS-4416.003.patch, HDFS-4416.004.patch


 {{dfs.datanode.domain.socket.path}} is used by both clients and the DataNode, 
 so it might be best to avoid putting 'datanode' in the name.  Most of the 
 configuration keys that have 'datanode' in the name apply only to the DN.
 Also, should change __PORT__ to _PORT to be consistent with _HOST, etc.



[jira] [Resolved] (HDFS-4418) HDFS-347: increase default FileInputStreamCache size

2013-01-17 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4418.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch.

 HDFS-347: increase default FileInputStreamCache size
 

 Key: HDFS-4418
 URL: https://issues.apache.org/jira/browse/HDFS-4418
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-4418.txt


 The FileInputStreamCache currently defaults to holding only 10 input stream 
 pairs (corresponding to 10 blocks). In many HBase workloads, the region 
 server will be issuing random reads against a local file which is 2-4GB in 
 size or even larger (hence 20+ blocks).
 Given that the memory usage for caching these input streams is low, and 
 applications like HBase tend to already increase their ulimit -n 
 substantially (eg up to 32,000), I think we should raise the default cache 
 size to 50 or more. In the rare case that someone has an application which 
 uses local reads with hundreds of open blocks and can't feasibly raise their 
 ulimit -n, they can lower the limit appropriately.



[jira] [Created] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4417:
-

 Summary: HDFS-347: fix case where local reads get disabled 
incorrectly
 Key: HDFS-4417
 URL: https://issues.apache.org/jira/browse/HDFS-4417
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon


In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
following case:
- a workload is running which puts a bunch of local sockets in the PeerCache
- the workload abates for a while, causing the sockets to go stale (ie the DN 
side disconnects after the keepalive timeout)
- the workload starts again

In this case, the local socket retrieved from the cache failed the 
newBlockReader call, and it incorrectly disabled local sockets on that host. 
This is similar to an earlier bug HDFS-3376, but not quite the same.

The next issue we ran into is that, once this happened, it never tried local 
sockets again, because the cache held lots of TCP sockets. Since we always 
managed to get a cached socket to the local node, it didn't bother trying local 
read again.



[jira] [Created] (HDFS-4418) HDFS-347: increase default FileInputStreamCache size

2013-01-16 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4418:
-

 Summary: HDFS-347: increase default FileInputStreamCache size
 Key: HDFS-4418
 URL: https://issues.apache.org/jira/browse/HDFS-4418
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon


The FileInputStreamCache currently defaults to holding only 10 input stream 
pairs (corresponding to 10 blocks). In many HBase workloads, the region server 
will be issuing random reads against a local file which is 2-4GB in size or 
even larger (hence 20+ blocks).

Given that the memory usage for caching these input streams is low, and 
applications like HBase tend to already increase their ulimit -n substantially 
(eg up to 32,000), I think we should raise the default cache size to 50 or 
more. In the rare case that someone has an application which uses local reads 
with hundreds of open blocks and can't feasibly raise their ulimit -n, they can 
lower the limit appropriately.



[jira] [Resolved] (HDFS-4400) DFSInputStream#getBlockReader: last retries should ignore the cache

2013-01-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4400.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch

 DFSInputStream#getBlockReader: last retries should ignore the cache
 ---

 Key: HDFS-4400
 URL: https://issues.apache.org/jira/browse/HDFS-4400
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-4400.001.patch


 In {{DFSInputStream#getBlockReader}}, the last tries to get a {{BlockReader}} 
 should ignore the cache.  This was broken by HDFS-4356, it seems.



[jira] [Resolved] (HDFS-4401) Fix bug in DomainSocket path validation

2013-01-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4401.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch. Thanks for the fix.

 Fix bug in DomainSocket path validation
 ---

 Key: HDFS-4401
 URL: https://issues.apache.org/jira/browse/HDFS-4401
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4401.001.patch


 DomainSocket path validation currently does not validate the second-to-last 
 path component.  This leads to insecure socket paths being accepted.  It 
 should validate all path components prior to the final one.



[jira] [Resolved] (HDFS-4402) some small DomainSocket fixes: avoid findbugs warning, change log level, etc.

2013-01-14 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4402.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch, thanks.

 some small DomainSocket fixes: avoid findbugs warning, change log level, etc.
 -

 Key: HDFS-4402
 URL: https://issues.apache.org/jira/browse/HDFS-4402
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4402.001.patch, HDFS-4402.002.patch


 Some miscellaneous fixes:
 * findbugs complains about a short-circuit operator in {{DomainSocket.java}} 
 for some reason.  We don't need it (it doesn't help optimization since the 
 expressions lack side-effects), so let's ditch it to avoid the findbugs 
 warning.
 * change the log level of one error message to warn
 * BlockReaderLocal should use a BufferedInputStream to read the metadata file 
 header, to avoid doing multiple small reads.



[jira] [Created] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-14 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4403:
-

 Summary: DFSClient can infer checksum type when not provided by 
reading first byte
 Key: HDFS-4403
 URL: https://issues.apache.org/jira/browse/HDFS-4403
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the new 
protobuf field is optional, with a default of CRC32. This means that this API, 
when used against an older cluster (like earlier 0.23 releases) will falsely 
return CRC32 even if that cluster has written files with CRC32C. This can cause 
issues for distcp, for example.

Instead of defaulting the protobuf field to CRC32, we can leave it with no 
default, and if the OpBlockChecksumResponseProto has no checksum type set, the 
client can send OP_READ_BLOCK to read the first byte of the block, then grab 
the checksum type out of that response (which has always been present)



[jira] [Resolved] (HDFS-4356) BlockReaderLocal should use passed file descriptors rather than paths

2013-01-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4356.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

 BlockReaderLocal should use passed file descriptors rather than paths
 -

 Key: HDFS-4356
 URL: https://issues.apache.org/jira/browse/HDFS-4356
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client, performance
Affects Versions: 2.0.3-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: 04b-cumulative.patch, _04b.patch, _04c.patch, 
 04-cumulative.patch, 04d-cumulative.patch, _04e.patch, 04f-cumulative.patch, 
 _04f.patch, 04g-cumulative.patch, _04g.patch


 {{BlockReaderLocal}} should use file descriptors passed over UNIX domain 
 sockets rather than paths.  We also need some configuration options for these 
 UNIX domain sockets.



[jira] [Resolved] (HDFS-4388) DomainSocket should throw AsynchronousCloseException when appropriate

2013-01-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4388.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch. Thanks.

 DomainSocket should throw AsynchronousCloseException when appropriate
 -

 Key: HDFS-4388
 URL: https://issues.apache.org/jira/browse/HDFS-4388
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Trivial
 Attachments: _05a.patch


 {{DomainSocket}} should throw {{AsynchronousCloseException}} when appropriate 
 (i.e., when an {{accept}} or other blocking operation is interrupted by a 
 concurrent close.)  This is nicer than throwing a generic {{IOException}} or 
 {{SocketException}}.
 Similarly, we should also throw {{ClosedChannelException}} when an operation 
 is attempted on a closed {{DomainSocket}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4390) Bypass UNIX domain socket unit tests when they cannot be run

2013-01-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4390.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to branch

 Bypass UNIX domain socket unit tests when they cannot be run
 

 Key: HDFS-4390
 URL: https://issues.apache.org/jira/browse/HDFS-4390
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: _06.patch


 Testing revealed that the existing mechanisms for bypassing UNIX domain 
 socket-related tests when they are not available are inadequate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4380) Opening a file for read before writer writes a block causes NPE

2013-01-09 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4380:
-

 Summary: Opening a file for read before writer writes a block 
causes NPE
 Key: HDFS-4380
 URL: https://issues.apache.org/jira/browse/HDFS-4380
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Todd Lipcon


JD Cryans found this issue: it seems like, if you open a file for read 
immediately after it's been created by the writer, after a block has been 
allocated, but before the block is created on the DNs, then you can end up with 
the following NPE:

java.lang.NullPointerException
   at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885)
   at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858)
   at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
   at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)

This seems to be because {{getBlockInfo}} returns a null block when the DN 
doesn't yet have the replica. The client should probably either fall back to a 
different replica or treat it as zero-length.
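
A sketch of the zero-length option, with the block info modeled by a stub type 
(LocatedBlockStub and safeLength are invented for illustration, not the actual 
client code):
{code}
// Sketch of the "treat as zero-length" fallback: if the DN does not
// yet have the replica, the block info comes back null, so guard
// before dereferencing instead of hitting an NPE in updateBlockInfo().
class LocatedBlockStub {
  final long numBytes;
  LocatedBlockStub(long numBytes) { this.numBytes = numBytes; }
}

class BlockInfoGuard {
  static long safeLength(LocatedBlockStub info) {
    // Null means the replica has not been created on the DN yet.
    return (info == null) ? 0L : info.numBytes;
  }
}
{code}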

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HDFS-4352) Encapsulate arguments to BlockReaderFactory in a class

2013-01-08 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened HDFS-4352:
---


By the way, I find it _very_ rude to close someone else's ticket as "Invalid" 
or "Won't fix" without waiting for the discussion to end. Just because you don't 
like a change doesn't give you license to do this.

 Encapsulate arguments to BlockReaderFactory in a class
 --

 Key: HDFS-4352
 URL: https://issues.apache.org/jira/browse/HDFS-4352
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.0.3-alpha
Reporter: Colin Patrick McCabe
 Attachments: 01b.patch, 01.patch


 Encapsulate the arguments to BlockReaderFactory in a class to avoid having to 
 pass around 10+ arguments to a few different functions.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4324) Track and report out-of-date blocks separately from corrupt blocks

2012-12-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4324:
-

 Summary: Track and report out-of-date blocks separately from 
corrupt blocks
 Key: HDFS-4324
 URL: https://issues.apache.org/jira/browse/HDFS-4324
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.0.0
Reporter: Todd Lipcon


Currently in various places (metrics, dfsadmin -report, fsck, logs) we use the 
term "corrupt" to refer to blocks which have an out-of-date generation stamp. 
Since out-of-date blocks are a fairly normal occurrence if a DN restarts while 
data is being written, we should avoid using 'scary' words like _corrupt_. 
This may need both some textual changes as well as some internal changes to 
count the corruption types distinctly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4305) Add a configurable limit on number of blocks per file, and min block size

2012-12-11 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4305:
-

 Summary: Add a configurable limit on number of blocks per file, 
and min block size
 Key: HDFS-4305
 URL: https://issues.apache.org/jira/browse/HDFS-4305
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.2-alpha, 1.0.4, 3.0.0
Reporter: Todd Lipcon
Priority: Minor


We recently had an issue where a user set the block size very low and 
managed to create a single file with hundreds of thousands of blocks. This 
caused problems with the edit log since the OP_ADD op was so large (HDFS-4304). 
I imagine it could also cause efficiency issues in the NN. To prevent users 
from making such mistakes, we should (see the sketch after this list):
- introduce a configurable minimum block size, below which requests are rejected
- introduce a configurable maximum number of blocks per file, above which 
requests to add another block are rejected (with a suitably high default as to 
not prevent legitimate large files)
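
A minimal sketch of the two checks, with invented constants standing in for 
the proposed configuration keys and defaults:
{code}
// Hypothetical validation on the NN's addBlock path: reject block
// sizes below a configurable floor and files above a configurable
// block-count ceiling. Key names and values are made up here.
class BlockLimits {
  static final long MIN_BLOCK_SIZE = 1024 * 1024;    // e.g. a 1 MB floor
  static final int MAX_BLOCKS_PER_FILE = 1_000_000;  // generous ceiling

  static void check(long blockSize, int blocksInFile) {
    if (blockSize < MIN_BLOCK_SIZE) {
      throw new IllegalArgumentException(
          "Block size " + blockSize + " is below the configured minimum");
    }
    if (blocksInFile >= MAX_BLOCKS_PER_FILE) {
      throw new IllegalStateException(
          "File already has the maximum configured number of blocks");
    }
  }
}
{code}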

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4298) StorageRetentionManager spews warnings when used with QJM

2012-12-10 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4298:
-

 Summary: StorageRetentionManager spews warnings when used with QJM
 Key: HDFS-4298
 URL: https://issues.apache.org/jira/browse/HDFS-4298
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: Todd Lipcon
Assignee: Aaron T. Myers


When the NN is configured with a QJM, we see the following warning message 
every time a checkpoint is made or uploaded:
12/12/10 16:07:52 WARN namenode.FSEditLog: Unable to determine input streams 
from QJM to [127.0.0.1:13001, 127.0.0.1:13002, 127.0.0.1:13003]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions 
to achieve quorum size 2/3. 3 exceptions thrown:
127.0.0.1:13002: Asked for firstTxId 114837 which is in the middle of file 
/tmp/jn-2/myjournal/current/edits_0095185-0114846
...

This is because, since HDFS-2946, the NN calls {{selectInputStreams}} to 
determine the number of log segments and put a cap on the number. This API 
throws an exception in the case of QJM if the argument falls in the middle of 
an edit log boundary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3571) Allow EditLogFileInputStream to read from a remote URL

2012-12-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3571.
---

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha

Committed backport to branch-2. Thanks for reviewing.

 Allow EditLogFileInputStream to read from a remote URL
 --

 Key: HDFS-3571
 URL: https://issues.apache.org/jira/browse/HDFS-3571
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, namenode
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0, 2.0.3-alpha

 Attachments: hdfs-3571-branch-2.txt, hdfs-3571.txt, hdfs-3571.txt


 In order to start up from remote edits storage (like the JournalNodes of 
 HDFS-3077), the NN needs to be able to load edits from a URL, instead of just 
 local disk. This JIRA extends EditLogFileInputStream to be able to use a URL 
 reference in addition to the current File reference.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

2012-12-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3077.
---

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha

Committed backport to branch-2. Thanks for looking at the backport patch, 
Andrew and Aaron.

 Quorum-based protocol for reading and writing edit logs
 ---

 Key: HDFS-3077
 URL: https://issues.apache.org/jira/browse/HDFS-3077
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: ha, namenode
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0, QuorumJournalManager (HDFS-3077), 2.0.3-alpha

 Attachments: hdfs-3077-branch-2.txt, hdfs-3077-partial.txt, 
 hdfs-3077-test-merge.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, 
 hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, 
 qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, 
 qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, 
 qjournal-design.tex, qjournal-design.tex


 Currently, one of the weak points of the HA design is that it relies on 
 shared storage such as an NFS filer for the shared edit log. One alternative 
 that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
 which provides a highly available replicated edit log on commodity hardware. 
 This JIRA is to implement another alternative, based on a quorum commit 
 protocol, integrated more tightly in HDFS and with the requirements driven 
 only by HDFS's needs rather than more generic use cases. More details to 
 follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HDFS-4110) Refine JNStorage log

2012-12-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened HDFS-4110:
---


Reopening to backport to branch-2

 Refine JNStorage log
 

 Key: HDFS-4110
 URL: https://issues.apache.org/jira/browse/HDFS-4110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: liang xie
Assignee: liang xie
Priority: Trivial
  Labels: newbie
 Fix For: 3.0.0

 Attachments: HDFS-4110.txt


 Abstract class Storage has a toString method: 
 {quote}
 return "Storage Directory " + this.root;
 {quote}
 and in the subclass JNStorage we could see:
 {quote}
 LOG.info("Formatting journal storage directory " + 
 sd + " with nsid: " + getNamespaceID());
 {quote}
 that'll print something like "Formatting journal storage directory Storage 
 Directory x"
 Just one line change to:
 {quote}
 LOG.info("Formatting journal " + sd + " with nsid: " + getNamespaceID());
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4110) Refine JNStorage log

2012-12-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4110.
---

   Resolution: Fixed
Fix Version/s: 2.0.3-alpha

Committed backport to branch-2 (same patch applied)

 Refine JNStorage log
 

 Key: HDFS-4110
 URL: https://issues.apache.org/jira/browse/HDFS-4110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: journal-node
Affects Versions: 3.0.0, 2.0.3-alpha
Reporter: liang xie
Assignee: liang xie
Priority: Trivial
  Labels: newbie
 Fix For: 3.0.0, 2.0.3-alpha

 Attachments: HDFS-4110.txt


 Abstract class Storage has a toString method: 
 {quote}
 return "Storage Directory " + this.root;
 {quote}
 and in the subclass JNStorage we could see:
 {quote}
 LOG.info("Formatting journal storage directory " + 
 sd + " with nsid: " + getNamespaceID());
 {quote}
 that'll print something like "Formatting journal storage directory Storage 
 Directory x"
 Just one line change to:
 {quote}
 LOG.info("Formatting journal " + sd + " with nsid: " + getNamespaceID());
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HDFS-3571) Allow EditLogFileInputStream to read from a remote URL

2012-12-03 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened HDFS-3571:
---


Reopening for merge to branch-2 (this is needed for QJM in branch-2)

 Allow EditLogFileInputStream to read from a remote URL
 --

 Key: HDFS-3571
 URL: https://issues.apache.org/jira/browse/HDFS-3571
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, namenode
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0

 Attachments: hdfs-3571.txt, hdfs-3571.txt


 In order to start up from remote edits storage (like the JournalNodes of 
 HDFS-3077), the NN needs to be able to load edits from a URL, instead of just 
 local disk. This JIRA extends EditLogFileInputStream to be able to use a URL 
 reference in addition to the current File reference.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4176) EditLogTailer should call rollEdits with a timeout

2012-11-12 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4176:
-

 Summary: EditLogTailer should call rollEdits with a timeout
 Key: HDFS-4176
 URL: https://issues.apache.org/jira/browse/HDFS-4176
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, name-node
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Todd Lipcon


When the EditLogTailer thread calls rollEdits() on the active NN via RPC, it 
currently does so without a timeout. So, if the active NN has frozen (but not 
actually crashed), this call can hang forever. This can then potentially 
prevent the standby from becoming active.

This may actually be considered a side effect of HADOOP-6762 -- if the RPC were 
interruptible, that would also fix the issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4169) Add per-disk latency metrics to DataNode

2012-11-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4169:
-

 Summary: Add per-disk latency metrics to DataNode
 Key: HDFS-4169
 URL: https://issues.apache.org/jira/browse/HDFS-4169
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 3.0.0
Reporter: Todd Lipcon


Currently, if one of the drives on the DataNode is slow, it's hard to determine 
what the issue is. This can happen due to a failing disk, bad controller, etc. 
It would be preferable to expose per-drive MXBeans (or tagged metrics) with 
latency statistics about how long reads/writes are taking.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4128) 2NN gets stuck in inconsistent state if edit log replay fails in the middle

2012-10-29 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4128:
-

 Summary: 2NN gets stuck in inconsistent state if edit log replay 
fails in the middle
 Key: HDFS-4128
 URL: https://issues.apache.org/jira/browse/HDFS-4128
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.2-alpha
Reporter: Todd Lipcon


We saw the following issue in a cluster:
- The 2NN downloads an edit log segment:
{code}
2012-10-29 12:30:57,433 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Reading /xxx/current/edits_00049136809-00049176162 
expecting start txid #49136809
{code}
- It fails in the middle of replay due to an OOME:
{code}
2012-10-29 12:31:21,021 ERROR 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
on operation AddOp [length=0, path=/
java.lang.OutOfMemoryError: Java heap space
{code}
- Future checkpoints then fail because the prior edit log replay only got 
halfway through the stream:
{code}
2012-10-29 12:32:21,214 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
Reading /x/current/edits_00049176163-00049177224 expecting 
start txid #49144432
2012-10-29 12:32:21,216 ERROR 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in 
doCheckpoint
java.io.IOException: There appears to be a gap in the edit log.  We expected 
txid 49144432, but got txid 49176163.
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4049) hflush performance regression due to nagling delays

2012-10-14 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4049:
-

 Summary: hflush performance regression due to nagling delays
 Key: HDFS-4049
 URL: https://issues.apache.org/jira/browse/HDFS-4049
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, performance
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical


HDFS-3721 reworked the way that packets are mirrored through the pipeline in 
the datanode. This caused two write() calls where there used to be one, which 
interacts badly with nagling so that there are 40ms bubbles on hflush() calls. 
We didn't notice this in the tests because the hflush perf test only uses a 
single datanode.
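
A generic sketch of the mitigation shape (coalescing the two writes back into 
one); the names and buffer handling are invented, not the actual fix:
{code}
// Sketch: restore a single write per packet so Nagle's algorithm plus
// delayed ACK cannot insert a 40ms bubble between the two halves of a
// mirrored packet.
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class PacketMirror {
  static void mirror(OutputStream toNextDn, byte[] header, byte[] payload)
      throws IOException {
    BufferedOutputStream out = new BufferedOutputStream(
        toNextDn, header.length + payload.length);
    out.write(header);
    out.write(payload);
    out.flush(); // one flush -> one write() on the underlying socket
  }
}
{code}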

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4025) QJM: Synchronize past log segments to JNs that missed them

2012-10-10 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-4025:
-

 Summary: QJM: Synchronize past log segments to JNs that missed them
 Key: HDFS-4025
 URL: https://issues.apache.org/jira/browse/HDFS-4025
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Currently, if a JournalManager crashes and misses some segment of logs, and 
then comes back, it will be re-added as a valid part of the quorum on the next 
log roll. However, it will not have a complete history of log segments (i.e., any 
individual JN may have gaps in its transaction history). This mirrors the 
behavior of the NameNode when there are multiple local directories specified.

However, it would be better if a background thread noticed these gaps and 
filled them in by grabbing the segments from other JournalNodes. This 
increases the resilience of the system when JournalNodes get reformatted or 
otherwise lose their local disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4017) Unclosed FileInputStream in GetJournalEditServlet

2012-10-08 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4017.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks Chao.

 Unclosed FileInputStream in GetJournalEditServlet
 -

 Key: HDFS-4017
 URL: https://issues.apache.org/jira/browse/HDFS-4017
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Chao Shi
Priority: Trivial
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-4017.txt


 The FileInputStream to read editFile is not closed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4004) TestJournalNode#testJournal fails because of test case execution order

2012-10-03 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-4004.
---

  Resolution: Fixed
Target Version/s: QuorumJournalManager (HDFS-3077)
Hadoop Flags: Reviewed

Committed to the branch, thanks Chao!

 TestJournalNode#testJournal fails because of test case execution order
 --

 Key: HDFS-4004
 URL: https://issues.apache.org/jira/browse/HDFS-4004
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Reporter: Chao Shi
Assignee: Chao Shi
Priority: Minor
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-4004.txt


 I'm running the HDFS tests on the HDFS-3077 branch. I found that 
 TestJournalNode#testJournal fails sometimes. The failing assertion is: 
 MetricsAsserts.assertCounter("BatchesWritten", 0L, metrics);
 The reason is that when testHttpServer runs before testJournal, it will 
 write some logs to the JN. The fix is simple: assign a new JID for each test 
 case, so that they will use different metrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3972) Trash emptier fails in secure cluster

2012-09-25 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3972:
-

 Summary: Trash emptier fails in secure cluster
 Key: HDFS-3972
 URL: https://issues.apache.org/jira/browse/HDFS-3972
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 2.0.1-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical


In a secure cluster, we're seeing the following issue on the NN when the trash 
emptier tries to run:

WARN org.apache.hadoop.fs.TrashPolicyDefault: Trash can't list homes: 
java.io.IOException: Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host \
is: x; destination host is: :8020;  Sleeping.

The issue seems to be that the trash emptier thread sends RPCs back to itself, 
but isn't wrapped in a doAs. Credit goes to Stephen Chu for discovering this.
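
The doAs wrapping can be sketched with the real UserGroupInformation API, 
though the surrounding emptier code here is a simplified stand-in:
{code}
// Sketch: run the emptier's RPCs as the NN's login (Kerberos) user
// instead of whatever security context the thread happens to inherit.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

class EmptierAsLoginUser {
  static void runEmptierPass(Runnable emptierBody) throws Exception {
    UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
    loginUser.doAs((PrivilegedExceptionAction<Void>) () -> {
      emptierBody.run(); // e.g. listing/deleting under /user/*/.Trash
      return null;
    });
  }
}
{code}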

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3974) dfsadmin -metasave throws NPE when under-replicated blocks are recently deleted

2012-09-25 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3974:
-

 Summary: dfsadmin -metasave throws NPE when under-replicated 
blocks are recently deleted
 Key: HDFS-3974
 URL: https://issues.apache.org/jira/browse/HDFS-3974
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Priority: Minor


We currently have the following race:
- Block is underreplicated, hence it's present in neededReplications
- User deletes the block - its BlockInfo.blockCollection is set to null
- Admin runs metaSave before the replication monitor runs.

This causes an NPE since block.getBlockCollection() for one of the 
neededReplication blocks has become null.
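
A minimal sketch of the guard, against a simplified stand-in for the block 
type (BlockLike is invented; the real fix would live in metaSave's iteration 
over neededReplications):
{code}
// Sketch: skip blocks whose owning file was deleted between the time
// they were queued for replication and the metaSave run.
import java.util.List;

class MetaSaveGuard {
  interface BlockLike { Object getBlockCollection(); }

  static void dump(List<BlockLike> neededReplications, StringBuilder out) {
    for (BlockLike b : neededReplications) {
      if (b.getBlockCollection() == null) {
        continue; // deleted since being queued; nothing to report
      }
      out.append(b).append('\n');
    }
  }
}
{code}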

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3969) Small bug fixes and improvements for disk locations API

2012-09-24 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3969:
-

 Summary: Small bug fixes and improvements for disk locations API
 Key: HDFS-3969
 URL: https://issues.apache.org/jira/browse/HDFS-3969
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


The new disk block locations API has a configurable timeout, but it's used 
inconsistently: the invokeAll() call to the thread pool assumes the timeout is 
in seconds, but the RPC timeout is set in milliseconds.

Also, we can improve the wire protocol for this API to be a lot more efficient.
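
The fix direction is easy to sketch with the standard ExecutorService API, 
passing an explicit TimeUnit so both sides agree; the values and wiring here 
are illustrative:
{code}
// Sketch: keep the configured timeout in milliseconds everywhere and
// hand invokeAll() an explicit TimeUnit, so the thread-pool wait and
// the RPC timeout cannot silently disagree on units.
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ConsistentTimeouts {
  static <T> List<Future<T>> callAll(ExecutorService pool,
      List<Callable<T>> calls, long timeoutMs) throws InterruptedException {
    // The same millisecond value would also be handed to the RPC layer.
    return pool.invokeAll(calls, timeoutMs, TimeUnit.MILLISECONDS);
  }
}
{code}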

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3967) NN should bail out earlier when logs to load have a gap

2012-09-21 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3967:
-

 Summary: NN should bail out earlier when logs to load have a gap
 Key: HDFS-3967
 URL: https://issues.apache.org/jira/browse/HDFS-3967
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.1-alpha, 3.0.0
Reporter: Todd Lipcon
Priority: Minor


I was testing an HA setup with a lowered edit log retention period, and ended 
up in a state where one of the two NNs had fallen too far behind, such that it 
couldn't start up again (due to the too-low retention period). When I started 
the NN, I got the following:

12/09/21 13:03:20 INFO namenode.FSImage: Loaded image for txid 45781083 from 
/tmp/name1-name/current/fsimage_00045781083
12/09/21 13:03:20 INFO namenode.FSImage: Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@239a0feb 
expecting start txid #45781084
12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 
'http://localhost:13081/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b,
 
http://localhost:13082/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b,
 
http://localhost:13083/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b'
 to transaction ID 45781084
12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 
'http://localhost:13081/getJournal?jid=myjournal&segmentTxId=45928954&storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b'
 to transaction ID 45781084
12/09/21 13:03:20 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: There appears to be a gap in the edit log.  We expected 
txid 45781084, but got txid 45928954.

Rather than trying to 'fast forward' the stream to a transaction which is 
actually prior to the first tx, we should bail earlier with a nicer error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3962) NN should periodically check writability of 'required' journals

2012-09-20 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3962:
-

 Summary: NN should periodically check writability of 'required' 
journals
 Key: HDFS-3962
 URL: https://issues.apache.org/jira/browse/HDFS-3962
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, name-node
Affects Versions: 2.0.1-alpha, 3.0.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Currently, our HA design ensures write fencing by having the failover 
controller call a fencing script before transitioning a new node to active. 
However, if the fencing script is based on storage fencing (and not stonith), 
there is no _read_ fencing. That is to say, the old active may continue to 
believe himself active for an unbounded amount of time, assuming that it does 
not try to write to its edit log.

This isn't super problematic, but it would be beneficial for monitoring, etc, 
to have the old NN periodically check the writability of any required 
journals, and abort if they become unwritable, even if there are no writes 
coming into the system.
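
A hedged sketch of such a monitor; the probe interval, Journal interface, and 
abort hook are all invented for illustration:
{code}
// Sketch: even with no edits flowing, periodically probe that all
// 'required' journals are still writable and abort if not, so a
// fenced NN cannot linger believing itself active.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class RequiredJournalMonitor {
  interface Journal { boolean isWritable(); }

  static void start(Iterable<Journal> required, Runnable abortNn) {
    ScheduledExecutorService ses =
        Executors.newSingleThreadScheduledExecutor();
    ses.scheduleAtFixedRate(() -> {
      for (Journal j : required) {
        if (!j.isWritable()) {
          abortNn.run(); // e.g. terminate the NN process
          return;
        }
      }
    }, 0, 60, TimeUnit.SECONDS); // probe once a minute (arbitrary)
  }
}
{code}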

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3950) QJM: misc TODO cleanup, improved log messages, etc

2012-09-19 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3950.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks for the reviews

 QJM: misc TODO cleanup, improved log messages, etc
 --

 Key: HDFS-3950
 URL: https://issues.apache.org/jira/browse/HDFS-3950
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3950.txt, hdfs-3950.txt


 General JIRA for a bunch of miscellaneous clean-up in the QJM branch:
 - fix most remaining TODOs
 - improve some log/error messages
 - add some more sanity checks where appropriate
 - address any findbugs that might have crept into branch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3955) QJM: Make acceptRecovery() atomic

2012-09-19 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3955.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

 QJM: Make acceptRecovery() atomic
 -

 Key: HDFS-3955
 URL: https://issues.apache.org/jira/browse/HDFS-3955
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3955.txt


 Per one of the TODOs in Journal.java, there is currently a lack of atomicity 
 in the {{acceptRecovery()}} code path. In particular, we have the following 
 actions executed non-atomically:
 - Download a new edits_inprogress_N from some other node
 - Persist the paxos recovery file to disk.
 If the JN crashes between these two steps, then we may be left in the state 
 whereby the edits_inprogress file has different data than the Paxos data left 
 over on the disk from a previous recovery attempt. This causes the next 
 {{prepareRecovery()}} to fail with an AssertionError.
 I discovered this by randomly injecting a fault between the two steps, and 
 then running the randomized fault test on a cluster. This resulted in some 
 AssertionErrors in the test logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3956) QJM: purge temporary files when no longer within retention period

2012-09-19 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3956.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks Eli.

 QJM: purge temporary files when no longer within retention period
 -

 Key: HDFS-3956
 URL: https://issues.apache.org/jira/browse/HDFS-3956
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3956.txt


 After doing a bunch of fault testing, I noticed that the JNs had a bunch of 
 temporary files left around in their journal directories which were no longer 
 within the retention period. For example, if a JN crashes in the middle of 
 recovery, it can leave around a file like {{edits_inprogress_123.epoch=10}}. 
 These files are handy to keep around for forensics/debugging while they are 
 still in their retention period, but we should not leave them forever. The 
 normal purging policy should apply.
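
 A sketch of the purge predicate, keyed off the file-name pattern in the 
 example above; the retention rule is simplified here to a txid floor:
{code}
// Sketch: delete leftover recovery artifacts such as
// edits_inprogress_123.epoch=10 once their starting txid falls below
// the retention floor, mirroring the normal purge policy.
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TempEditsPurger {
  static final Pattern TMP_SEGMENT =
      Pattern.compile("edits_inprogress_(\\d+)\\.epoch=\\d+");

  static void purge(File journalDir, long minRetainedTxId) {
    File[] files = journalDir.listFiles();
    if (files == null) return;
    for (File f : files) {
      Matcher m = TMP_SEGMENT.matcher(f.getName());
      if (m.matches() && Long.parseLong(m.group(1)) < minRetainedTxId) {
        f.delete(); // outside the retention period: safe to remove
      }
    }
  }
}
{code}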

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3950) QJM: misc TODO cleanup, improved log messages, etc

2012-09-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3950:
-

 Summary: QJM: misc TODO cleanup, improved log messages, etc
 Key: HDFS-3950
 URL: https://issues.apache.org/jira/browse/HDFS-3950
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


General JIRA for a bunch of miscellaneous clean-up in the QJM branch:
- fix most remaining TODOs
- improve some log/error messages
- add some more sanity checks where appropriate
- address any findbugs that might have crept into branch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3955) QJM: Make acceptRecovery() atomic

2012-09-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3955:
-

 Summary: QJM: Make acceptRecovery() atomic
 Key: HDFS-3955
 URL: https://issues.apache.org/jira/browse/HDFS-3955
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Per one of the TODOs in Journal.java, there is currently a lack of atomicity in 
the {{acceptRecovery()}} code path. In particular, we have the following 
actions executed non-atomically:
- Download a new edits_inprogress_N from some other node
- Persist the paxos recovery file to disk.

If the JN crashes between these two steps, then we may be left in the state 
whereby the edits_inprogress file has different data than the Paxos data left 
over on the disk from a previous recovery attempt. This causes the next 
{{prepareRecovery()}} to fail with an AssertionError.

I discovered this by randomly injecting a fault between the two steps, and then 
running the randomized fault test on a cluster. This resulted in some 
AssertionErrors in the test logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3956) QJM: purge temporary files when no longer within retention period

2012-09-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3956:
-

 Summary: QJM: purge temporary files when no longer within 
retention period
 Key: HDFS-3956
 URL: https://issues.apache.org/jira/browse/HDFS-3956
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor


After doing a bunch of fault testing, I noticed that the JNs had a bunch of 
temporary files left around in their journal directories which were no longer 
within the retention period. For example, if a JN crashes in the middle of 
recovery, it can leave around a file like {{edits_inprogress_123.epoch=10}}. 
These files are handy to keep around for forensics/debugging while they are 
still in their retention period, but we should not leave them forever. The 
normal purging policy should apply.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3958) Integrate upgrade/finalize/rollback with external journals

2012-09-18 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3958:
-

 Summary: Integrate upgrade/finalize/rollback with external journals
 Key: HDFS-3958
 URL: https://issues.apache.org/jira/browse/HDFS-3958
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0
Reporter: Todd Lipcon


Currently the NameNode upgrade/rollback/finalize framework only supports local 
storage. With edits being stored in pluggable Journals, this could create 
certain difficulties - in particular, rollback wouldn't actually rollback the 
external storage to the old state.

We should look at how to expose the right hooks to the external journal storage 
to snapshot/rollback/finalize.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3943) QJM: remove currently unused md5sum field.

2012-09-17 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3943:
-

 Summary: QJM: remove currently unused md5sum field.
 Key: HDFS-3943
 URL: https://issues.apache.org/jira/browse/HDFS-3943
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Per later discussion in HDFS-3859, it turns out to be rather difficult to 
integrate md5sum verification into QJM at this point. The crux of the issue is 
that different replicas may be semantically identical, but bytewise unequal due 
to the padding at the end of the file.

Given this, I'd like to temporarily remove the md5sum field from the protocol 
while we work on the more complex verification (which ignores the trailing 
padding) in HDFS-3859.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3943) QJM: remove currently unused md5sum field.

2012-09-17 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3943.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thx.

 QJM: remove currently unused md5sum field.
 

 Key: HDFS-3943
 URL: https://issues.apache.org/jira/browse/HDFS-3943
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3943.txt


 Per later discussion in HDFS-3859, it turns out to be rather difficult to 
 integrate md5sum verification into QJM at this point. The crux of the issue 
 is that different replicas may be semantically identical, but bytewise 
 unequal due to the padding at the end of the file.
 Given this, I'd like to temporarily remove the md5sum field from the 
 protocol while we work on the more complex verification (which ignores the 
 trailing padding) in HDFS-3859.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3894) QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching

2012-09-13 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3894.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thx for review

 QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching
 --

 Key: HDFS-3894
 URL: https://issues.apache.org/jira/browse/HDFS-3894
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3894.txt


 TestQJMWithFaults.testRecoverAfterDoubleFailures fails very occasionally. 
 Looking into it, the issue seems to be that it's possible, by random chance, 
 for an IPC server port to be reused between two different iterations of the 
 test loop. The client will then pick up and re-use the existing IPC 
 connection to the old server. However, the old server was shut down and 
 restarted, so the old IPC connection is stale (i.e., disconnected). This causes 
 the new client to get an EOF when it sends the format() call.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3906) QJM: quorum timeout on failover with large log segment

2012-09-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3906.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks

 QJM: quorum timeout on failover with large log segment
 --

 Key: HDFS-3906
 URL: https://issues.apache.org/jira/browse/HDFS-3906
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3906.txt


 In doing some stress tests, I ran into an issue with failover if the current 
 edit log segment written by the old active is large. With a 327MB log segment 
 containing 6.4M transactions, the JN took ~11 seconds to read and validate it 
 during the recovery step. This was longer than the 10 second timeout for 
 createNewEpoch, which caused the recovery to fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3840) JournalNodes log JournalNotFormattedException backtrace error before being formatted

2012-09-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3840.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks for the quick review.

 JournalNodes log JournalNotFormattedException backtrace error before being 
 formatted
 

 Key: HDFS-3840
 URL: https://issues.apache.org/jira/browse/HDFS-3840
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Stephen Chu
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3840.txt


 I started 3 JournalNodes for the first time. Then I formatted the NN. The 
 JournalNodes log the following error backtrace:
 {noformat}
 [root@cs-10-20-193-121 ~]# sudo -u hdfs hdfs journalnode
 12/08/22 00:52:22 INFO impl.MetricsConfig: loaded properties from 
 hadoop-metrics2.properties
 12/08/22 00:52:22 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
 10 second(s).
 12/08/22 00:52:22 INFO impl.MetricsSystemImpl: JournalNode metrics system 
 started
 12/08/22 00:52:22 INFO server.JournalNodeHttpServer: Starting web server as: 
 hdfs
 12/08/22 00:52:22 INFO mortbay.log: Logging to 
 org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
 org.mortbay.log.Slf4jLog
 12/08/22 00:52:22 INFO http.HttpServer: Added global filter 'safety' 
 (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
 12/08/22 00:52:22 INFO http.HttpServer: Added filter static_user_filter 
 (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
 context journal
 12/08/22 00:52:22 INFO http.HttpServer: Added filter static_user_filter 
 (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
 context static
 12/08/22 00:52:22 INFO http.HttpServer: Added filter static_user_filter 
 (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
 context logs
 12/08/22 00:52:22 INFO http.HttpServer: Jetty bound to port 8480
 12/08/22 00:52:22 INFO mortbay.log: jetty-6.1.26.cloudera.1
 12/08/22 00:52:23 INFO mortbay.log: Started 
 SelectChannelConnector@qjm6.cs1cloud.internal:8480
 12/08/22 00:52:23 INFO server.JournalNodeHttpServer: Journal Web-server up 
 at: qjm6.cs1cloud.internal/10.20.193.121:8480:8480
 12/08/22 00:52:23 INFO ipc.Server: Starting Socket Reader #1 for port 8485
 12/08/22 00:52:23 INFO ipc.Server: IPC Server Responder: starting
 12/08/22 00:52:23 INFO ipc.Server: IPC Server listener on 8485: starting
 12/08/22 00:52:41 INFO server.JournalNode: Initializing journal in directory 
 /dfs/jn/journal
 12/08/22 00:52:41 INFO common.Storage: Storage directory /dfs/jn/journal does 
 not exist.
 12/08/22 00:52:41 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:hdfs (auth:SIMPLE) 
 cause:org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: 
 Journal lv=0;cid=;nsid=0;c=0 not formatted
 12/08/22 00:52:41 INFO ipc.Server: IPC Server handler 0 on 8485, call 
 org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.getJournalState 
 from 10.20.187.169:44857: error: 
 org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: 
 Journal lv=0;cid=;nsid=0;c=0 not formatted
 org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: 
 Journal lv=0;cid=;nsid=0;c=0 not formatted
 at 
 org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:265)
 at 
 org.apache.hadoop.hdfs.qjournal.server.Journal.getLastPromisedEpoch(Journal.java:152)
 at 
 org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getJournalState(JournalNodeRpcServer.java:97)
 at 
 org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getJournalState(QJournalProtocolServerSideTranslatorPB.java:71)
 at 
 org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:12230)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
 12/08/22 00:52:41 INFO server.Journal: Formatting 
 

[jira] [Resolved] (HDFS-3898) QJM: enable TCP_NODELAY for IPC

2012-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3898.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks for the reviews

 QJM: enable TCP_NODELAY for IPC
 ---

 Key: HDFS-3898
 URL: https://issues.apache.org/jira/browse/HDFS-3898
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3898.txt, hdfs-3898.txt


 Currently, if the size of the edits batches is larger than the MTU, it can 
 result in 40ms delays due to interaction between nagle's algorithm and 
 delayed ack. Enabling TCP_NODELAY on the sockets solves this issue, so we 
 should set those configs by default for all of the QJM-related IPC.
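
 At the java.net level the change amounts to the standard setTcpNoDelay call; 
 a minimal sketch (the connection setup around it is illustrative):
{code}
// Sketch: QJM's IPC sockets get Nagle disabled, so sub-MTU edit
// batches are sent immediately instead of waiting on a delayed ACK.
import java.io.IOException;
import java.net.Socket;

class QjmSocketSetup {
  static Socket connect(String host, int port) throws IOException {
    Socket s = new Socket(host, port);
    s.setTcpNoDelay(true); // avoid Nagle + delayed-ACK 40ms bubbles
    return s;
  }
}
{code}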

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3885) QJM: optimize log sync when JN is lagging behind

2012-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3885.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks for the review.

 QJM: optimize log sync when JN is lagging behind
 

 Key: HDFS-3885
 URL: https://issues.apache.org/jira/browse/HDFS-3885
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3885.txt


 This is a potential optimization that we can add to the JournalNode: when one 
 of the nodes is lagging behind the others (eg because its local disk is 
 slower or there was a network blip), it receives edits after they've been 
 committed to a majority. It can tell this because the committed txid included 
 in the request info is higher than the highest txid in the actual batch to be 
 written. In this case, we know that this batch has already been fsynced to a 
 quorum of nodes, so we can skip the fsync() on the laggy node, helping it to 
 catch back up.
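
 The trigger condition can be sketched in a few lines (method and parameter 
 names invented):
{code}
// Sketch of the laggy-JN optimization: if the writer's committed txid
// already exceeds the last txid in this batch, a quorum has durably
// stored these edits, so this node may skip its own fsync.
class LaggySyncPolicy {
  static boolean mustFsync(long lastTxIdInBatch, long committedTxId) {
    // committedTxId > lastTxIdInBatch => batch already on a quorum.
    return lastTxIdInBatch >= committedTxId;
  }
}
{code}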

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3900) QJM: avoid validating log segments on log rolls

2012-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3900.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks

 QJM: avoid validating log segments on log rolls
 ---

 Key: HDFS-3900
 URL: https://issues.apache.org/jira/browse/HDFS-3900
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3900.txt


 Currently, we are paranoid and validate every log segment when it is 
 finalized. For a log segment that has been written entirely by one 
 writer, with no recovery in between, this is overly paranoid (we don't do 
 this for local journals). It also causes log rolls to be slow and take time 
 linear in the size of the segment. Instead, we should optimize this path to 
 simply trust that the segment is correct so long as the txids match up as 
 expected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3914) QJM: acceptRecovery should abort current segment

2012-09-10 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3914:
-

 Summary: QJM: acceptRecovery should abort current segment
 Key: HDFS-3914
 URL: https://issues.apache.org/jira/browse/HDFS-3914
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Found this bug with randomized testing. The following sequence causes a problem:

- A JN is writing the segment starting at txid 1, and has successfully written 
txid 1, but no more
- JN becomes partitioned from NN, and a new NN takes over
- new NN is also partitioned for the prepareRecovery phase of recovery, but 
properly connects for the acceptRecovery call
- acceptRecovery copies over a longer log segment (eg txns 1-3) from a good 
logger
- new NN calls finalizeLogSegment(), but gets the following error: 
JournalOutOfSyncException: Trying to finalize in-progress log segment 1 to end 
at txid 3 but only written up to txid 1

This is because the syncLog call (which copies the new segment) isn't 
properly aborting the old segment before replacing it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3915) QJM: Failover fails with auth error in secure cluster

2012-09-10 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3915:
-

 Summary: QJM: Failover fails with auth error in secure cluster
 Key: HDFS-3915
 URL: https://issues.apache.org/jira/browse/HDFS-3915
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, security
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3915.txt

When testing failover in a secure cluster with QJM, we ran into the following 
error:
{code}
java.io.IOException: Exception trying to open authenticated connection to 
http://x:8480/getJournal?jid=journal&segmentTxId=4325&storageInfo=-40%3A1049822920%3A0%3ACID-d7c84ac3-bb09-4d55-baae-0d561bb55e9b
at 
org.apache.hadoop.security.SecurityUtil.openSecureHttpConnection(SecurityUtil.java:510)
at 
org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:376)
... at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:217)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:176)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:635)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed 
to find any Kerberos tgt)
{code}

The issue is that the EditLogFileInputStream uses the current user, which in 
the case of the failover trigger is the admin's remote user, rather than the 
NN's login user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3901) QJM: send 'heartbeat' messages to JNs even when they are out-of-sync

2012-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3901.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks for the reviews, Eli and ATM.

 QJM: send 'heartbeat' messages to JNs even when they are out-of-sync
 

 Key: HDFS-3901
 URL: https://issues.apache.org/jira/browse/HDFS-3901
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3901.txt, hdfs-3901.txt


 Currently, if one of the JNs has fallen out of sync with the writer (eg 
 because it went down), it will be marked as such until the next log roll. 
 This causes the writer to no longer send any RPCs to it. This means that the 
 JN's metrics will no longer reflect up-to-date information on how far laggy 
 they are.
 This patch will introduce a heartbeat() RPC that has no effect except to 
 update the JN's view of the latest committed txid. When the writer is talking 
 to an out-of-sync logger, it will send these heartbeat messages once a second.
 In a future patch we can extend the heartbeat functionality so that NNs 
 periodically check their connections to JNs if no edits arrive, such that a 
 fenced NN won't accidentally continue to serve reads indefinitely.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3899) QJM: Writer-side metrics

2012-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3899.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)

Committed to branch, thx.

 QJM: Writer-side metrics
 

 Key: HDFS-3899
 URL: https://issues.apache.org/jira/browse/HDFS-3899
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3899.txt, hdfs-3899.txt


 We already have some metrics on the server side (JournalNode) but it's useful 
 to also gather metrics from the client side (NameNode). This is important in 
 order to monitor that the client is seeing good performance from the 
 individual JNs, and so that administrators can set up alerts if any of the 
 JNs has become inaccessible to the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3914) QJM: acceptRecovery should abort current segment

2012-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3914.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

 QJM: acceptRecovery should abort current segment
 

 Key: HDFS-3914
 URL: https://issues.apache.org/jira/browse/HDFS-3914
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3914.txt


 Found this bug with randomized testing. The following sequence causes a 
 problem:
 - JN writing segment starting at txid 1, and successfully wrote txid 1, but 
 no more
 - JN becomes partitioned from NN, and a new NN takes over
 - new NN is also partitioned for the prepareRecovery phase of recovery, but 
 properly connects for the acceptRecovery call
 - acceptRecovery copies over a longer log segment (eg txns 1-3) from a good 
 logger
 - new NN calls finalizeLogSegment(), but gets the following error: 
 JournalOutOfSyncException: Trying to finalize in-progress log segment 1 to 
 end at txid 3 but only written up to txid 1
 This is because the syncLog call (which copies the new segment) isn't 
 properly aborting the old segment before replacing it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3904) QJM: journalnode does not die/log ERROR when keytab is not found in secure mode

2012-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3904.
---

Resolution: Duplicate

 QJM: journalnode does not die/log ERROR when keytab is not found in secure 
 mode
 ---

 Key: HDFS-3904
 URL: https://issues.apache.org/jira/browse/HDFS-3904
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Eli Collins
Priority: Minor

 Credit to Stephen Chu for finding this. The journalnode was incorrectly 
 configured (misplaced the keytab) with security enabled, when started the 
 JournalNode didn't die. It stayed running and logged a WARN message:
 {noformat}
 2012-08-23 15:44:15,497 WARN org.mortbay.log: Failed startup of context 
 org.mortbay.jetty.webapp.WebAppContext@58c16b18{/,file:/usr/lib/hadoop-hdfs/webapps/journal}
 javax.servlet.ServletException: javax.servlet.ServletException: Keytab does 
 not exist: /etc/hadoop/conf/hdfs.keytab
   at 
 org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:185)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:146)
   at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at 
 org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
   at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
   at 
 org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
   at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
   at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
   at org.mortbay.jetty.Server.doStart(Server.java:224)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.apache.hadoop.http.HttpServer.start(HttpServer.java:657)
   at 
 org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:83)
   at 
 org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:138)
   at 
 org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:120)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at 
 org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:228)
 Caused by: javax.servlet.ServletException: Keytab does not exist: 
 /etc/hadoop/conf/hdfs.keytab
   at 
 org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:153)
   ... 22 more
 {noformat}
 The other HDFS daemons, if I remember correctly, would die if they can't 
 authenticate.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3915) QJM: Failover fails with auth error in secure cluster

2012-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3915.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

 QJM: Failover fails with auth error in secure cluster
 -

 Key: HDFS-3915
 URL: https://issues.apache.org/jira/browse/HDFS-3915
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, security
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3915.txt


 When testing failover in a secure cluster with QJM, we ran into the following 
 error:
 {code}
 java.io.IOException: Exception trying to open authenticated connection to 
 http://x:8480/getJournal?jid=journalsegmentTxId=4325storageInfo=-40%3A1049822920%3A0%3ACID-d7c84ac3-bb09-4d55-baae-0d561bb55e9b
   at 
 org.apache.hadoop.security.SecurityUtil.openSecureHttpConnection(SecurityUtil.java:510)
   at 
 org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:376)
 ...   at 
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:217)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:176)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:635)
 Caused by: GSSException: No valid credentials provided (Mechanism level: 
 Failed to find any Kerberos tgt)
 {code}
 The issue is that the EditLogFileInputStream uses the current user, which 
 in the case of the failover trigger is the admin's remote user, rather than 
 the NN's login user.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3906) QJM: quorum timeout on failover with large log segment

2012-09-07 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3906:
-

 Summary: QJM: quorum timeout on failover with large log segment
 Key: HDFS-3906
 URL: https://issues.apache.org/jira/browse/HDFS-3906
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical


In doing some stress tests, I ran into an issue with failover if the current 
edit log segment written by the old active is large. With a 327MB log segment 
containing 6.4M transactions, the JN took ~11 seconds to read and validate it 
during the recovery step. This was longer than the 10 second timeout for 
createNewEpoch, which caused the recovery to fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3894) QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching

2012-09-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3894:
-

 Summary: QJM: testRecoverAfterDoubleFailures can be flaky due to 
IPC client caching
 Key: HDFS-3894
 URL: https://issues.apache.org/jira/browse/HDFS-3894
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


TestQJMWithFaults.testRecoverAfterDoubleFailures fails really occasionally. 
Looking into it, the issue seems to be that it's possible by random chance for 
an IPC server port to be reused between two different iterations of the test 
loop. The client will then pick up and re-use the existing IPC connection to 
the old server. However, the old server was shut down and restarted, so the old 
IPC connection is stale (ie disconnected). This causes the new client to get an 
EOF when it sends the format() call.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3891) QJM: SBN fails if selectInputStreams throws RTE

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3891.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

 QJM: SBN fails if selectInputStreams throws RTE
 ---

 Key: HDFS-3891
 URL: https://issues.apache.org/jira/browse/HDFS-3891
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3891.txt, hdfs-3891.txt


 Currently, QJM's {{selectInputStream}} method throws an RTE if a quorum 
 cannot be reached. This propagates into the Standby Node and causes the whole 
 node to crash. It should handle this error appropriately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-3726) QJM: if a logger misses an RPC, don't retry that logger until next segment

2012-09-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-3726.
---

   Resolution: Fixed
Fix Version/s: QuorumJournalManager (HDFS-3077)
 Hadoop Flags: Reviewed

Committed to branch, thanks for the review.

 QJM: if a logger misses an RPC, don't retry that logger until next segment
 --

 Key: HDFS-3726
 URL: https://issues.apache.org/jira/browse/HDFS-3726
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: QuorumJournalManager (HDFS-3077)

 Attachments: hdfs-3726.txt, hdfs-3726.txt


 Currently, if a logger misses an RPC in the middle of a log segment, or 
 misses the {{startLogSegment}} RPC (eg it was down or network was 
 disconnected during that time period), then it will throw an exception on 
 every subsequent {{journal()}} call in that segment, since it knows that it 
 missed some edits in the middle.
 We should change this exception to a specific IOE subclass, and have the 
 client side of QJM detect the situation and stop sending IPCs until the next 
 {{startLogSegment}} call.
 This isn't critical for correctness but will help reduce log spew on both 
 sides.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3898) QJM: enable TCP_NODELAY for IPC

2012-09-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3898:
-

 Summary: QJM: enable TCP_NODELAY for IPC
 Key: HDFS-3898
 URL: https://issues.apache.org/jira/browse/HDFS-3898
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker


Currently, if the size of the edits batches is larger than the MTU, it can 
result in 40ms delays due to interaction between nagle's algorithm and delayed 
ack. Enabling TCP_NODELAY on the sockets solves this issue, so we should set 
those configs by default for all of the QJM-related IPC.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3900) QJM: avoid validating log segments on log rolls

2012-09-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3900:
-

 Summary: QJM: avoid validating log segments on log rolls
 Key: HDFS-3900
 URL: https://issues.apache.org/jira/browse/HDFS-3900
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


Currently, we are paranoid and validate every log segment when it is finalized. 
For the a log segment that has been written entirely by one writer, with no 
recovery in between, this is overly paranoid (we don't do this for local 
journals). It also causes log rolls to be slow and take time linear in the size 
of the segment. Instead, we should optimize this path to simply trust that the 
segment is correct so long as the txids match up as expected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-3899) QJM: Writer-side metrics

2012-09-06 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3899:
-

 Summary: QJM: Writer-side metrics
 Key: HDFS-3899
 URL: https://issues.apache.org/jira/browse/HDFS-3899
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon


We already have some metrics on the server side (JournalNode) but it's useful 
to also gather metrics from the client side (NameNode). This is important in 
order to monitor that the client is seeing good performance from the individual 
JNs, and so that administrators can set up alerts if any of the JNs has become 
inaccessible to the NN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


  1   2   3   4   >