[jira] [Resolved] (HDFS-14535) The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is causing lots of heap allocation in HBase when using short-circuit read
[ https://issues.apache.org/jira/browse/HDFS-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-14535. Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.3.0 > The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is > causing lots of heap allocation in HBase when using short-circuit read > -- > > Key: HDFS-14535 > URL: https://issues.apache.org/jira/browse/HDFS-14535 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14535.patch > > > Our HBase team is trying to read blocks from HDFS into pooled off-heap > ByteBuffers directly (HBASE-21879), and a recent benchmark found that almost 45% of > heap allocation came from the DFS client. The heap allocation flame graph can be > seen here: > https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg > After checking the code path, we found that when requesting file descriptors > from a DomainPeer, we allocate a huge 8KB buffer for the BufferedOutputStream, > even though the protocol content is quite small, just a few bytes. > This creates heavy GC pressure for HBase when cacheHitRatio < 60%, which > increases the HBase P999 latency. We can instead pre-allocate a small > buffer for the BufferedOutputStream, such as 512 bytes; that is enough to read > the short-circuit fd protocol content. We've created a patch along those lines, and > the allocation flame graph shows that with the patch, heap allocation > from the DFS client dropped from 45% to 27%. > See: > https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg > We hope the attached patch can be merged into HDFS trunk and also Hadoop-2.8.x; > HBase will benefit a lot from this. > Thanks. 
> For more details, see: > https://issues.apache.org/jira/browse/HBASE-22387?focusedCommentId=16851639&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16851639 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
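The fix described above boils down to passing an explicit buffer size instead of taking the 8 KB default. A minimal, self-contained sketch of the idea (class name, helper method, and the 512-byte constant are illustrative assumptions, not the actual patch):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SmallBufferDemo {
    // Assumed small size from the report: the short-circuit fd request
    // protocol content is only a few bytes, so 512 is plenty.
    static final int SMALL_BUF_SIZE = 512;

    static byte[] writeRequest(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The two-arg constructor avoids the default 8192-byte internal
        // array that BufferedOutputStream(OutputStream) would allocate.
        BufferedOutputStream out = new BufferedOutputStream(sink, SMALL_BUF_SIZE);
        try {
            out.write(payload);
            out.flush();
        } catch (IOException e) {
            throw new AssertionError("in-memory sink cannot fail", e);
        }
        return sink.toByteArray();
    }
}
```

The behavior is identical to the default-size stream; only the size of the short-lived heap allocation per request changes.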
[jira] [Resolved] (HDFS-14482) Crash when using libhdfs with bad classpath
[ https://issues.apache.org/jira/browse/HDFS-14482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-14482. Resolution: Fixed > Crash when using libhdfs with bad classpath > --- > > Key: HDFS-14482 > URL: https://issues.apache.org/jira/browse/HDFS-14482 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Todd Lipcon >Assignee: Sahil Takiar >Priority: Major > Fix For: 3.3.0 > > > HDFS-14304 added a call to initCachedClasses in getJNIEnv after creating the > env but before checking whether it's null. In the case that getJNIEnv() fails > to create an env, it returns NULL, and then we crash when calling > initCachedClasses() on line 555 > {code} > 551 state->env = getGlobalJNIEnv(); > 552 mutexUnlock(); > 553 > 554 jthrowable jthr = NULL; > 555 jthr = initCachedClasses(state->env); > 556 if (jthr) { > 557 printExceptionAndFree(state->env, jthr, PRINT_EXC_ALL, > 558 "initCachedClasses failed"); > 559 goto fail; > {code}
[jira] [Created] (HDFS-14482) Crash when using libhdfs with bad classpath
Todd Lipcon created HDFS-14482: -- Summary: Crash when using libhdfs with bad classpath Key: HDFS-14482 URL: https://issues.apache.org/jira/browse/HDFS-14482 Project: Hadoop HDFS Issue Type: Bug Reporter: Todd Lipcon Assignee: Sahil Takiar HDFS-14304 added a call to initCachedClasses in getJNIEnv after creating the env but before checking whether it's null. In the case that getJNIEnv() fails to create an env, it returns NULL, and then we crash when calling initCachedClasses() on line 555 {code} 551 state->env = getGlobalJNIEnv(); 552 mutexUnlock(); 553 554 jthrowable jthr = NULL; 555 jthr = initCachedClasses(state->env); 556 if (jthr) { 557 printExceptionAndFree(state->env, jthr, PRINT_EXC_ALL, 558 "initCachedClasses failed"); 559 goto fail; {code}
[jira] [Created] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
Todd Lipcon created HDFS-14111: -- Summary: hdfsOpenFile on HDFS causes unnecessary IO from file offset 0 Key: HDFS-14111 URL: https://issues.apache.org/jira/browse/HDFS-14111 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, libhdfs Affects Versions: 3.2.0 Reporter: Todd Lipcon hdfsOpenFile() calls readDirect() with a 0-length argument in order to check whether the underlying stream supports bytebuffer reads. With DFSInputStream, the read(0) isn't short circuited, and results in the DFSClient opening a block reader. In the case of a remote block, the block reader will actually issue a read of the whole block, causing the datanode to perform unnecessary IO and network transfers in order to fill up the client's TCP buffers. This causes performance degradation.
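The fix direction implied above is to short-circuit a zero-length request before any block-reader setup happens. A hypothetical sketch of that pattern (the class, counter, and fill logic are illustrative, not the actual DFSInputStream code):

```java
import java.nio.ByteBuffer;

public class ZeroLenReadDemo {
    // Counts expensive setups; stands in for opening a remote block reader.
    static int blockReaderOpens = 0;

    // A read(ByteBuffer) that returns immediately for empty requests, so a
    // capability probe like hdfsOpenFile's read(0) triggers no real IO.
    static int read(ByteBuffer buf) {
        if (buf.remaining() == 0) {
            return 0; // nothing requested: skip block-reader setup entirely
        }
        blockReaderOpens++; // expensive path reached only for real reads
        int n = buf.remaining();
        while (buf.hasRemaining()) buf.put((byte) 0); // dummy data
        return n;
    }
}
```

With the guard in place, the probe costs nothing; without it, every open of a remote block would pay the full block-reader setup.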
[jira] [Resolved] (HDFS-10369) hdfsread crash when reading data reaches to 128M
[ https://issues.apache.org/jira/browse/HDFS-10369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-10369. Resolution: Invalid You're mallocing a buffer of 5 bytes here, seems your C code is just broken. > hdfsread crash when reading data reaches to 128M > > > Key: HDFS-10369 > URL: https://issues.apache.org/jira/browse/HDFS-10369 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Reporter: vince zhang >Priority: Major > > see code below, it would crash after printf("hdfsGetDefaultBlockSize2:%d, > ret:%d\n", hdfsGetDefaultBlockSize(fs), ret); > > hdfsFile read_file = hdfsOpenFile(fs, "/testpath", O_RDONLY, 0, 0, 1); > int total = hdfsAvailable(fs, read_file); > printf("Total:%d\n", total); > char* buffer = (char*)malloc(sizeof(size+1) * sizeof(char)); > int ret = -1; > int len = 0; > ret = hdfsSeek(fs, read_file, 134152192); > printf("hdfsGetDefaultBlockSize1:%d, ret:%d\n", > hdfsGetDefaultBlockSize(fs), ret); > ret = hdfsRead(fs, read_file, (void*)buffer, size); > printf("hdfsGetDefaultBlockSize2:%d, ret:%d\n", > hdfsGetDefaultBlockSize(fs), ret); > ret = hdfsRead(fs, read_file, (void*)buffer, size); > printf("hdfsGetDefaultBlockSize3:%d, ret:%d\n", > hdfsGetDefaultBlockSize(fs), ret); > return 0;
[jira] [Created] (HDFS-13826) Add a hidden configuration for NameNode to generate fake block locations
Todd Lipcon created HDFS-13826: -- Summary: Add a hidden configuration for NameNode to generate fake block locations Key: HDFS-13826 URL: https://issues.apache.org/jira/browse/HDFS-13826 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Todd Lipcon Assignee: Todd Lipcon In doing testing and benchmarking of the NameNode and dependent systems, it's often useful to be able to use an fsimage provided by some production system in a controlled environment without actually having access to any of the data. For example, while doing some recent work on Apache Impala I was trying to optimize the transmission and storage of block locations and tokens and measure the results based on metadata from a production user. In order to achieve this, it would be useful for the NN to expose a developer-only (undocumented) configuration to generate fake block locations and return them to callers. The "fake" locations should be randomly distributed across a fixed set of fake datanodes.
[jira] [Created] (HDFS-13747) Statistic for list_located_status is incremented incorrectly by listStatusIterator
Todd Lipcon created HDFS-13747: -- Summary: Statistic for list_located_status is incremented incorrectly by listStatusIterator Key: HDFS-13747 URL: https://issues.apache.org/jira/browse/HDFS-13747 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 3.0.3 Reporter: Todd Lipcon
[jira] [Created] (HDFS-13703) Avoid allocation of CorruptedBlocks hashmap when no corrupted blocks are hit
Todd Lipcon created HDFS-13703: -- Summary: Avoid allocation of CorruptedBlocks hashmap when no corrupted blocks are hit Key: HDFS-13703 URL: https://issues.apache.org/jira/browse/HDFS-13703 Project: Hadoop HDFS Issue Type: Improvement Components: performance Reporter: Todd Lipcon Assignee: Todd Lipcon The DFSClient creates a CorruptedBlocks object, which contains a HashMap, on every read call. In most cases, a read will not hit any corrupted blocks, and this hashmap is not used. It seems the JIT isn't smart enough to eliminate this allocation. We would be better off avoiding it and only allocating in the rare case when a corrupt block is hit. Removing this allocation reduced CPU usage of a TeraValidate job by about 10%.
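The lazy-allocation idea above can be sketched as follows (illustrative names, not the actual DFSClient types): defer creating the map until a corrupt replica is actually reported, so the common clean-read path allocates nothing.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyCorruptMap {
    // Null until the first corruption is reported; the hot path never
    // allocates the HashMap.
    private Map<String, String> corrupt;

    void addCorrupt(String block, String node) {
        if (corrupt == null) {
            corrupt = new HashMap<>(); // allocated only on the rare path
        }
        corrupt.put(block, node);
    }

    boolean isEmpty() {
        return corrupt == null || corrupt.isEmpty();
    }
}
```

The null check costs a branch per report but removes a per-read allocation, which is exactly the trade the JIRA describes.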
[jira] [Created] (HDFS-13702) HTrace hooks taking 10-15% CPU in DFS client when disabled
Todd Lipcon created HDFS-13702: -- Summary: HTrace hooks taking 10-15% CPU in DFS client when disabled Key: HDFS-13702 URL: https://issues.apache.org/jira/browse/HDFS-13702 Project: Hadoop HDFS Issue Type: Bug Components: performance Affects Versions: 3.0.0 Reporter: Todd Lipcon I am seeing DFSClient.newReaderTraceScope take ~15% CPU in a teravalidate workload even when HTrace is disabled. This is because it stringifies several integers. We should avoid all allocation and stringification when htrace is disabled.
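The proposed fix amounts to checking the tracing flag before building any strings. A hypothetical sketch (the class, flag, and counters stand in for HTrace's sampler state and are not the actual API):

```java
public class TraceGuardDemo {
    static boolean traceEnabled = false; // stands in for the HTrace sampler
    static int stringsBuilt = 0;         // counts allocation-heavy work
    static String lastDesc;

    // Build the description string only when tracing is actually on,
    // instead of unconditionally stringifying the integers.
    static void newReaderTraceScope(long pos, int len) {
        if (!traceEnabled) {
            return; // fast path: no boxing, no StringBuilder work
        }
        lastDesc = "read pos=" + pos + " len=" + len;
        stringsBuilt++;
        // ... would attach lastDesc to a trace span here ...
    }
}
```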
[jira] [Created] (HDFS-13701) Removal of logging guards regressed performance
Todd Lipcon created HDFS-13701: -- Summary: Removal of logging guards regressed performance Key: HDFS-13701 URL: https://issues.apache.org/jira/browse/HDFS-13701 Project: Hadoop HDFS Issue Type: Bug Components: performance Affects Versions: 3.0.0 Reporter: Todd Lipcon HDFS-8971 removed various logging guards from hot methods in the DFS client. In theory using a format string with {} placeholders is equivalent, but in fact it's not equivalent when one or more of the variable arguments are primitives. To be passed as part of the varargs array, the primitives need to be boxed. I am seeing Integer.valueOf() inside BlockReaderLocal.read taking ~3% of CPU.
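The boxing cost described above can be made concrete with a toy logger (hypothetical names; this is not the SLF4J API, just the same varargs shape): the primitive is boxed into the `Object[]` at the call site, before the level check can discard it, so only an explicit guard avoids the allocation.

```java
public class LogGuardDemo {
    static boolean debugEnabled = false; // stands in for LOG.isDebugEnabled()
    static int argsBoxed = 0;

    // Mimics a parameterized log call: an int argument must be boxed into
    // the varargs Object[] before this method can even check the level.
    static void debugUnguarded(String fmt, Object... args) {
        argsBoxed += args.length; // boxing already happened at the call site
        if (!debugEnabled) return;
        // ... format and emit ...
    }

    // With an explicit guard, the primitive is never boxed when debug is off.
    static void readWithGuard(int offset) {
        if (debugEnabled) {
            debugUnguarded("read at {}", offset); // boxes only when enabled
        }
    }
}
```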
[jira] [Resolved] (HDFS-3653) 1.x: Add a retention period for purged edit logs
[ https://issues.apache.org/jira/browse/HDFS-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3653. --- Resolution: Won't Fix > 1.x: Add a retention period for purged edit logs > > > Key: HDFS-3653 > URL: https://issues.apache.org/jira/browse/HDFS-3653 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 1.1.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Major > > Occasionally we have a bug which causes something to go wrong with edits > files. Even more occasionally the bug is such that the namenode mistakenly > deletes an {{edits}} file without merging it into {{fsimage}} properly -- e.g. > if the bug mistakenly writes an OP_INVALID at the top of the log. > In trunk/2.0 we retain many edit log segments going back in time to be more > robust to this kind of error. I'd like to implement something similar (but > much simpler) in 1.x, which would be used only by HDFS developers in > root-causing or repairing from these rare scenarios: the NN should never > directly delete an edit log file. Instead, it should rename the file into > some kind of "trash" directory inside the name dir, and associate it with a > timestamp. Then, periodically a separate thread should scan the trash dirs > and delete any logs older than a configurable time.
[jira] [Resolved] (HDFS-3069) If an edits file has more edits in it than expected by its name, should trigger an error
[ https://issues.apache.org/jira/browse/HDFS-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3069. --- Resolution: Won't Fix Target Version/s: (was: ) > If an edits file has more edits in it than expected by its name, should > trigger an error > > > Key: HDFS-3069 > URL: https://issues.apache.org/jira/browse/HDFS-3069 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 0.23.0, 2.0.0-alpha >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Major > > In testing what happens in HA split brain scenarios, I ended up with an edits > log that was named edits_47-47 but actually had two edits in it (#47 and > #48). The edits loading process should detect this situation and barf. > Otherwise, the problem shows up later during loading or even on the next > restart, and is tough to fix.
[jira] [Resolved] (HDFS-957) FSImage layout version should be written only once file is complete
[ https://issues.apache.org/jira/browse/HDFS-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-957. -- Resolution: Won't Fix FSImage layout version should be written only once file is complete --- Key: HDFS-957 URL: https://issues.apache.org/jira/browse/HDFS-957 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-957.txt Right now, the FSImage save code writes the LAYOUT_VERSION at the head of the file, along with some other headers, and then dumps the directory into the file. Instead, it should write a special IMAGE_IN_PROGRESS entry for the layout version, dump all of the data, then seek back to the head of the file to write the proper LAYOUT_VERSION. This would make it very easy to detect the case where the FSImage save got interrupted.
[jira] [Resolved] (HDFS-3528) Use native CRC32 in DFS write path
[ https://issues.apache.org/jira/browse/HDFS-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3528. --- Resolution: Fixed Fix Version/s: 2.6.0 Use native CRC32 in DFS write path -- Key: HDFS-3528 URL: https://issues.apache.org/jira/browse/HDFS-3528 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon Assignee: James Thomas Fix For: 2.6.0 HDFS-2080 improved the CPU efficiency of the read path by using native SSE-enabled code for CRC verification. Benchmarks of the write path show that it's often CPU bound by checksums as well, so we should make the same improvement there.
[jira] [Resolved] (HDFS-3278) Umbrella Jira for HDFS-HA Phase 2
[ https://issues.apache.org/jira/browse/HDFS-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3278. --- Resolution: Fixed Fix Version/s: 2.1.0-beta Assignee: Todd Lipcon (was: Sanjay Radia) These subtasks were completed quite a while back. Umbrella Jira for HDFS-HA Phase 2 - Key: HDFS-3278 URL: https://issues.apache.org/jira/browse/HDFS-3278 Project: Hadoop HDFS Issue Type: New Feature Reporter: Sanjay Radia Assignee: Todd Lipcon Fix For: 2.1.0-beta HDFS-1623 gives a high-level architecture and design for hot automatic failover of the NN. Branch HDFS-1623 was merged into trunk for tactical reasons even though the work for HA was not complete; the branch contained mechanisms for keeping a standby hot (i.e. reading from the shared journal), dual block reports, fencing of DNs, a ZooKeeper library for leader election, etc. This umbrella jira covers the remaining work for HA and will link all the jiras for the remaining work. Unlike HDFS-1623, no single branch will be created; work will proceed in parallel branches.
[jira] [Created] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery
Todd Lipcon created HDFS-5790: - Summary: LeaseManager.findPath is very slow when many leases need recovery Key: HDFS-5790 URL: https://issues.apache.org/jira/browse/HDFS-5790 Project: Hadoop HDFS Issue Type: Bug Components: namenode, performance Affects Versions: 2.4.0 Reporter: Todd Lipcon We recently saw an issue where the NN restarted while tens of thousands of files were open. The NN then ended up spending multiple seconds for each commitBlockSynchronization() call, spending most of its time inside LeaseManager.findPath(). findPath currently works by looping over all files held for a given writer, and traversing the filesystem for each one. This takes way too long when tens of thousands of files are open by a single writer.
[jira] [Created] (HDFS-5287) JN need not validate finalized log segments in newEpoch
Todd Lipcon created HDFS-5287: - Summary: JN need not validate finalized log segments in newEpoch Key: HDFS-5287 URL: https://issues.apache.org/jira/browse/HDFS-5287 Project: Hadoop HDFS Issue Type: Bug Components: qjm Affects Versions: 2.1.1-beta Reporter: Todd Lipcon Priority: Minor In {{scanStorageForLatestEdits}}, the JN will call {{validateLog}} on the last log segment, regardless of whether it is finalized. If it's finalized, then this is a needless pass over the logs which can adversely affect failover time for a graceful failover.
[jira] [Resolved] (HDFS-3656) ZKFC may write a null breadcrumb znode
[ https://issues.apache.org/jira/browse/HDFS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3656. --- Resolution: Duplicate Target Version/s: (was: ) Yep, I think you're right. Thanks. ZKFC may write a null breadcrumb znode Key: HDFS-3656 URL: https://issues.apache.org/jira/browse/HDFS-3656 Project: Hadoop HDFS Issue Type: Bug Components: auto-failover Affects Versions: 2.0.0-alpha Reporter: Todd Lipcon A user [reported|https://issues.cloudera.org/browse/DISTRO-412] an NPE trying to read the breadcrumb znode in the failover controller. This happened repeatedly, implying that an earlier process set the znode to null - probably some race, though I don't see anything obvious in the code. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5074) Allow starting up from an fsimage checkpoint in the middle of a segment
Todd Lipcon created HDFS-5074: - Summary: Allow starting up from an fsimage checkpoint in the middle of a segment Key: HDFS-5074 URL: https://issues.apache.org/jira/browse/HDFS-5074 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode Affects Versions: 3.0.0, 2.1.0-beta Reporter: Todd Lipcon We've seen the following behavior a couple times: - SBN is running and somehow encounters an error in the middle of replaying an edit log in the tailer (e.g. the JN it's reading from crashes) - SBN has successfully processed half of the edits in the segment it was reading. - SBN saves a checkpoint, which now falls in the middle of a segment, and then restarts. Upon restart, the SBN will load this checkpoint which falls in the middle of a segment. {{selectInputStreams}} then fails when the SBN requests a mid-segment txid. We should handle this case by downloading the right segment and fast-forwarding to the correct txid.
[jira] [Created] (HDFS-5058) QJM should validate startLogSegment() more strictly
Todd Lipcon created HDFS-5058: - Summary: QJM should validate startLogSegment() more strictly Key: HDFS-5058 URL: https://issues.apache.org/jira/browse/HDFS-5058 Project: Hadoop HDFS Issue Type: Bug Components: qjm Affects Versions: 3.0.0, 2.1.0-beta Reporter: Todd Lipcon Assignee: Todd Lipcon We've seen a small handful of times a case where one of the NNs in an HA cluster ends up with an fsimage checkpoint that falls in the middle of an edit segment. We're not sure yet how this happens, but one issue can happen as a result: - Node has fsimage_500. Cluster has edits_1-1000, edits_1001_inprogress - Node restarts, loads fsimage_500 - Node wants to become active. It calls selectInputStreams(500). Currently, this API logs a WARN that 500 falls in the middle of the 1-1000 segment, but continues and returns no results. - Node calls startLogSegment(501). Currently, the QJM will accept this (incorrectly). The node then crashes when it first tries to journal a real transaction, but it ends up leaving the edits_501_inprogress lying around, potentially causing more issues later.
[jira] [Created] (HDFS-5037) Active NN should trigger its own edit log rolls
Todd Lipcon created HDFS-5037: - Summary: Active NN should trigger its own edit log rolls Key: HDFS-5037 URL: https://issues.apache.org/jira/browse/HDFS-5037 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 3.0.0, 2.1.0-beta Reporter: Todd Lipcon We've seen cases where the SBN/2NN went down, and then users accumulated very large edit log segments. This causes a slow startup time because the last edit log segment must be read fully to recover it before the NN can start up again. Additionally, in the case of QJM, it can trigger timeouts on recovery or edit log syncing because the very-large segment has to get processed within a certain time bound. We could easily improve this by having the NN trigger its own edit log rolls at a configurable size (e.g. every 256 MB).
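The size-triggered roll proposed above is a simple accounting check. A minimal sketch (the threshold constant, class, and method names are assumptions, not an actual HDFS config key or API):

```java
public class RollTriggerDemo {
    // Assumed threshold mirroring the proposal's example of 256 MB.
    static final long ROLL_THRESHOLD_BYTES = 256L * 1024 * 1024;

    private long segmentBytes = 0;
    int rolls = 0;

    // Called as each edit is journaled; rolls once the in-progress
    // segment crosses the configured size.
    void logEdit(long editBytes) {
        segmentBytes += editBytes;
        if (segmentBytes >= ROLL_THRESHOLD_BYTES) {
            rolls++;          // stands in for triggering a log roll
            segmentBytes = 0; // the new segment starts empty
        }
    }
}
```

This bounds the size of any single segment, which bounds both restart recovery time and the work a QJM recovery must do within its timeout.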
[jira] [Created] (HDFS-4982) JournalNode should relogin from keytab before fetching logs from other JNs
Todd Lipcon created HDFS-4982: - Summary: JournalNode should relogin from keytab before fetching logs from other JNs Key: HDFS-4982 URL: https://issues.apache.org/jira/browse/HDFS-4982 Project: Hadoop HDFS Issue Type: Bug Components: journal-node, security Affects Versions: 3.0.0, 2.1.0-beta Reporter: Todd Lipcon Assignee: Todd Lipcon We've seen an issue in a secure cluster where, after a failover, the new NN isn't able to properly coordinate QJM recovery. The JNs fail to fetch logs from each other due to apparently not having a Kerberos TGT. It seems that we need to add the {{checkTGTAndReloginFromKeytab}} call prior to making the HTTP connection, since the Java HTTP stack doesn't do an automatic relogin.
[jira] [Created] (HDFS-4915) Add config to ZKFC to disable fencing
Todd Lipcon created HDFS-4915: - Summary: Add config to ZKFC to disable fencing Key: HDFS-4915 URL: https://issues.apache.org/jira/browse/HDFS-4915 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 3.0.0 Reporter: Todd Lipcon With QuorumJournalManager, it's not important for the ZKFCs to perform any fencing. We currently work around this by setting the fencer to /bin/true, but the ZKFC still does things like create breadcrumb znodes, etc. It would be simpler to add a config to disable fencing, which would also simplify the ZKFC's job.
[jira] [Created] (HDFS-4879) Add blocked ArrayList collection to avoid CMS full GCs
Todd Lipcon created HDFS-4879: - Summary: Add blocked ArrayList collection to avoid CMS full GCs Key: HDFS-4879 URL: https://issues.apache.org/jira/browse/HDFS-4879 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.4-alpha, 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon We recently saw an issue where a large deletion was issued which caused 25M blocks to be collected during {{deleteInternal}}. Currently, the list of collected blocks is an ArrayList, meaning that we had to allocate a contiguous 25M-entry array (~400MB). After a NN has been running for a long amount of time, the old generation may become fragmented such that it's hard to find a 400MB contiguous chunk of heap. In general, we should try to design the NN such that the only large objects are long-lived and created at startup time. We can improve this particular case (and perhaps some others) by introducing a new List implementation which is made of a linked list of arrays, each of which is size-limited (e.g. to 1MB).
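The "blocked" list described above can be sketched as a list of fixed-size chunks, so a huge logical list never needs one contiguous backing array. Names and the chunk capacity here are illustrative assumptions, not the actual HDFS collection's API:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockedList<T> {
    // Small enough that each chunk's backing array is easy to place
    // even in a fragmented old generation.
    private static final int CHUNK_CAPACITY = 1024;

    private final List<List<T>> chunks = new ArrayList<>();
    private int size = 0;

    public void add(T item) {
        if (chunks.isEmpty()
                || chunks.get(chunks.size() - 1).size() == CHUNK_CAPACITY) {
            chunks.add(new ArrayList<>(CHUNK_CAPACITY)); // start a new chunk
        }
        chunks.get(chunks.size() - 1).add(item);
        size++;
    }

    public T get(int index) {
        // Fixed chunk size keeps random access O(1).
        return chunks.get(index / CHUNK_CAPACITY).get(index % CHUNK_CAPACITY);
    }

    public int size() { return size; }
}
```

Appends stay amortized O(1), while the largest single allocation is one chunk plus the (small) outer list of chunk references.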
[jira] [Resolved] (HDFS-4184) Add the ability for Client to provide more hint information for DataNode to manage the OS buffer cache more accurately
[ https://issues.apache.org/jira/browse/HDFS-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4184. --- Resolution: Duplicate Add the ability for Client to provide more hint information for DataNode to manage the OS buffer cache more accurately Key: HDFS-4184 URL: https://issues.apache.org/jira/browse/HDFS-4184 Project: Hadoop HDFS Issue Type: New Feature Reporter: binlijin HDFS now has the ability to use posix_fadvise and sync_data_range syscalls to manage the OS buffer cache. {code} When HBase reads the HLog, we can set dfs.datanode.drop.cache.behind.reads to true to drop data out of the buffer cache when performing sequential reads. When HBase writes the HLog, we can set dfs.datanode.drop.cache.behind.writes to true to drop data out of the buffer cache after writing. When HBase reads HFiles during compaction, we can set dfs.datanode.readahead.bytes to a non-zero value to trigger readahead for sequential reads, and also set dfs.datanode.drop.cache.behind.reads to true to drop data out of the buffer cache when performing sequential reads. And so on... {code} Currently we can only set these features globally on the datanode; we should be able to set them per session.
[jira] [Created] (HDFS-4828) Make QJM epoch-related errors more understandable
Todd Lipcon created HDFS-4828: - Summary: Make QJM epoch-related errors more understandable Key: HDFS-4828 URL: https://issues.apache.org/jira/browse/HDFS-4828 Project: Hadoop HDFS Issue Type: Improvement Components: qjm Affects Versions: 3.0.0, 2.0.5-beta Reporter: Todd Lipcon Since we started running QJM on production clusters, we've found that users are very confused by some of the error messages that it produces. For example, when a failover occurs and an old NN gets fenced out, it sees errors about its epoch being out of date. We should amend these errors to add text like "This is likely because another NameNode took over as Active." Potentially we can even include the other NN's hostname, the timestamp it became active, etc.
[jira] [Created] (HDFS-4833) Corrupt blocks are not invalidated when first processing repl queues
Todd Lipcon created HDFS-4833: - Summary: Corrupt blocks are not invalidated when first processing repl queues Key: HDFS-4833 URL: https://issues.apache.org/jira/browse/HDFS-4833 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Todd Lipcon When the NN processes misreplicated blocks in {{processMisReplicatedBlock}} (e.g. during initial startup when first processing repl queues), it does not invalidate corrupt replicas unless the block is also over-replicated. This can result in replicas stuck in corrupt state forever if they were that way when the cluster booted.
[jira] [Created] (HDFS-4799) Corrupt replica can be prematurely removed from corruptReplicas map
Todd Lipcon created HDFS-4799: - Summary: Corrupt replica can be prematurely removed from corruptReplicas map Key: HDFS-4799 URL: https://issues.apache.org/jira/browse/HDFS-4799 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.4-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker We saw the following sequence of events in a cluster result in losing the most recent genstamp of a block: - client is writing to a pipeline of 3 - the pipeline had nodes fail over some period of time, such that it left 3 old-genstamp replicas on the original three nodes, having recruited 3 new replicas with a later genstamp. -- so, we have 6 total replicas in the cluster, three with old genstamps on downed nodes, and 3 with the latest genstamp - cluster reboots, and the nodes with old genstamps blockReport first. The replicas are correctly added to the corrupt replicas map since they have a too-old genstamp - the nodes with the new genstamp block report. When the latest one block reports, chooseExcessReplicates is called and incorrectly decides to remove the three good replicas, leaving only the old-genstamp replicas.
[jira] [Created] (HDFS-4643) Fix flakiness in TestQuorumJournalManager
Todd Lipcon created HDFS-4643: - Summary: Fix flakiness in TestQuorumJournalManager Key: HDFS-4643 URL: https://issues.apache.org/jira/browse/HDFS-4643 Project: Hadoop HDFS Issue Type: Bug Components: qjm, test Affects Versions: 2.0.3-alpha, 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Trivial TestQuorumJournalManager can occasionally fail if two consecutive test cases pick the same port number for the JournalNodes. In this case, sometimes an IPC client can be cached from a previous test case, and then fail when it tries to make an IPC call over that cached, now-broken connection. We need to more carefully call close() on all the QJMs to prevent this.
[jira] [Resolved] (HDFS-4538) allow use of legacy blockreader
[ https://issues.apache.org/jira/browse/HDFS-4538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4538. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to HDFS-347 branch. allow use of legacy blockreader --- Key: HDFS-4538 URL: https://issues.apache.org/jira/browse/HDFS-4538 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-4538.001.patch, HDFS-4538.002.patch, HDFS-4538.003.patch, HDFS-4538.004.patch Some users might want to use the legacy block reader, because it is available on Windows, whereas the secure solution has not yet been implemented there. As described in the mailing list discussion, let's enable this.
[jira] [Resolved] (HDFS-4617) warning while purging logs with QJM enabled
[ https://issues.apache.org/jira/browse/HDFS-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4617. --- Resolution: Duplicate ATM points out that I already found this bug 3 months ago... resolving as duplicate of HDFS-4298 warning while purging logs with QJM enabled --- Key: HDFS-4617 URL: https://issues.apache.org/jira/browse/HDFS-4617 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode, qjm Affects Versions: 2.0.3-alpha Reporter: Todd Lipcon HDFS-2946 changed the way that edit log purging is calculated, such that it calls selectInputStreams() with an arbitrary transaction ID calculated relative to the current one. The JournalNodes will reject such a request if that transaction ID falls in the middle of a segment (which it usually will). This means that selectInputStreams gets an exception, and the QJM journal manager is not included in this calculation. Additionally, a warning will be logged. Purging itself still happens, because the detailed information on remote logs is not necessary to calculate a retention interval, but the feature from HDFS-2946 may not work as intended.
[jira] [Created] (HDFS-4618) default for checkpoint txn interval is too low
Todd Lipcon created HDFS-4618: - Summary: default for checkpoint txn interval is too low Key: HDFS-4618 URL: https://issues.apache.org/jira/browse/HDFS-4618 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.4-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon The default checkpoint interval is currently set to 40k transactions. That's way too low (I don't know what idiot set it to that.. oh wait, it was me...) The old default in 1.0 is 64MB. Assuming an average of 100 bytes per txn, we should have the txn-count based interval default to at least 640,000. I'd like to change to 1M as a nice round number.
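The arithmetic behind the proposed default can be checked directly (the 100-bytes-per-transaction figure is the rough estimate from the report, not a measured constant):

```python
# Convert the old 1.0-era size-based checkpoint trigger (64MB of edits)
# into an equivalent transaction count, given ~100 bytes per transaction.
OLD_SIZE_TRIGGER_BYTES = 64 * 1024 * 1024  # the 1.0 default: 64MB
AVG_TXN_BYTES = 100                        # rough per-transaction estimate

equivalent_txns = OLD_SIZE_TRIGGER_BYTES // AVG_TXN_BYTES
print(equivalent_txns)  # 671088, i.e. "at least 640,000"
```

Rounding 671,088 up to a nice round number gives the proposed 1M default.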
[jira] [Created] (HDFS-4621) additional logging to help diagnose slow QJM logSync
Todd Lipcon created HDFS-4621: - Summary: additional logging to help diagnose slow QJM logSync Key: HDFS-4621 URL: https://issues.apache.org/jira/browse/HDFS-4621 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.0.3-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor I've been working on diagnosing an issue with a cluster which is seeing slow logSync calls occasionally to QJM. Adding a few more pieces of logging would help this: - in the warning messages on the client side leading up to a timeout, include which nodes have responded and which ones are still pending - on the server side, when we actually call FileChannel.force, log a warning if the sync takes longer than 1 second
[jira] [Created] (HDFS-4617) warning while purging logs with QJM enabled
Todd Lipcon created HDFS-4617: - Summary: warning while purging logs with QJM enabled Key: HDFS-4617 URL: https://issues.apache.org/jira/browse/HDFS-4617 Project: Hadoop HDFS Issue Type: Bug Components: ha, namenode Affects Versions: 2.0.3-alpha Reporter: Todd Lipcon HDFS-2946 changed the way that edit log purging is calculated, such that it calls selectInputStreams() with an arbitrary transaction ID calculated relative to the current one. The JournalNodes will reject such a request if that transaction ID falls in the middle of a segment (which it usually will). This means that selectInputStreams gets an exception, and the QJM journal manager is not included in this calculation. Additionally, a warning will be logged. Purging itself still happens, because the detailed information on remote logs is not necessary to calculate a retention interval, but the feature from HDFS-2946 may not work as intended.
[jira] [Resolved] (HDFS-4496) DFSClient: don't create a domain socket unless we need it
[ https://issues.apache.org/jira/browse/HDFS-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4496. --- Resolution: Fixed Hadoop Flags: Reviewed DFSClient: don't create a domain socket unless we need it - Key: HDFS-4496 URL: https://issues.apache.org/jira/browse/HDFS-4496 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4496.001.patch If we don't have conf.domainSocketDataTraffic or conf.shortCircuitLocalReads set, the client shouldn't create a domain socket because we couldn't use it. This is only an issue if you misconfigure things, but it's still good to fix.
[jira] [Created] (HDFS-4485) HDFS-347: DN should chmod socket path a+w
Todd Lipcon created HDFS-4485: - Summary: HDFS-347: DN should chmod socket path a+w Key: HDFS-4485 URL: https://issues.apache.org/jira/browse/HDFS-4485 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Todd Lipcon Assignee: Colin Patrick McCabe Priority: Critical In cluster-testing HDFS-347, we found that in clusters where the MR job doesn't run as the same user as HDFS, clients wouldn't use short circuit read because of a 'permission denied' error connecting to the socket. It turns out that, in order to connect to a socket, clients need write permissions on the socket file. The DN should set these permissions automatically after it creates the socket. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
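The fix amounts to widening the socket file's mode after the DataNode binds it. A minimal sketch of the permission change (the actual DataNode code is Java; the helper name here is illustrative):

```python
import os
import stat

def make_socket_connectable(path):
    """Add write permission for user, group, and other (chmod a+w):
    a client must have write permission on a UNIX domain socket file
    in order to connect() to it."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
```

This preserves whatever other bits the socket file already had and only adds the three write bits.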
[jira] [Created] (HDFS-4486) Add log category for long-running DFSClient notices
Todd Lipcon created HDFS-4486: - Summary: Add log category for long-running DFSClient notices Key: HDFS-4486 URL: https://issues.apache.org/jira/browse/HDFS-4486 Project: Hadoop HDFS Issue Type: Improvement Reporter: Todd Lipcon Priority: Minor There are a number of features in the DFS client which are transparent but can make a fairly big difference for performance -- two in particular are short circuit reads and native checksumming. Because we don't want log spew for clients like hadoop fs -cat, we currently log only at DEBUG level when these features are disabled. This makes it difficult to troubleshoot/verify for long-running perf-sensitive clients like HBase. One simple solution is to add a new log category - eg o.a.h.h.DFSClient.PerformanceAdvisory - which long-running clients could enable at DEBUG level without getting the full debug spew.
[jira] [Resolved] (HDFS-4433) make TestPeerCache not flaky
[ https://issues.apache.org/jira/browse/HDFS-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4433. --- Resolution: Fixed Hadoop Flags: Reviewed make TestPeerCache not flaky Key: HDFS-4433 URL: https://issues.apache.org/jira/browse/HDFS-4433 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4433.001.patch TestPeerCache is flaky now because it relies on using the same global cache for every test function. So the cache timeout can't be set to something different for each test. Also, we should implement equals and hashCode for {{FakePeer}}, since otherwise {{testMultiplePeersWithSameDnId}} is not really testing what happens when multiple equal peers are inserted into the cache. (The default equals is object equality).
[jira] [Resolved] (HDFS-4416) change dfs.datanode.domain.socket.path to dfs.domain.socket.path
[ https://issues.apache.org/jira/browse/HDFS-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4416. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to branch. Thanks, Colin. change dfs.datanode.domain.socket.path to dfs.domain.socket.path Key: HDFS-4416 URL: https://issues.apache.org/jira/browse/HDFS-4416 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4416.001.patch, HDFS-4416.002.patch, HDFS-4416.003.patch, HDFS-4416.004.patch {{dfs.datanode.domain.socket.path}} is used by both clients and the DataNode, so it might be best to avoid putting 'datanode' in the name. Most of the configuration keys that have 'datanode' in the name apply only to the DN. Also, should change __PORT__ to _PORT to be consistent with _HOST, etc.
[jira] [Resolved] (HDFS-4418) HDFS-347: increase default FileInputStreamCache size
[ https://issues.apache.org/jira/browse/HDFS-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4418. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to branch. HDFS-347: increase default FileInputStreamCache size Key: HDFS-4418 URL: https://issues.apache.org/jira/browse/HDFS-4418 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-4418.txt The FileInputStreamCache currently defaults to holding only 10 input stream pairs (corresponding to 10 blocks). In many HBase workloads, the region server will be issuing random reads against a local file which is 2-4GB in size or even larger (hence 20+ blocks). Given that the memory usage for caching these input streams is low, and applications like HBase tend to already increase their ulimit -n substantially (eg up to 32,000), I think we should raise the default cache size to 50 or more. In the rare case that someone has an application which uses local reads with hundreds of open blocks and can't feasibly raise their ulimit -n, they can lower the limit appropriately.
[jira] [Created] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly
Todd Lipcon created HDFS-4417: - Summary: HDFS-347: fix case where local reads get disabled incorrectly Key: HDFS-4417 URL: https://issues.apache.org/jira/browse/HDFS-4417 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Todd Lipcon Assignee: Todd Lipcon In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the following case: - a workload is running which puts a bunch of local sockets in the PeerCache - the workload abates for a while, causing the sockets to go stale (ie the DN side disconnects after the keepalive timeout) - the workload starts again In this case, the local socket retrieved from the cache failed the newBlockReader call, and it incorrectly disabled local sockets on that host. This is similar to an earlier bug HDFS-3376, but not quite the same. The next issue we ran into is that, once this happened, it never tried local sockets again, because the cache held lots of TCP sockets. Since we always managed to get a cached socket to the local node, it didn't bother trying local read again.
[jira] [Created] (HDFS-4418) HDFS-347: increase default FileInputStreamCache size
Todd Lipcon created HDFS-4418: - Summary: HDFS-347: increase default FileInputStreamCache size Key: HDFS-4418 URL: https://issues.apache.org/jira/browse/HDFS-4418 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Todd Lipcon Assignee: Todd Lipcon The FileInputStreamCache currently defaults to holding only 10 input stream pairs (corresponding to 10 blocks). In many HBase workloads, the region server will be issuing random reads against a local file which is 2-4GB in size or even larger (hence 20+ blocks). Given that the memory usage for caching these input streams is low, and applications like HBase tend to already increase their ulimit -n substantially (eg up to 32,000), I think we should raise the default cache size to 50 or more. In the rare case that someone has an application which uses local reads with hundreds of open blocks and can't feasibly raise their ulimit -n, they can lower the limit appropriately.
[jira] [Resolved] (HDFS-4400) DFSInputStream#getBlockReader: last retries should ignore the cache
[ https://issues.apache.org/jira/browse/HDFS-4400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4400. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to branch DFSInputStream#getBlockReader: last retries should ignore the cache --- Key: HDFS-4400 URL: https://issues.apache.org/jira/browse/HDFS-4400 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-4400.001.patch In {{DFSInputStream#getBlockReader}}, the last retries to get a {{BlockReader}} should ignore the cache. This was broken by HDFS-4356, it seems.
[jira] [Resolved] (HDFS-4401) Fix bug in DomainSocket path validation
[ https://issues.apache.org/jira/browse/HDFS-4401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4401. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to branch. Thanks for the fix. Fix bug in DomainSocket path validation --- Key: HDFS-4401 URL: https://issues.apache.org/jira/browse/HDFS-4401 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4401.001.patch DomainSocket path validation currently does not validate the second-to-last path component. This leads to insecure socket paths being accepted. It should validate all path components prior to the final one.
[jira] [Resolved] (HDFS-4402) some small DomainSocket fixes: avoid findbugs warning, change log level, etc.
[ https://issues.apache.org/jira/browse/HDFS-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4402. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to branch, thanks. some small DomainSocket fixes: avoid findbugs warning, change log level, etc. - Key: HDFS-4402 URL: https://issues.apache.org/jira/browse/HDFS-4402 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Colin Patrick McCabe Priority: Minor Attachments: HDFS-4402.001.patch, HDFS-4402.002.patch Some miscellaneous fixes: * findbugs complains about a short-circuit operator in {{DomainSocket.java}} for some reason. We don't need it (it doesn't help optimization since the expressions lack side-effects), so let's ditch it to avoid the findbugs warning. * change the log level of one error message to warn * BlockReaderLocal should use a BufferedInputStream to read the metadata file header, to avoid doing multiple small reads.
[jira] [Created] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte
Todd Lipcon created HDFS-4403: - Summary: DFSClient can infer checksum type when not provided by reading first byte Key: HDFS-4403 URL: https://issues.apache.org/jira/browse/HDFS-4403 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the new protobuf field is optional, with a default of CRC32. This means that this API, when used against an older cluster (like earlier 0.23 releases) will falsely return CRC32 even if that cluster has written files with CRC32C. This can cause issues for distcp, for example. Instead of defaulting the protobuf field to CRC32, we can leave it with no default, and if the OpBlockChecksumResponseProto has no checksum type set, the client can send OP_READ_BLOCK to read the first byte of the block, then grab the checksum type out of that response (which has always been present).
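The proposed fallback can be sketched as a small decision function (mocked types and names; the real client would issue an actual OP_READ_BLOCK over the data transfer protocol):

```python
def infer_checksum_type(checksum_response, read_first_byte):
    """checksum_response: mapping that may or may not carry
    'checksum_type' (the optional protobuf field, left with no default).
    read_first_byte: callable that performs a one-byte OP_READ_BLOCK and
    returns its response, whose header has always carried the type."""
    if "checksum_type" in checksum_response:
        # Newer DataNode: the field was explicitly set, so trust it.
        return checksum_response["checksum_type"]
    # Older DataNode: the field is absent rather than defaulted to CRC32,
    # so fall back to reading the first byte and take the type from there.
    return read_first_byte()["checksum_type"]
```

The key point is that an absent field triggers the fallback instead of being silently decoded as CRC32.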
[jira] [Resolved] (HDFS-4356) BlockReaderLocal should use passed file descriptors rather than paths
[ https://issues.apache.org/jira/browse/HDFS-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4356. --- Resolution: Fixed Hadoop Flags: Reviewed BlockReaderLocal should use passed file descriptors rather than paths - Key: HDFS-4356 URL: https://issues.apache.org/jira/browse/HDFS-4356 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, hdfs-client, performance Affects Versions: 2.0.3-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: 04b-cumulative.patch, _04b.patch, _04c.patch, 04-cumulative.patch, 04d-cumulative.patch, _04e.patch, 04f-cumulative.patch, _04f.patch, 04g-cumulative.patch, _04g.patch {{BlockReaderLocal}} should use file descriptors passed over UNIX domain sockets rather than paths. We also need some configuration options for these UNIX domain sockets.
[jira] [Resolved] (HDFS-4388) DomainSocket should throw AsynchronousCloseException when appropriate
[ https://issues.apache.org/jira/browse/HDFS-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4388. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to branch. Thanks. DomainSocket should throw AsynchronousCloseException when appropriate - Key: HDFS-4388 URL: https://issues.apache.org/jira/browse/HDFS-4388 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Attachments: _05a.patch {{DomainSocket}} should throw {{AsynchronousCloseException}} when appropriate (i.e., when an {{accept}} or other blocking operation is interrupted by a concurrent close.) This is nicer than throwing a generic {{IOException}} or {{SocketException}}. Similarly, we should throw {{ClosedChannelException}} when an operation is attempted on a closed {{DomainSocket}}.
[jira] [Resolved] (HDFS-4390) Bypass UNIX domain socket unit tests when they cannot be run
[ https://issues.apache.org/jira/browse/HDFS-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4390. --- Resolution: Fixed Hadoop Flags: Reviewed Committed to branch Bypass UNIX domain socket unit tests when they cannot be run Key: HDFS-4390 URL: https://issues.apache.org/jira/browse/HDFS-4390 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: _06.patch Testing revealed that the existing mechanisms for bypassing UNIX domain socket-related tests when they are not available are inadequate.
[jira] [Created] (HDFS-4380) Opening a file for read before writer writes a block causes NPE
Todd Lipcon created HDFS-4380: - Summary: Opening a file for read before writer writes a block causes NPE Key: HDFS-4380 URL: https://issues.apache.org/jira/browse/HDFS-4380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.3 Reporter: Todd Lipcon JD Cryans found this issue: it seems like, if you open a file for read immediately after it's been created by the writer, after a block has been allocated, but before the block is created on the DNs, then you can end up with the following NPE:
java.lang.NullPointerException
 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885)
 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858)
 at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
 at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
 at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
This seems to be because {{getBlockInfo}} returns a null block when the DN doesn't yet have the replica. The client should probably either fall back to a different replica or treat it as zero-length.
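A defensive sketch of the suggested client-side handling (hypothetical names; the real fix would guard the null dereference in the Java DFSInputStream):

```python
def resolve_block_length(replica_infos):
    """replica_infos: per-DataNode block infos for the last block, where
    an entry is None when that DN does not yet have the replica.
    Returns a usable length instead of dereferencing a missing replica
    (the null dereference behind the reported NPE)."""
    for info in replica_infos:
        if info is not None:
            return info["num_bytes"]  # fall back to a different replica
    # No DN has written the block yet: treat it as zero-length.
    return 0
```

Either branch avoids the crash: a replica that does exist wins, and a block no DN has yet is simply read as empty.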
[jira] [Reopened] (HDFS-4352) Encapsulate arguments to BlockReaderFactory in a class
[ https://issues.apache.org/jira/browse/HDFS-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HDFS-4352: --- By the way, I find it _very_ rude to close someone else's ticket as Invalid or Wont fix without waiting for the discussion to end. Just because you don't like a change doesn't give you license to do this. Encapsulate arguments to BlockReaderFactory in a class -- Key: HDFS-4352 URL: https://issues.apache.org/jira/browse/HDFS-4352 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.0.3-alpha Reporter: Colin Patrick McCabe Attachments: 01b.patch, 01.patch Encapsulate the arguments to BlockReaderFactory in a class to avoid having to pass around 10+ arguments to a few different functions.
[jira] [Created] (HDFS-4324) Track and report out-of-date blocks separately from corrupt blocks
Todd Lipcon created HDFS-4324: - Summary: Track and report out-of-date blocks separately from corrupt blocks Key: HDFS-4324 URL: https://issues.apache.org/jira/browse/HDFS-4324 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.0.0 Reporter: Todd Lipcon Currently in various places (metrics, dfsadmin -report, fsck, logs) we use the term corrupt to refer to blocks which have an out-of-date generation stamp. Since out-of-date blocks are a fairly normal occurrence if a DN restarts while data is being written, we should avoid using 'scary' words like _corrupt_. This may need both some textual changes as well as some internal changes to count the corruption types distinctly.
[jira] [Created] (HDFS-4305) Add a configurable limit on number of blocks per file, and min block size
Todd Lipcon created HDFS-4305: - Summary: Add a configurable limit on number of blocks per file, and min block size Key: HDFS-4305 URL: https://issues.apache.org/jira/browse/HDFS-4305 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.2-alpha, 1.0.4, 3.0.0 Reporter: Todd Lipcon Priority: Minor We recently had an issue where a user set the block size very very low and managed to create a single file with hundreds of thousands of blocks. This caused problems with the edit log since the OP_ADD op was so large (HDFS-4304). I imagine it could also cause efficiency issues in the NN. To prevent users from making such mistakes, we should: - introduce a configurable minimum block size, below which requests are rejected - introduce a configurable maximum number of blocks per file, above which requests to add another block are rejected (with a suitably high default so as not to prevent legitimate large files)
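The two proposed guards can be sketched as a single validation step on the block-allocation path (the limit values below are purely illustrative, not the defaults that were eventually chosen):

```python
# Hypothetical limits, for illustration only.
MIN_BLOCK_SIZE = 1024 * 1024       # reject absurdly small block sizes
MAX_BLOCKS_PER_FILE = 1_000_000    # high enough for legitimate large files

def check_add_block(block_size, current_block_count):
    """Reject a block allocation that violates either configured limit."""
    if block_size < MIN_BLOCK_SIZE:
        raise ValueError("block size %d is below the minimum %d"
                         % (block_size, MIN_BLOCK_SIZE))
    if current_block_count >= MAX_BLOCKS_PER_FILE:
        raise ValueError("file already has %d blocks (max %d)"
                         % (current_block_count, MAX_BLOCKS_PER_FILE))
```

A normal allocation (e.g. a 128MB block on a file with a handful of blocks) passes silently; the pathological cases from the report fail fast instead of producing an enormous OP_ADD.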
[jira] [Created] (HDFS-4298) StorageRetentionManager spews warnings when used with QJM
Todd Lipcon created HDFS-4298: - Summary: StorageRetentionManager spews warnings when used with QJM Key: HDFS-4298 URL: https://issues.apache.org/jira/browse/HDFS-4298 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Todd Lipcon Assignee: Aaron T. Myers When the NN is configured with a QJM, we see the following warning message every time a checkpoint is made or uploaded:
12/12/10 16:07:52 WARN namenode.FSEditLog: Unable to determine input streams from QJM to [127.0.0.1:13001, 127.0.0.1:13002, 127.0.0.1:13003]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
127.0.0.1:13002: Asked for firstTxId 114837 which is in the middle of file /tmp/jn-2/myjournal/current/edits_0095185-0114846
...
This is because, since HDFS-2946, the NN calls {{selectInputStreams}} to determine the number of log segments and put a cap on the number. This API throws an exception in the case of QJM if the argument falls in the middle of an edit log boundary.
[jira] [Resolved] (HDFS-3571) Allow EditLogFileInputStream to read from a remote URL
[ https://issues.apache.org/jira/browse/HDFS-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3571. --- Resolution: Fixed Fix Version/s: 2.0.3-alpha Committed backport to branch-2. Thanks for reviewing. Allow EditLogFileInputStream to read from a remote URL -- Key: HDFS-3571 URL: https://issues.apache.org/jira/browse/HDFS-3571 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, namenode Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0, 2.0.3-alpha Attachments: hdfs-3571-branch-2.txt, hdfs-3571.txt, hdfs-3571.txt In order to start up from remote edits storage (like the JournalNodes of HDFS-3077), the NN needs to be able to load edits from a URL, instead of just local disk. This JIRA extends EditLogFileInputStream to be able to use a URL reference in addition to the current File reference.
[jira] [Resolved] (HDFS-3077) Quorum-based protocol for reading and writing edit logs
[ https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3077. --- Resolution: Fixed Fix Version/s: 2.0.3-alpha Committed backport to branch-2. Thanks for looking at the backport patch, Andrew and Aaron. Quorum-based protocol for reading and writing edit logs --- Key: HDFS-3077 URL: https://issues.apache.org/jira/browse/HDFS-3077 Project: Hadoop HDFS Issue Type: New Feature Components: ha, namenode Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0, QuorumJournalManager (HDFS-3077), 2.0.3-alpha Attachments: hdfs-3077-branch-2.txt, hdfs-3077-partial.txt, hdfs-3077-test-merge.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, qjournal-design.tex, qjournal-design.tex Currently, one of the weak points of the HA design is that it relies on shared storage such as an NFS filer for the shared edit log. One alternative that has been proposed is to depend on BookKeeper, a ZooKeeper subproject which provides a highly available replicated edit log on commodity hardware. This JIRA is to implement another alternative, based on a quorum commit protocol, integrated more tightly in HDFS and with the requirements driven only by HDFS's needs rather than more generic use cases. More details to follow.
[jira] [Reopened] (HDFS-4110) Refine JNStorage log
[ https://issues.apache.org/jira/browse/HDFS-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HDFS-4110: --- Reopening to backport to branch-2 Refine JNStorage log Key: HDFS-4110 URL: https://issues.apache.org/jira/browse/HDFS-4110 Project: Hadoop HDFS Issue Type: Improvement Components: journal-node Affects Versions: 3.0.0, 2.0.3-alpha Reporter: liang xie Assignee: liang xie Priority: Trivial Labels: newbie Fix For: 3.0.0 Attachments: HDFS-4110.txt Abstract class Storage has a toString method: {quote} return Storage Directory + this.root; {quote} and in the subclass JNStorage we could see: {quote} LOG.info(Formatting journal storage directory + sd + with nsid: + getNamespaceID()); {quote} that'll print something like Formatting journal storage directory Storage Directory x Just one line change to: {quote} LOG.info(Formatting journal + sd + with nsid: + getNamespaceID()); {quote}
[jira] [Resolved] (HDFS-4110) Refine JNStorage log
[ https://issues.apache.org/jira/browse/HDFS-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4110. --- Resolution: Fixed Fix Version/s: 2.0.3-alpha Committed backport to branch-2 (same patch applied) Refine JNStorage log Key: HDFS-4110 URL: https://issues.apache.org/jira/browse/HDFS-4110 Project: Hadoop HDFS Issue Type: Improvement Components: journal-node Affects Versions: 3.0.0, 2.0.3-alpha Reporter: liang xie Assignee: liang xie Priority: Trivial Labels: newbie Fix For: 3.0.0, 2.0.3-alpha Attachments: HDFS-4110.txt Abstract class Storage has a toString method: {quote} return Storage Directory + this.root; {quote} and in the subclass JNStorage we could see: {quote} LOG.info(Formatting journal storage directory + sd + with nsid: + getNamespaceID()); {quote} that'll print something like Formatting journal storage directory Storage Directory x Just one line change to: {quote} LOG.info(Formatting journal + sd + with nsid: + getNamespaceID()); {quote}
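The duplication described in the report above can be reproduced with a minimal sketch (class names here are simplified stand-ins for the real Storage/JNStorage classes, not the actual HDFS code):

```java
// Storage#toString already includes the "Storage Directory" prefix, so
// prepending "storage directory" in the log message duplicates it.
class StorageDir {
    private final String root;
    StorageDir(String root) { this.root = root; }
    @Override public String toString() { return "Storage Directory " + root; }
}

class LogDemo {
    // Before the fix: the rendered message contains the prefix twice.
    static String before(StorageDir sd) {
        return "Formatting journal storage directory " + sd + " with nsid: 123";
    }
    // After the fix: rely on toString() to supply the prefix once.
    static String after(StorageDir sd) {
        return "Formatting journal " + sd + " with nsid: 123";
    }
}
```
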
[jira] [Reopened] (HDFS-3571) Allow EditLogFileInputStream to read from a remote URL
[ https://issues.apache.org/jira/browse/HDFS-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HDFS-3571: --- Reopening for merge to branch-2 (this is needed for QJM in branch-2) Allow EditLogFileInputStream to read from a remote URL -- Key: HDFS-3571 URL: https://issues.apache.org/jira/browse/HDFS-3571 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, namenode Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0 Attachments: hdfs-3571.txt, hdfs-3571.txt In order to start up from remote edits storage (like the JournalNodes of HDFS-3077), the NN needs to be able to load edits from a URL, instead of just local disk. This JIRA extends EditLogFileInputStream to be able to use a URL reference in addition to the current File reference.
[jira] [Created] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Todd Lipcon created HDFS-4176: - Summary: EditLogTailer should call rollEdits with a timeout Key: HDFS-4176 URL: https://issues.apache.org/jira/browse/HDFS-4176 Project: Hadoop HDFS Issue Type: Bug Components: ha, name-node Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Todd Lipcon When the EditLogTailer thread calls rollEdits() on the active NN via RPC, it currently does so without a timeout. So, if the active NN has frozen (but not actually crashed), this call can hang forever. This can then potentially prevent the standby from becoming active. This may actually be considered a side effect of HADOOP-6762 -- if the RPC were interruptible, that would also fix the issue.
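A hedged sketch of the general technique, not the actual HDFS patch: one generic way to bound a blocking call (such as the rollEdits() RPC described above) is to run it on a helper thread and wait with Future.get(timeout), so a frozen remote side yields a TimeoutException instead of an indefinite hang. The class and method names below are illustrative only.

```java
import java.util.concurrent.*;

class BoundedCall {
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<T> f = exec.submit(call);
            // Throws TimeoutException if the call has not returned in time.
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // Unwrap the call's own failure (cause could also be an Error).
            throw (Exception) e.getCause();
        } finally {
            exec.shutdownNow(); // interrupt the call if it is still hanging
        }
    }
}
```
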
[jira] [Created] (HDFS-4169) Add per-disk latency metrics to DataNode
Todd Lipcon created HDFS-4169: - Summary: Add per-disk latency metrics to DataNode Key: HDFS-4169 URL: https://issues.apache.org/jira/browse/HDFS-4169 Project: Hadoop HDFS Issue Type: Improvement Components: data-node Affects Versions: 3.0.0 Reporter: Todd Lipcon Currently, if one of the drives on the DataNode is slow, it's hard to determine what the issue is. This can happen due to a failing disk, bad controller, etc. It would be preferable to expose per-drive MXBeans (or tagged metrics) with latency statistics about how long reads/writes are taking.
[jira] [Created] (HDFS-4128) 2NN gets stuck in inconsistent state if edit log replay fails in the middle
Todd Lipcon created HDFS-4128: - Summary: 2NN gets stuck in inconsistent state if edit log replay fails in the middle Key: HDFS-4128 URL: https://issues.apache.org/jira/browse/HDFS-4128 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.2-alpha Reporter: Todd Lipcon We saw the following issue in a cluster: - The 2NN downloads an edit log segment: {code} 2012-10-29 12:30:57,433 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /xxx/current/edits_00049136809-00049176162 expecting start txid #49136809 {code} - It fails in the middle of replay due to an OOME: {code} 2012-10-29 12:31:21,021 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation AddOp [length=0, path=/ java.lang.OutOfMemoryError: Java heap space {code} - Future checkpoints then fail because the prior edit log replay only got halfway through the stream: {code} 2012-10-29 12:32:21,214 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading /x/current/edits_00049176163-00049177224 expecting start txid #49144432 2012-10-29 12:32:21,216 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint java.io.IOException: There appears to be a gap in the edit log. We expected txid 49144432, but got txid 49176163. {code}
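The failure in the last log excerpt comes down to a contiguity check on transaction IDs: the first txid of the next segment must equal the txid expected after the previous replay. A simplified sketch of that check (the real logic lives in the FSImage/edit-log loading code, not in this standalone class):

```java
class EditLogGapCheck {
    // Throws if the segment does not start at the txid we expect next,
    // mirroring the "gap in the edit log" error quoted above.
    static void checkContiguous(long expectedTxid, long segmentStartTxid)
            throws java.io.IOException {
        if (segmentStartTxid != expectedTxid) {
            throw new java.io.IOException(
                "There appears to be a gap in the edit log. We expected txid "
                + expectedTxid + ", but got txid " + segmentStartTxid + ".");
        }
    }
}
```

In the incident above the 2NN expected txid 49144432 (because replay died mid-segment) but the next downloaded segment started at 49176163, so every subsequent checkpoint failed this check.
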
[jira] [Created] (HDFS-4049) hflush performance regression due to nagling delays
Todd Lipcon created HDFS-4049: - Summary: hflush performance regression due to nagling delays Key: HDFS-4049 URL: https://issues.apache.org/jira/browse/HDFS-4049 Project: Hadoop HDFS Issue Type: Bug Components: data-node, performance Affects Versions: 2.0.2-alpha, 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical HDFS-3721 reworked the way that packets are mirrored through the pipeline in the datanode. This caused two write() calls where there used to be one, which interacts badly with nagling so that there are 40ms bubbles on hflush() calls. We didn't notice this in the tests because the hflush perf test only uses a single datanode.
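For context on the ~40ms "bubbles": Nagle's algorithm delays a small write while an earlier one is unacknowledged, and combined with delayed ACKs this produces the classic stall. The two standard remedies are coalescing the two write() calls into one buffer, or disabling Nagle with TCP_NODELAY. A minimal illustration of the latter (not the actual HDFS fix):

```java
import java.net.Socket;
import java.net.SocketException;

class NoDelayDemo {
    // Disabling Nagle's algorithm makes the kernel send small writes
    // immediately instead of waiting for the previous packet's ACK.
    static Socket makeLowLatencySocket() throws SocketException {
        Socket s = new Socket();   // option can be set before connecting
        s.setTcpNoDelay(true);
        return s;
    }
}
```
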
[jira] [Created] (HDFS-4025) QJM: Synchronize past log segments to JNs that missed them
Todd Lipcon created HDFS-4025: - Summary: QJM: Synchronize past log segments to JNs that missed them Key: HDFS-4025 URL: https://issues.apache.org/jira/browse/HDFS-4025 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Currently, if a JournalManager crashes and misses some segment of logs, and then comes back, it will be re-added as a valid part of the quorum on the next log roll. However, it will not have a complete history of log segments (i.e. any individual JN may have gaps in its transaction history). This mirrors the behavior of the NameNode when there are multiple local directories specified. However, it would be better if a background thread noticed these gaps and filled them in by grabbing the segments from other JournalNodes. This increases the resilience of the system when JournalNodes get reformatted or otherwise lose their local disk.
[jira] [Resolved] (HDFS-4017) Unclosed FileInputStream in GetJournalEditServlet
[ https://issues.apache.org/jira/browse/HDFS-4017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4017. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks Chao. Unclosed FileInputStream in GetJournalEditServlet - Key: HDFS-4017 URL: https://issues.apache.org/jira/browse/HDFS-4017 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Chao Shi Priority: Trivial Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-4017.txt The FileInputStream to read editFile is not closed.
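The standard cure for this class of leak is try-with-resources, which closes the stream even when the copy fails partway. An illustrative sketch (not the actual servlet code; the method name is made up):

```java
import java.io.*;

class CopyFile {
    // Copies src into out; the FileInputStream is guaranteed to be closed
    // when the try block exits, normally or via an exception.
    static long copy(File src, OutputStream out) throws IOException {
        long total = 0;
        try (FileInputStream in = new FileInputStream(src)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } // in.close() runs here automatically
        return total;
    }
}
```
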
[jira] [Resolved] (HDFS-4004) TestJournalNode#testJournal fails because of test case execution order
[ https://issues.apache.org/jira/browse/HDFS-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-4004. --- Resolution: Fixed Target Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to the branch, thanks Chao! TestJournalNode#testJournal fails because of test case execution order -- Key: HDFS-4004 URL: https://issues.apache.org/jira/browse/HDFS-4004 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Reporter: Chao Shi Assignee: Chao Shi Priority: Minor Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-4004.txt I'm running HDFS test on HDFS-3077 branch. I found TestJournalNode#testJournal fails sometimes. The assertion that fails is: MetricsAsserts.assertCounter(BatchesWritten, 0L, metrics); The reason is when testHttpServer is running before testJournal, it will write some logs to JN. The fix is simple: assign a new JID for each test case, so that they will use different metrics.
[jira] [Created] (HDFS-3972) Trash emptier fails in secure cluster
Todd Lipcon created HDFS-3972: - Summary: Trash emptier fails in secure cluster Key: HDFS-3972 URL: https://issues.apache.org/jira/browse/HDFS-3972 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 2.0.1-alpha, 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical In a secure cluster, we're seeing the following issue on the NN when the trash emptier tries to run: WARN org.apache.hadoop.fs.TrashPolicyDefault: Trash can't list homes: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host \ is: x; destination host is: :8020; Sleeping. The issue seems to be that the trash emptier thread sends RPCs back to itself, but isn't wrapped in a doAs. Credit goes to Stephen Chu for discovering this.
[jira] [Created] (HDFS-3974) dfsadmin -metasave throws NPE when under-replicated blocks are recently deleted
Todd Lipcon created HDFS-3974: - Summary: dfsadmin -metasave throws NPE when under-replicated blocks are recently deleted Key: HDFS-3974 URL: https://issues.apache.org/jira/browse/HDFS-3974 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 3.0.0 Reporter: Todd Lipcon Priority: Minor We currently have the following race: - Block is underreplicated, hence it's present in neededReplications - User deletes the block - its BlockInfo.blockCollection is set to null - Admin runs metaSave before the replication monitor runs. This causes an NPE since block.getBlockCollection() for one of the neededReplication blocks has become null.
[jira] [Created] (HDFS-3969) Small bug fixes and improvements for disk locations API
Todd Lipcon created HDFS-3969: - Summary: Small bug fixes and improvements for disk locations API Key: HDFS-3969 URL: https://issues.apache.org/jira/browse/HDFS-3969 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon The new disk block locations API has a configurable timeout, but it's used inconsistently: the invokeAll() call to the thread pool assumes the timeout is in seconds, but the RPC timeout is set in milliseconds. Also, we can improve the wire protocol for this API to be a lot more efficient.
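The unit mismatch described above (one consumer treating the timeout as seconds, another as milliseconds) is avoided by carrying the unit explicitly. ExecutorService.invokeAll takes a TimeUnit argument for exactly this reason; a small sketch of the safe pattern (the wrapper name is illustrative, not HDFS code):

```java
import java.util.*;
import java.util.concurrent.*;

class ConsistentTimeouts {
    // Pass MILLISECONDS explicitly rather than letting a caller guess
    // whether the long value means seconds or milliseconds.
    static <T> List<Future<T>> invokeAllWithMs(
            ExecutorService exec, Collection<Callable<T>> tasks, long timeoutMs)
            throws InterruptedException {
        return exec.invokeAll(tasks, timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```
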
[jira] [Created] (HDFS-3967) NN should bail out earlier when logs to load have a gap
Todd Lipcon created HDFS-3967: - Summary: NN should bail out earlier when logs to load have a gap Key: HDFS-3967 URL: https://issues.apache.org/jira/browse/HDFS-3967 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.1-alpha, 3.0.0 Reporter: Todd Lipcon Priority: Minor I was testing an HA setup with a lowered edit log retention period, and ended up in a state where one of the two NNs had fallen too far behind, such that it couldn't start up again (due to the too-low retention period). When I started the NN, I got the following: 12/09/21 13:03:20 INFO namenode.FSImage: Loaded image for txid 45781083 from /tmp/name1-name/current/fsimage_00045781083 12/09/21 13:03:20 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@239a0feb expecting start txid #45781084 12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://localhost:13081/getJournal?jid=myjournalsegmentTxId=45928954storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b, http://localhost:13082/getJournal?jid=myjournalsegmentTxId=45928954storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b, http://localhost:13083/getJournal?jid=myjournalsegmentTxId=45928954storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b' to transaction ID 45781084 12/09/21 13:03:20 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://localhost:13081/getJournal?jid=myjournalsegmentTxId=45928954storageInfo=-40%3A292785232%3A0%3ACID-0553884b-f3ea-46a3-9154-200d4f84304b' to transaction ID 45781084 12/09/21 13:03:20 FATAL namenode.NameNode: Exception in namenode join java.io.IOException: There appears to be a gap in the edit log. We expected txid 45781084, but got txid 45928954. Rather than trying to 'fast forward' the stream to a transaction which is actually prior to the first tx, we should bail earlier with a nicer error.
[jira] [Created] (HDFS-3962) NN should periodically check writability of 'required' journals
Todd Lipcon created HDFS-3962: - Summary: NN should periodically check writability of 'required' journals Key: HDFS-3962 URL: https://issues.apache.org/jira/browse/HDFS-3962 Project: Hadoop HDFS Issue Type: Improvement Components: ha, name-node Affects Versions: 2.0.1-alpha, 3.0.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Currently, our HA design ensures write fencing by having the failover controller call a fencing script before transitioning a new node to active. However, if the fencing script is based on storage fencing (and not stonith), there is no _read_ fencing. That is to say, the old active may continue to believe itself active for an unbounded amount of time, assuming that it does not try to write to its edit log. This isn't super problematic, but it would be beneficial for monitoring, etc., to have the old NN periodically check the writability of any required journals, and abort if they become unwritable, even if there are no writes coming into the system.
[jira] [Resolved] (HDFS-3950) QJM: misc TODO cleanup, improved log messages, etc
[ https://issues.apache.org/jira/browse/HDFS-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3950. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks for the reviews QJM: misc TODO cleanup, improved log messages, etc -- Key: HDFS-3950 URL: https://issues.apache.org/jira/browse/HDFS-3950 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3950.txt, hdfs-3950.txt General JIRA for a bunch of miscellaneous clean-up in the QJM branch: - fix most remaining TODOs - improve some log/error messages - add some more sanity checks where appropriate - address any findbugs that might have crept into branch
[jira] [Resolved] (HDFS-3955) QJM: Make acceptRecovery() atomic
[ https://issues.apache.org/jira/browse/HDFS-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3955. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed QJM: Make acceptRecovery() atomic - Key: HDFS-3955 URL: https://issues.apache.org/jira/browse/HDFS-3955 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3955.txt Per one of the TODOs in Journal.java, there is currently a lack of atomicity in the {{acceptRecovery()}} code path. In particular, we have the following actions executed non-atomically: - Download a new edits_inprogress_N from some other node - Persist the paxos recovery file to disk. If the JN crashes between these two steps, then we may be left in the state whereby the edits_inprogress file has different data than the Paxos data left over on the disk from a previous recovery attempt. This causes the next {{prepareRecovery()}} to fail with an AssertionError. I discovered this by randomly injecting a fault between the two steps, and then running the randomized fault test on a cluster. This resulted in some AssertionErrors in the test logs.
[jira] [Resolved] (HDFS-3956) QJM: purge temporary files when no longer within retention period
[ https://issues.apache.org/jira/browse/HDFS-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3956. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks Eli. QJM: purge temporary files when no longer within retention period - Key: HDFS-3956 URL: https://issues.apache.org/jira/browse/HDFS-3956 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3956.txt After doing a bunch of fault testing, I noticed that the JNs had a bunch of temporary files left around in their journal directories which were no longer within the retention period. For example, if a JN crashes in the middle of recovery, it can leave around a file like {{edits_inprogress_123.epoch=10}}. These files are handy to keep around for forensics/debugging while they are still in their retention period, but we should not leave them forever. The normal purging policy should apply.
[jira] [Created] (HDFS-3950) QJM: misc TODO cleanup, improved log messages, etc
Todd Lipcon created HDFS-3950: - Summary: QJM: misc TODO cleanup, improved log messages, etc Key: HDFS-3950 URL: https://issues.apache.org/jira/browse/HDFS-3950 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor General JIRA for a bunch of miscellaneous clean-up in the QJM branch: - fix most remaining TODOs - improve some log/error messages - add some more sanity checks where appropriate - address any findbugs that might have crept into branch
[jira] [Created] (HDFS-3955) QJM: Make acceptRecovery() atomic
Todd Lipcon created HDFS-3955: - Summary: QJM: Make acceptRecovery() atomic Key: HDFS-3955 URL: https://issues.apache.org/jira/browse/HDFS-3955 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Per one of the TODOs in Journal.java, there is currently a lack of atomicity in the {{acceptRecovery()}} code path. In particular, we have the following actions executed non-atomically: - Download a new edits_inprogress_N from some other node - Persist the paxos recovery file to disk. If the JN crashes between these two steps, then we may be left in the state whereby the edits_inprogress file has different data than the Paxos data left over on the disk from a previous recovery attempt. This causes the next {{prepareRecovery()}} to fail with an AssertionError. I discovered this by randomly injecting a fault between the two steps, and then running the randomized fault test on a cluster. This resulted in some AssertionErrors in the test logs.
[jira] [Created] (HDFS-3956) QJM: purge temporary files when no longer within retention period
Todd Lipcon created HDFS-3956: - Summary: QJM: purge temporary files when no longer within retention period Key: HDFS-3956 URL: https://issues.apache.org/jira/browse/HDFS-3956 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor After doing a bunch of fault testing, I noticed that the JNs had a bunch of temporary files left around in their journal directories which were no longer within the retention period. For example, if a JN crashes in the middle of recovery, it can leave around a file like {{edits_inprogress_123.epoch=10}}. These files are handy to keep around for forensics/debugging while they are still in their retention period, but we should not leave them forever. The normal purging policy should apply.
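A hedged sketch of retention-based purging as described above: decide whether a leftover temporary file (e.g. {{edits_inprogress_123.epoch=10}}) is past retention by parsing the first txid out of its name and comparing it against the purge threshold. The name-parsing pattern here is illustrative, not the exact JN logic.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TempFilePurger {
    private static final Pattern INPROGRESS_TMP =
        Pattern.compile("edits_inprogress_(\\d+)\\..*");

    // Extract the starting txid from a name like "edits_inprogress_123.epoch=10";
    // returns -1 if the name does not match the temporary-file pattern.
    static long firstTxid(String name) {
        Matcher m = INPROGRESS_TMP.matcher(name);
        return m.matches() ? Long.parseLong(m.group(1)) : -1;
    }

    // Purge when the file's txid range lies entirely below the retention floor.
    static boolean shouldPurge(String name, long minTxidToKeep) {
        long txid = firstTxid(name);
        return txid >= 0 && txid < minTxidToKeep;
    }
}
```
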
[jira] [Created] (HDFS-3958) Integrate upgrade/finalize/rollback with external journals
Todd Lipcon created HDFS-3958: - Summary: Integrate upgrade/finalize/rollback with external journals Key: HDFS-3958 URL: https://issues.apache.org/jira/browse/HDFS-3958 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 3.0.0 Reporter: Todd Lipcon Currently the NameNode upgrade/rollback/finalize framework only supports local storage. With edits being stored in pluggable Journals, this could create certain difficulties - in particular, rollback wouldn't actually rollback the external storage to the old state. We should look at how to expose the right hooks to the external journal storage to snapshot/rollback/finalize.
[jira] [Created] (HDFS-3943) QJM: remove currently unused md5sum field.
Todd Lipcon created HDFS-3943: - Summary: QJM: remove currently unused md5sum field. Key: HDFS-3943 URL: https://issues.apache.org/jira/browse/HDFS-3943 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Per later discussion in HDFS-3859, it turns out to be rather difficult to integrate md5sum verification into QJM at this point. The crux of the issue is that different replicas may be semantically identical, but bytewise unequal due to the padding at the end of the file. Given this, I'd like to temporarily remove the md5sum field from the protocol while we work on the more complex verification (which ignores the trailing padding) in HDFS-3859.
[jira] [Resolved] (HDFS-3943) QJM: remove currently unused md5sum field.
[ https://issues.apache.org/jira/browse/HDFS-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3943. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thx. QJM: remove currently unused md5sum field. Key: HDFS-3943 URL: https://issues.apache.org/jira/browse/HDFS-3943 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3943.txt Per later discussion in HDFS-3859, it turns out to be rather difficult to integrate md5sum verification into QJM at this point. The crux of the issue is that different replicas may be semantically identical, but bytewise unequal due to the padding at the end of the file. Given this, I'd like to temporarily remove the md5sum field from the protocol while we work on the more complex verification (which ignores the trailing padding) in HDFS-3859.
[jira] [Resolved] (HDFS-3894) QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching
[ https://issues.apache.org/jira/browse/HDFS-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3894. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thx for review QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching -- Key: HDFS-3894 URL: https://issues.apache.org/jira/browse/HDFS-3894 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3894.txt TestQJMWithFaults.testRecoverAfterDoubleFailures fails really occasionally. Looking into it, the issue seems to be that it's possible by random chance for an IPC server port to be reused between two different iterations of the test loop. The client will then pick up and re-use the existing IPC connection to the old server. However, the old server was shut down and restarted, so the old IPC connection is stale (ie disconnected). This causes the new client to get an EOF when it sends the format() call.
[jira] [Resolved] (HDFS-3906) QJM: quorum timeout on failover with large log segment
[ https://issues.apache.org/jira/browse/HDFS-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3906. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks QJM: quorum timeout on failover with large log segment -- Key: HDFS-3906 URL: https://issues.apache.org/jira/browse/HDFS-3906 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3906.txt In doing some stress tests, I ran into an issue with failover if the current edit log segment written by the old active is large. With a 327MB log segment containing 6.4M transactions, the JN took ~11 seconds to read and validate it during the recovery step. This was longer than the 10 second timeout for createNewEpoch, which caused the recovery to fail.
[jira] [Resolved] (HDFS-3840) JournalNodes log JournalNotFormattedException backtrace error before being formatted
[ https://issues.apache.org/jira/browse/HDFS-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3840. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks for the quick review. JournalNodes log JournalNotFormattedException backtrace error before being formatted Key: HDFS-3840 URL: https://issues.apache.org/jira/browse/HDFS-3840 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Stephen Chu Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3840.txt I started 3 JournalNodes for the first time. Then I formatted the NN. The JournalNodes, log the following error backtrace: {noformat} [root@cs-10-20-193-121 ~]# sudo -u hdfs hdfs journalnode 12/08/22 00:52:22 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 12/08/22 00:52:22 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 
12/08/22 00:52:22 INFO impl.MetricsSystemImpl: JournalNode metrics system started 12/08/22 00:52:22 INFO server.JournalNodeHttpServer: Starting web server as: hdfs 12/08/22 00:52:22 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 12/08/22 00:52:22 INFO http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) 12/08/22 00:52:22 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context journal 12/08/22 00:52:22 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static 12/08/22 00:52:22 INFO http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs 12/08/22 00:52:22 INFO http.HttpServer: Jetty bound to port 8480 12/08/22 00:52:22 INFO mortbay.log: jetty-6.1.26.cloudera.1 12/08/22 00:52:23 INFO mortbay.log: Started SelectChannelConnector@qjm6.cs1cloud.internal:8480 12/08/22 00:52:23 INFO server.JournalNodeHttpServer: Journal Web-server up at: qjm6.cs1cloud.internal/10.20.193.121:8480:8480 12/08/22 00:52:23 INFO ipc.Server: Starting Socket Reader #1 for port 8485 12/08/22 00:52:23 INFO ipc.Server: IPC Server Responder: starting 12/08/22 00:52:23 INFO ipc.Server: IPC Server listener on 8485: starting 12/08/22 00:52:41 INFO server.JournalNode: Initializing journal in directory /dfs/jn/journal 12/08/22 00:52:41 INFO common.Storage: Storage directory /dfs/jn/journal does not exist. 
12/08/22 00:52:41 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal lv=0;cid=;nsid=0;c=0 not formatted 12/08/22 00:52:41 INFO ipc.Server: IPC Server handler 0 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.getJournalState from 10.20.187.169:44857: error: org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal lv=0;cid=;nsid=0;c=0 not formatted org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal lv=0;cid=;nsid=0;c=0 not formatted at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:265) at org.apache.hadoop.hdfs.qjournal.server.Journal.getLastPromisedEpoch(Journal.java:152) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getJournalState(JournalNodeRpcServer.java:97) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getJournalState(QJournalProtocolServerSideTranslatorPB.java:71) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:12230) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) 12/08/22 00:52:41 INFO server.Journal: Formatting
[jira] [Resolved] (HDFS-3898) QJM: enable TCP_NODELAY for IPC
[ https://issues.apache.org/jira/browse/HDFS-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3898. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks for the reviews QJM: enable TCP_NODELAY for IPC --- Key: HDFS-3898 URL: https://issues.apache.org/jira/browse/HDFS-3898 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3898.txt, hdfs-3898.txt Currently, if the size of the edit batches is larger than the MTU, it can result in 40ms delays due to interaction between Nagle's algorithm and delayed ACK. Enabling TCP_NODELAY on the sockets solves this issue, so we should set those configs by default for all of the QJM-related IPC.
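The mechanism behind the fix can be sketched in plain Java (this is not the Hadoop IPC code itself, just an illustration of disabling Nagle's algorithm on a socket so that small edit batches are flushed immediately rather than waiting out a delayed ACK):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayDemo {
    public static void main(String[] args) throws IOException {
        // Loopback server so the client socket has something to connect to.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()));
            // The essence of the QJM change: TCP_NODELAY on by default.
            client.setTcpNoDelay(true);
            if (!client.getTcpNoDelay()) {
                throw new AssertionError("TCP_NODELAY not set");
            }
            System.out.println("TCP_NODELAY enabled: " + client.getTcpNoDelay());
        }
    }
}
```

In Hadoop itself this is driven by IPC client/server configuration keys rather than a direct socket call at the call site; the sketch only shows the underlying socket option.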
[jira] [Resolved] (HDFS-3885) QJM: optimize log sync when JN is lagging behind
[ https://issues.apache.org/jira/browse/HDFS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3885. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks for the review. QJM: optimize log sync when JN is lagging behind Key: HDFS-3885 URL: https://issues.apache.org/jira/browse/HDFS-3885 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3885.txt This is a potential optimization that we can add to the JournalNode: when one of the nodes is lagging behind the others (eg because its local disk is slower or there was a network blip), it receives edits after they've been committed to a majority. It can tell this because the committed txid included in the request info is higher than the highest txid in the actual batch to be written. In this case, we know that this batch has already been fsynced to a quorum of nodes, so we can skip the fsync() on the laggy node, helping it to catch back up.
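The decision described above reduces to a single comparison. A minimal sketch (hypothetical helper, not the real Journal code): a batch whose highest txid is at or below the committed txid reported by the writer has already been fsynced on a quorum, so the lagging JN may safely skip its own fsync for that batch.

```java
public class SyncPolicy {
    /**
     * Returns true when the local fsync can be skipped: every txn in this
     * batch is already durable on a quorum of other JournalNodes.
     */
    static boolean canSkipFsync(long committedTxId, long lastTxIdInBatch) {
        return committedTxId >= lastTxIdInBatch;
    }

    public static void main(String[] args) {
        // Writer reports txid 100 committed on a quorum; this batch ends at 95.
        System.out.println(canSkipFsync(100, 95)); // true: batch already durable elsewhere
        // Batch ends at 95 but only txid 90 is committed: must fsync locally.
        System.out.println(canSkipFsync(90, 95));  // false
    }
}
```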
[jira] [Resolved] (HDFS-3900) QJM: avoid validating log segments on log rolls
[ https://issues.apache.org/jira/browse/HDFS-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3900. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks QJM: avoid validating log segments on log rolls --- Key: HDFS-3900 URL: https://issues.apache.org/jira/browse/HDFS-3900 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3900.txt Currently, we are paranoid and validate every log segment when it is finalized. For a log segment that has been written entirely by one writer, with no recovery in between, this is overly paranoid (we don't do this for local journals). It also causes log rolls to be slow and take time linear in the size of the segment. Instead, we should optimize this path to simply trust that the segment is correct so long as the txids match up as expected.
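A hedged sketch of the cheaper finalization check proposed above (names are illustrative, not the actual Journal API): instead of re-reading the whole segment, trust it when the txid range the writer expects matches what was written and no recovery happened mid-segment.

```java
public class FinalizeCheck {
    /**
     * True when full segment validation can be skipped on finalize:
     * single uninterrupted writer and the txids line up as expected.
     */
    static boolean canSkipValidation(long firstTxId, long lastWrittenTxId,
                                     long expectedLastTxId, boolean recoveredMidSegment) {
        return !recoveredMidSegment
                && lastWrittenTxId == expectedLastTxId
                && firstTxId <= lastWrittenTxId;
    }

    public static void main(String[] args) {
        // One writer, txids match: constant-time finalize instead of a full scan.
        System.out.println(canSkipValidation(1, 5000, 5000, false)); // true
        // Mismatch between written and expected txid: fall back to full validation.
        System.out.println(canSkipValidation(1, 4999, 5000, false)); // false
    }
}
```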
[jira] [Created] (HDFS-3914) QJM: acceptRecovery should abort current segment
Todd Lipcon created HDFS-3914: - Summary: QJM: acceptRecovery should abort current segment Key: HDFS-3914 URL: https://issues.apache.org/jira/browse/HDFS-3914 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Found this bug with randomized testing. The following sequence causes a problem: - JN writing segment starting at txid 1, and successfully wrote txid 1, but no more - JN becomes partitioned from NN, and a new NN takes over - new NN is also partitioned for the prepareRecovery phase of recovery, but properly connects for the acceptRecovery call - acceptRecovery copies over a longer log segment (eg txns 1-3) from a good logger - new NN calls finalizeLogSegment(), but gets the following error: JournalOutOfSyncException: Trying to finalize in-progress log segment 1 to end at txid 3 but only written up to txid 1 This is because the syncLog call (which copies the new segment) isn't properly aborting the old segment before replacing it.
[jira] [Created] (HDFS-3915) QJM: Failover fails with auth error in secure cluster
Todd Lipcon created HDFS-3915: - Summary: QJM: Failover fails with auth error in secure cluster Key: HDFS-3915 URL: https://issues.apache.org/jira/browse/HDFS-3915 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, security Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-3915.txt When testing failover in a secure cluster with QJM, we ran into the following error: {code} java.io.IOException: Exception trying to open authenticated connection to http://x:8480/getJournal?jid=journalsegmentTxId=4325storageInfo=-40%3A1049822920%3A0%3ACID-d7c84ac3-bb09-4d55-baae-0d561bb55e9b at org.apache.hadoop.security.SecurityUtil.openSecureHttpConnection(SecurityUtil.java:510) at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:376) ... at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:217) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:176) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:635) Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) {code} The issue is that the EditLogFileInputStream uses the current user, which in the case of the failover trigger is the admin's remote user, rather than the NN's login user.
[jira] [Resolved] (HDFS-3901) QJM: send 'heartbeat' messages to JNs even when they are out-of-sync
[ https://issues.apache.org/jira/browse/HDFS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3901. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks for the reviews, Eli and ATM. QJM: send 'heartbeat' messages to JNs even when they are out-of-sync Key: HDFS-3901 URL: https://issues.apache.org/jira/browse/HDFS-3901 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3901.txt, hdfs-3901.txt Currently, if one of the JNs has fallen out of sync with the writer (eg because it went down), it will be marked as such until the next log roll. This causes the writer to no longer send any RPCs to it. This means that the JN's metrics will no longer reflect up-to-date information on how far behind they are. This patch will introduce a heartbeat() RPC that has no effect except to update the JN's view of the latest committed txid. When the writer is talking to an out-of-sync logger, it will send these heartbeat messages once a second. In a future patch we can extend the heartbeat functionality so that NNs periodically check their connections to JNs if no edits arrive, such that a fenced NN won't accidentally continue to serve reads indefinitely.
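An illustrative sketch of the heartbeat semantics described above (hypothetical names, not the real JournalNode API): the once-a-second heartbeat's only payload is the writer's latest committed txid, which lets an out-of-sync JN keep reporting how far behind it is.

```java
public class HeartbeatSketch {
    private long lastCommittedTxId = -1;

    /** JN side: a heartbeat updates the committed txid and has no other effect. */
    void heartbeat(long committedTxId) {
        lastCommittedTxId = Math.max(lastCommittedTxId, committedTxId);
    }

    /** Lag metric: committed txid on the quorum minus the highest local txid. */
    long lag(long localHighestTxId) {
        return lastCommittedTxId - localHighestTxId;
    }

    public static void main(String[] args) {
        HeartbeatSketch jn = new HeartbeatSketch();
        jn.heartbeat(500);               // writer has committed through txid 500
        System.out.println(jn.lag(420)); // this JN is 80 txns behind
    }
}
```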
[jira] [Resolved] (HDFS-3899) QJM: Writer-side metrics
[ https://issues.apache.org/jira/browse/HDFS-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3899. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Committed to branch, thx. QJM: Writer-side metrics Key: HDFS-3899 URL: https://issues.apache.org/jira/browse/HDFS-3899 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3899.txt, hdfs-3899.txt We already have some metrics on the server side (JournalNode) but it's useful to also gather metrics from the client side (NameNode). This is important in order to monitor that the client is seeing good performance from the individual JNs, and so that administrators can set up alerts if any of the JNs has become inaccessible to the NN.
[jira] [Resolved] (HDFS-3914) QJM: acceptRecovery should abort current segment
[ https://issues.apache.org/jira/browse/HDFS-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3914. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed QJM: acceptRecovery should abort current segment Key: HDFS-3914 URL: https://issues.apache.org/jira/browse/HDFS-3914 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3914.txt Found this bug with randomized testing. The following sequence causes a problem: - JN writing segment starting at txid 1, and successfully wrote txid 1, but no more - JN becomes partitioned from NN, and a new NN takes over - new NN is also partitioned for the prepareRecovery phase of recovery, but properly connects for the acceptRecovery call - acceptRecovery copies over a longer log segment (eg txns 1-3) from a good logger - new NN calls finalizeLogSegment(), but gets the following error: JournalOutOfSyncException: Trying to finalize in-progress log segment 1 to end at txid 3 but only written up to txid 1 This is because the syncLog call (which copies the new segment) isn't properly aborting the old segment before replacing it.
[jira] [Resolved] (HDFS-3904) QJM: journalnode does not die/log ERROR when keytab is not found in secure mode
[ https://issues.apache.org/jira/browse/HDFS-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3904. --- Resolution: Duplicate QJM: journalnode does not die/log ERROR when keytab is not found in secure mode --- Key: HDFS-3904 URL: https://issues.apache.org/jira/browse/HDFS-3904 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Eli Collins Priority: Minor Credit to Stephen Chu for finding this. The journalnode was incorrectly configured (misplaced the keytab) with security enabled, when started the JournalNode didn't die. It stayed running and logged a WARN message: {noformat} 2012-08-23 15:44:15,497 WARN org.mortbay.log: Failed startup of context org.mortbay.jetty.webapp.WebAppContext@58c16b18{/,file:/usr/lib/hadoop-hdfs/webapps/journal} javax.servlet.ServletException: javax.servlet.ServletException: Keytab does not exist: /etc/hadoop/conf/hdfs.keytab at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:185) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:146) at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713) at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282) at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) at org.mortbay.jetty.Server.doStart(Server.java:224) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.apache.hadoop.http.HttpServer.start(HttpServer.java:657) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeHttpServer.start(JournalNodeHttpServer.java:83) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.start(JournalNode.java:138) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.run(JournalNode.java:120) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hdfs.qjournal.server.JournalNode.main(JournalNode.java:228) Caused by: javax.servlet.ServletException: Keytab does not exist: /etc/hadoop/conf/hdfs.keytab at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.init(KerberosAuthenticationHandler.java:153) ... 22 more {noformat} The other HDFS daemons, if I remember correctly, would die if they can't authenticate.
[jira] [Resolved] (HDFS-3915) QJM: Failover fails with auth error in secure cluster
[ https://issues.apache.org/jira/browse/HDFS-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3915. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed QJM: Failover fails with auth error in secure cluster - Key: HDFS-3915 URL: https://issues.apache.org/jira/browse/HDFS-3915 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, security Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3915.txt When testing failover in a secure cluster with QJM, we ran into the following error: {code} java.io.IOException: Exception trying to open authenticated connection to http://x:8480/getJournal?jid=journalsegmentTxId=4325storageInfo=-40%3A1049822920%3A0%3ACID-d7c84ac3-bb09-4d55-baae-0d561bb55e9b at org.apache.hadoop.security.SecurityUtil.openSecureHttpConnection(SecurityUtil.java:510) at org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog$1.run(EditLogFileInputStream.java:376) ... at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:217) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:176) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:635) Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) {code} The issue is that the EditLogFileInputStream uses the current user, which in the case of the failover trigger is the admin's remote user, rather than the NN's login user.
[jira] [Created] (HDFS-3906) QJM: quorum timeout on failover with large log segment
Todd Lipcon created HDFS-3906: - Summary: QJM: quorum timeout on failover with large log segment Key: HDFS-3906 URL: https://issues.apache.org/jira/browse/HDFS-3906 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Critical In doing some stress tests, I ran into an issue with failover if the current edit log segment written by the old active is large. With a 327MB log segment containing 6.4M transactions, the JN took ~11 seconds to read and validate it during the recovery step. This was longer than the 10 second timeout for createNewEpoch, which caused the recovery to fail.
[jira] [Created] (HDFS-3894) QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching
Todd Lipcon created HDFS-3894: - Summary: QJM: testRecoverAfterDoubleFailures can be flaky due to IPC client caching Key: HDFS-3894 URL: https://issues.apache.org/jira/browse/HDFS-3894 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon TestQJMWithFaults.testRecoverAfterDoubleFailures fails really occasionally. Looking into it, the issue seems to be that it's possible by random chance for an IPC server port to be reused between two different iterations of the test loop. The client will then pick up and re-use the existing IPC connection to the old server. However, the old server was shut down and restarted, so the old IPC connection is stale (ie disconnected). This causes the new client to get an EOF when it sends the format() call.
[jira] [Resolved] (HDFS-3891) QJM: SBN fails if selectInputStreams throws RTE
[ https://issues.apache.org/jira/browse/HDFS-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3891. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed QJM: SBN fails if selectInputStreams throws RTE --- Key: HDFS-3891 URL: https://issues.apache.org/jira/browse/HDFS-3891 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3891.txt, hdfs-3891.txt Currently, QJM's {{selectInputStream}} method throws an RTE if a quorum cannot be reached. This propagates into the Standby Node and causes the whole node to crash. It should handle this error appropriately.
[jira] [Resolved] (HDFS-3726) QJM: if a logger misses an RPC, don't retry that logger until next segment
[ https://issues.apache.org/jira/browse/HDFS-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon resolved HDFS-3726. --- Resolution: Fixed Fix Version/s: QuorumJournalManager (HDFS-3077) Hadoop Flags: Reviewed Committed to branch, thanks for the review. QJM: if a logger misses an RPC, don't retry that logger until next segment -- Key: HDFS-3726 URL: https://issues.apache.org/jira/browse/HDFS-3726 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: QuorumJournalManager (HDFS-3077) Attachments: hdfs-3726.txt, hdfs-3726.txt Currently, if a logger misses an RPC in the middle of a log segment, or misses the {{startLogSegment}} RPC (eg it was down or network was disconnected during that time period), then it will throw an exception on every subsequent {{journal()}} call in that segment, since it knows that it missed some edits in the middle. We should change this exception to a specific IOE subclass, and have the client side of QJM detect the situation and stop sending IPCs until the next {{startLogSegment}} call. This isn't critical for correctness but will help reduce log spew on both sides.
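The client-side behavior described above amounts to a small state machine. A sketch under assumed names (not the real IPCLoggerChannel API): once a logger misses an RPC mid-segment it is marked out of sync and skipped, until the next startLogSegment call re-enables it.

```java
public class LoggerState {
    private boolean outOfSync = false;

    void onRpcFailure()       { outOfSync = true; }   // missed an edit batch mid-segment
    void onStartLogSegment()  { outOfSync = false; }  // fresh segment: safe to resume
    boolean shouldSendEdits() { return !outOfSync; }

    public static void main(String[] args) {
        LoggerState logger = new LoggerState();
        logger.onRpcFailure();                        // logger drops off mid-segment
        System.out.println(logger.shouldSendEdits()); // false: no retries, no log spew
        logger.onStartLogSegment();                   // log roll
        System.out.println(logger.shouldSendEdits()); // true: logger is back in rotation
    }
}
```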
[jira] [Created] (HDFS-3898) QJM: enable TCP_NODELAY for IPC
Todd Lipcon created HDFS-3898: - Summary: QJM: enable TCP_NODELAY for IPC Key: HDFS-3898 URL: https://issues.apache.org/jira/browse/HDFS-3898 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Currently, if the size of the edit batches is larger than the MTU, it can result in 40ms delays due to interaction between Nagle's algorithm and delayed ACK. Enabling TCP_NODELAY on the sockets solves this issue, so we should set those configs by default for all of the QJM-related IPC.
[jira] [Created] (HDFS-3900) QJM: avoid validating log segments on log rolls
Todd Lipcon created HDFS-3900: - Summary: QJM: avoid validating log segments on log rolls Key: HDFS-3900 URL: https://issues.apache.org/jira/browse/HDFS-3900 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon Currently, we are paranoid and validate every log segment when it is finalized. For a log segment that has been written entirely by one writer, with no recovery in between, this is overly paranoid (we don't do this for local journals). It also causes log rolls to be slow and take time linear in the size of the segment. Instead, we should optimize this path to simply trust that the segment is correct so long as the txids match up as expected.
[jira] [Created] (HDFS-3899) QJM: Writer-side metrics
Todd Lipcon created HDFS-3899: - Summary: QJM: Writer-side metrics Key: HDFS-3899 URL: https://issues.apache.org/jira/browse/HDFS-3899 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: QuorumJournalManager (HDFS-3077) Reporter: Todd Lipcon Assignee: Todd Lipcon We already have some metrics on the server side (JournalNode) but it's useful to also gather metrics from the client side (NameNode). This is important in order to monitor that the client is seeing good performance from the individual JNs, and so that administrators can set up alerts if any of the JNs has become inaccessible to the NN.