[jira] [Created] (HDFS-13762) Support non-volatile memory or storage class memory(SCM) in HDFS cache
SammiChen created HDFS-13762: Summary: Support non-volatile memory or storage class memory(SCM) in HDFS cache Key: HDFS-13762 URL: https://issues.apache.org/jira/browse/HDFS-13762 Project: Hadoop HDFS Issue Type: Improvement Components: caching, datanode Reporter: SammiChen Assignee: SammiChen Non-volatile memory is a type of memory that retains its data content after a power failure or across power cycles. Non-volatile memory devices usually offer near-DIMM access speed at a lower cost than DRAM, so today they are typically used as a supplement to memory to hold long-term persistent data, such as cached data. Currently in HDFS, we have an OS page cache backed read-only cache and a RAMDISK based lazy-write cache. Non-volatile memory suits both of these functions. This Jira aims to enable non-volatile memory first in the read cache, and then in the lazy-write cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13642) Creating a file with block size smaller than EC policy's cell size should throw
[ https://issues.apache.org/jira/browse/HDFS-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505749#comment-16505749 ] SammiChen commented on HDFS-13642: -- [~xiaochen], sorry I didn't notice the second blockManager.verifyReplication. My + 1 for the last patch. Thanks for the contribution. > Creating a file with block size smaller than EC policy's cell size should > throw > --- > > Key: HDFS-13642 > URL: https://issues.apache.org/jira/browse/HDFS-13642 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Major > Attachments: HDFS-13642.01.patch, HDFS-13642.02.patch, > HDFS-13642.03.patch, editsStored > > > The following command causes an exception: > {noformat} > hadoop fs -Ddfs.block.size=349696 -put -f lineitem_sixblocks.parquet > /test-warehouse/tmp123ec > {noformat} > {noformat} > 18/05/25 16:00:59 WARN hdfs.DataStreamer: DataStreamer Exception > java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: > blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 > lastPacketInBlock: false lastByteOffsetInBlock: 350208 > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729) > at > org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46) > 18/05/25 16:00:59 WARN hdfs.DFSOutputStream: Failed: offset=4096, length=512, > DFSStripedOutputStream:#0: failed, blk_-9223372036854574256_14634 > java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: > blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 > lastPacketInBlock: false lastByteOffsetInBlock: 350208 > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729) > at > org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46) > {noformat} > Then the streamer is confused and hangs. > The local file is under 6MB, the hdfs file has a RS-3-2-1024k EC policy. 
> > Credit to [~tarasbob] for reporting this issue.
[jira] [Commented] (HDFS-13642) Creating a file with block size smaller than EC policy's cell size should throw
[ https://issues.apache.org/jira/browse/HDFS-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502852#comment-16502852 ] SammiChen commented on HDFS-13642: -- Right, the current {{hasErasureCodingPolicy}} doesn't check whether the policy is a replication policy or a normal EC policy. We should improve the function to add that check. The original check should be kept. {quote} if (shouldReplicate || {color:#f79232} (org.apache.commons.lang.StringUtils.isEmpty(ecPolicyName) && !FSDirErasureCodingOp.hasErasureCodingPolicy(this, iip))){color} { blockManager.verifyReplication(src, replication, clientMachine); } {quote} When the file is a 3-replica file, {{blockManager.verifyReplication}} should be called to verify the replication factor. The value of {{shouldReplicate}} doesn't indicate whether the file is a 3-replica file; it only reflects whether {{CreateFlag.SHOULD_REPLICATE}} is explicitly set.
[jira] [Commented] (HDFS-13642) Creating a file with block size smaller than EC policy's cell size should throw
[ https://issues.apache.org/jira/browse/HDFS-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499852#comment-16499852 ] SammiChen commented on HDFS-13642: -- Some comments: 1. *private static final int BLOCK_SIZE = 1 << 20; // 16k* change the comment from 16k to 1MB 2. {quote} if (!shouldReplicate) { final ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp .getErasureCodingPolicy(this, ecPolicyName, iip); if (ecPolicy != null && (!ecPolicy.isReplicationPolicy())) { if (blockSize < ecPolicy.getCellSize()) { throw new IOException("Specified block size " + blockSize + " is less than the cell" + " size (" + ecPolicy.getCellSize() + ") of the erasure coding policy on this file."); } } } {quote} When creating a normal 3-replica file, the {{shouldReplicate}} value is false. The value is true only when the user sets {{CreateFlag.SHOULD_REPLICATE}} explicitly when calling the create API. One suggestion is to add the block size / cell size comparison as the else branch of {quote} if (shouldReplicate || (org.apache.commons.lang.StringUtils.isEmpty(ecPolicyName) && !FSDirErasureCodingOp.hasErasureCodingPolicy(this, iip))) { blockManager.verifyReplication(src, replication, clientMachine); } {quote} Thanks for working on it, [~xiaochen].
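The restructure suggested in the comment above can be sketched as a standalone class. This is a simplified stand-in, not the real FSNamesystem code: the class name, the boolean parameters, and the returned strings are all invented here for illustration; in Hadoop the EC-policy lookup and replication check live in FSDirErasureCodingOp and BlockManager.

```java
import java.io.IOException;

// Toy sketch of the suggested if/else ordering for create(): verify the
// replication factor for replicated files, otherwise compare the requested
// block size against the EC policy's cell size.
public class CreateValidationSketch {
    static String validate(boolean shouldReplicate, boolean hasEcPolicy,
                           long blockSize, long cellSize) throws IOException {
        if (shouldReplicate || !hasEcPolicy) {
            // Replicated file: the NN verifies the replication factor.
            return "verifyReplication";
        } else {
            // EC file: reject block sizes smaller than the policy's cell size.
            if (blockSize < cellSize) {
                throw new IOException("Specified block size " + blockSize
                    + " is less than the cell size (" + cellSize
                    + ") of the erasure coding policy on this file.");
            }
            return "ecBlockSizeOk";
        }
    }

    public static void main(String[] args) throws IOException {
        // The failing case from this issue: 349696 bytes < the 1 MB cell
        // size of an RS-3-2-1024k policy, so create() should be rejected.
        try {
            validate(false, true, 349696, 1024 * 1024);
            System.out.println("unexpected: no exception");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(validate(true, false, 349696, 1024 * 1024));
    }
}
```

With this ordering a plain 3-replica create (shouldReplicate false, no EC policy) still reaches the replication check, while an EC create fails fast on the NameNode instead of confusing the streamer later.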
[jira] [Commented] (HDFS-13642) Creating a file with block size smaller than EC policy's cell size should throw
[ https://issues.apache.org/jira/browse/HDFS-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496266#comment-16496266 ] SammiChen commented on HDFS-13642: -- [~xiaochen], agree, the NN should reject the request when the block size is less than the minimum block size. The NN should also reject it if the EC policy's cell size is greater than the block size. I will find time tomorrow to review the code.
[jira] [Updated] (HDFS-13540) DFSStripedInputStream should only allocate new buffers when reading
[ https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-13540: - Resolution: Fixed Target Version/s: 3.0.3 (was: 3.0.4) Status: Resolved (was: Patch Available) > DFSStripedInputStream should only allocate new buffers when reading > --- > > Key: HDFS-13540 > URL: https://issues.apache.org/jira/browse/HDFS-13540 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Major > Fix For: 3.2.0, 3.1.1, 3.0.3 > > Attachments: HDFS-13540.01.patch, HDFS-13540.02.patch, > HDFS-13540.03.patch, HDFS-13540.04.patch, HDFS-13540.05.patch, > HDFS-13540.06.patch > > > This was found in the same scenario where HDFS-13539 is caught. > There are 2 OOM that looks interesting: > {noformat} > FSDataInputStream#close error: > OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct > buffer memory > at java.nio.Bits.reserveMemory(Bits.java:694) > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) > at > org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205) > at > org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:672) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.close(DFSStripedInputStream.java:181) > at java.io.FilterInputStream.close(FilterInputStream.java:181) > {noformat} > and > {noformat} > org/apache/hadoop/fs/FSDataInputStream#unbuffer failed: error: > OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct > buffer memory > at java.nio.Bits.reserveMemory(Bits.java:694) > at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) > at 
java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) > at > org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205) > at > org.apache.hadoop.hdfs.DFSInputStream.unbuffer(DFSInputStream.java:1782) > at > org.apache.hadoop.fs.StreamCapabilitiesPolicy.unbuffer(StreamCapabilitiesPolicy.java:48) > at > org.apache.hadoop.fs.FSDataInputStream.unbuffer(FSDataInputStream.java:230) > {noformat} > As the stack trace goes, {{resetCurStripeBuffer}} will get buffer from the > buffer pool. We could save the cost of doing so if it's not for a read (e.g. > close, unbuffer etc.) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13540) DFSStripedInputStream should only allocate new buffers when reading
[ https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487136#comment-16487136 ] SammiChen commented on HDFS-13540: -- +1. Thanks [~xiaochen] for the contribution. Committed to trunk, branch-3.0 and branch-3.1.
[jira] [Updated] (HDFS-13540) DFSStripedInputStream should only allocate new buffers when reading
[ https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-13540: - Fix Version/s: 3.0.3 3.1.1 3.2.0
[jira] [Commented] (HDFS-13540) DFSStripedInputStream should only allocate new buffers when reading
[ https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483581#comment-16483581 ] SammiChen commented on HDFS-13540: -- Hi Xiao, the overall idea looks good to me. 1. Two relevant unit tests failed. The error message is "expected:<0> but was:<2>". Maybe we can dig into why 2 buffers are allocated for an open stream that hasn't read any content yet. 2. The @VisibleForTesting ahead of resetCurStripeBuffer is not necessary now.
[jira] [Commented] (HDFS-13540) DFSStripedInputStream should not allocate new buffers during close / unbuffer
[ https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478871#comment-16478871 ] SammiChen commented on HDFS-13540: -- [~xiaochen], thanks for the explanation. It makes sense to change the Jira title as you propose. I double-checked the code; *curStripeBuf* is only used in two EC read functions. For the new test case, I would suggest: # change the name from testCloseDoesNotGetBuffer to testCloseDoesNotAllocateNewBuffer. It's clearer. # the test case always passes even when I use "true" in closeCurrentBlockReaders, because *curStripeBuf* is set to *null* after *stream.close* is called, so *assertNull(stream.getCurStripeBuf());* always holds. An alternative way to check whether a buffer is allocated is to check the number of buffers held by *ElasticByteBufferPool*.
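The testing idea in the comment above — assert on what the pool handed out rather than on a field that close() nulls — can be illustrated with a toy pool. This is not Hadoop's ElasticByteBufferPool; the class and method names here are invented for the sketch, and real tests would inspect the actual pool instead.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Toy stand-in for a buffer pool, used only to illustrate the test idea:
// count fresh allocations so a test can prove close()/unbuffer() did not
// allocate a new buffer, even though the stream's field is nulled afterwards.
public class ToyBufferPool {
    private final ArrayDeque<ByteBuffer> pool = new ArrayDeque<>();
    private int allocated; // number of buffers newly allocated (not reused)

    ByteBuffer getBuffer(int capacity) {
        ByteBuffer b = pool.poll();
        if (b == null) {
            allocated++; // cache miss: a brand-new buffer is created
            b = ByteBuffer.allocate(capacity);
        }
        return b;
    }

    void putBuffer(ByteBuffer b) {
        pool.add(b); // return the buffer for reuse
    }

    // A test can assert on this count instead of on a field that close() nulls out.
    int allocatedCount() {
        return allocated;
    }

    public static void main(String[] args) {
        ToyBufferPool pool = new ToyBufferPool();
        ByteBuffer b = pool.getBuffer(64 * 1024);
        pool.putBuffer(b);
        // Reusing a pooled buffer must not bump the allocation count.
        pool.getBuffer(64 * 1024);
        System.out.println("allocated=" + pool.allocatedCount()); // prints allocated=1
    }
}
```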
[jira] [Commented] (HDFS-13540) DFSStripedInputStream should not allocate new buffers during close / unbuffer
[ https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476804#comment-16476804 ] SammiChen commented on HDFS-13540: -- Hi [~xiaochen], thanks for working on this! *{{closeCurrentBlockReaders}}* is called by *{{close}}*, *{{unbuffer}}*, and *{{DFSStripedInputStream.blockSeekTo}}*. I feel that when we use *{{resetCurStripeBuffer(false)}}* in *{{closeCurrentBlockReaders}}*, *{{DFSStripedInputStream.readWithStrategy}}*, which calls *{{blockSeekTo}}*, will have an issue. Could you double-check that?
[jira] [Comment Edited] (HDFS-13388) RequestHedgingProxyProvider calls multiple configured NNs all the time
[ https://issues.apache.org/jira/browse/HDFS-13388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431908#comment-16431908 ] SammiChen edited comment on HDFS-13388 at 4/10/18 8:22 AM: --- Hi [~LiJinglun] and [~elgoiri], branch-2.9 suffers a build failure with this commit. Would you please double-check it? Also check branch-2. was (Author: sammi): Hi [~LiJinglun] and [~elgoiri], branch-2.9 suffers build failure with this commit. Would you please double check it? > RequestHedgingProxyProvider calls multiple configured NNs all the time > -- > > Key: HDFS-13388 > URL: https://issues.apache.org/jira/browse/HDFS-13388 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.4 > > Attachments: HADOOP-13388.0001.patch, HADOOP-13388.0002.patch, > HADOOP-13388.0003.patch, HADOOP-13388.0004.patch, HADOOP-13388.0005.patch, > HADOOP-13388.0006.patch > > > In HDFS-7858 RequestHedgingProxyProvider was designed to "first > simultaneously call multiple configured NNs to decide which is the active > Namenode and then for subsequent calls it will invoke the previously > successful NN ." But the current code calls multiple configured NNs every time, > even when we already have the successful NN. > That's because in RetryInvocationHandler.java, ProxyDescriptor's member > proxyInfo is assigned only when it is constructed or when failover occurs. > RequestHedgingProxyProvider.currentUsedProxy is null in both cases, so the > only proxy we can get is always a dynamic proxy handled by > RequestHedgingInvocationHandler.class. RequestHedgingInvocationHandler.class > handles invoked methods by calling multiple configured NNs.
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Component/s: erasure-coding > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Assignee: SammiChen >Priority: Minor > Fix For: 3.1.0, 3.0.3 > > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, > HDFS-11600.006.patch, HDFS-11600.007.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10183) Prevent race condition during class initialization
[ https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-10183: - Fix Version/s: (was: 2.9.1) > Prevent race condition during class initialization > -- > > Key: HDFS-10183 > URL: https://issues.apache.org/jira/browse/HDFS-10183 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Affects Versions: 2.9.0 >Reporter: Pavel Avgustinov >Assignee: Pavel Avgustinov >Priority: Minor > Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch > > > In HADOOP-11969, [~busbey] tracked down a non-deterministic > {{NullPointerException}} to an oddity in the Java memory model: When multiple > threads trigger the loading of a class at the same time, one of them wins and > creates the {{java.lang.Class}} instance; the others block during this > initialization, but once it is complete they may obtain a reference to the > {{Class}} which has non-{{final}} fields still containing their default (i.e. > {{null}}) values. This leads to runtime failures that are hard to debug or > diagnose. > HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are > very likely to be accessed from multiple threads, and thus the problem is > particularly severe there. Consequently, the patch removed all occurrences of > the issue in the code base. > Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a > refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151], > and introduced a [new instance of the > problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43]. > The attached patch addresses the issue by adding the missing {{final}} > modifier in these two cases. 
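The final-modifier fix described above can be shown with a minimal sketch; `Holder`, `RACY`, and `SAFE` are illustrative names, not the actual HDFS fields. Under the Java memory model, a thread that races with class initialization is only guaranteed to see fully initialized values for `final` static fields; a non-final static may still be observed as null.

```java
// Illustrative sketch of the fix shape in HDFS-10183 (field names invented).
public class Holder {
    // Racy under concurrent class initialization: another thread may observe
    // this reference as null even after initialization "completed".
    static ThreadLocal<StringBuilder> RACY =
            ThreadLocal.withInitial(StringBuilder::new);

    // Safe: the JMM guarantees final static fields are fully visible to any
    // thread that obtains a reference to the initialized class.
    static final ThreadLocal<StringBuilder> SAFE =
            ThreadLocal.withInitial(StringBuilder::new);
}
```

The race itself is timing-dependent and hard to reproduce deterministically, which is what makes the resulting NullPointerExceptions so hard to debug.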
[jira] [Updated] (HDFS-13337) Backport HDFS-4275 to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-13337: - Target Version/s: 2.10.0, 2.9.2 (was: 2.10.0, 2.9.1) > Backport HDFS-4275 to branch-2.9 > > > Key: HDFS-13337 > URL: https://issues.apache.org/jira/browse/HDFS-13337 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Íñigo Goiri >Assignee: Xiao Liang >Priority: Minor > Attachments: HDFS-13337-branch-2.000.patch > > > Multiple HDFS test suites fail on Windows during initialization of > MiniDFSCluster due to "Could not fully delete" the name testing data > directory. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11885) createEncryptionZone should not block on initializing EDEK cache
[ https://issues.apache.org/jira/browse/HDFS-11885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11885: - Target Version/s: 2.8.3, 3.2.0, 2.9.2 (was: 2.8.3, 2.9.1, 3.2.0) > createEncryptionZone should not block on initializing EDEK cache > > > Key: HDFS-11885 > URL: https://issues.apache.org/jira/browse/HDFS-11885 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Major > Attachments: HDFS-11885.001.patch, HDFS-11885.002.patch, > HDFS-11885.003.patch, HDFS-11885.004.patch > > > When creating an encryption zone, we call {{ensureKeyIsInitialized}}, which > calls {{provider.warmUpEncryptedKeys(keyName)}}. This is a blocking call, > which attempts to fill the key cache up to the low watermark. > If the KMS is down or slow, this can take a very long time, and cause the > createZone RPC to fail with a timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
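The non-blocking direction the issue argues for can be sketched as follows; `EdekWarmer` and `warmUpAsync` are invented names, not the actual NameNode/KMS code. The idea is simply to hand the warm-up to a background executor so the createZone RPC returns even when the KMS is slow or down.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch (not Hadoop code): run the EDEK cache warm-up in the
// background instead of blocking the RPC handler on the KMS.
public class EdekWarmer {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    // Fire-and-forget: the caller does not wait for the cache to fill; reads
    // that arrive before warm-up finishes fall through to an on-demand fetch.
    public Future<?> warmUpAsync(Runnable warmUp) {
        return pool.submit(warmUp);
    }

    public void shutdown() {
        pool.shutdown();  // lets this demo JVM exit; a server would keep the pool
    }
}
```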
[jira] [Updated] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12257: - Target Version/s: 2.8.3, 3.2.0, 2.9.2 (was: 2.8.3, 2.9.1, 3.2.0) > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang >Priority: Major > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, > HDFS-12257.003.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
[ https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-13051: - Target Version/s: 2.10.0, 2.8.4, 2.7.6, 3.0.2, 2.9.2 (was: 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2) > dead lock occurs when rolleditlog rpc call happen and editPendingQ is full > -- > > Key: HDFS-13051 > URL: https://issues.apache.org/jira/browse/HDFS-13051 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.5 >Reporter: zhangwei >Assignee: Daryn Sharp >Priority: Major > Labels: AsyncEditlog, deadlock > Attachments: HDFS-13112.patch, deadlock.patch > > > When doing rolleditlog, the NameNode acquires the FS write lock, then acquires the FSEditLogAsync object lock, and writes 3 edits (the second one overrides the logEdit method and returns true). > In an extreme case, when FSEditLogAsync's logSync is very slow and editPendingQ (default size 4096) is full, the IPC thread cannot offer the edit object into editPendingQ while doing rolleditlog; it blocks in the editPendingQ.put method without releasing the FSEditLogAsync object lock. The edit.logEdit method in the FSEditLogAsync.run thread can then never acquire the FSEditLogAsync object lock, which causes a deadlock. > The stack trace looks like below: > "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 > tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > 
org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395) > - locked <0x7fbd3cbae500> (a java.lang.Object) > at > org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656) > at java.lang.Thread.run(Thread.java:745) > "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 > nid=0x13fbf waiting for monitor entry [0x7fb32c51a000] > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443) > - waiting to lock <*0x7fbcbc131000*> (a > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177) > at java.lang.Thread.run(Thread.java:745) > "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 > tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x7fbd3cb96f58> (a > 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) > at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251) > - locked <*0x7fbcbc131000*> (a >
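The hazard in the stack traces above, a producer blocking in `ArrayBlockingQueue.put` while holding a monitor the consumer thread needs, reduces to a small pattern. `SafeEnqueue` below is an invented class, not the actual FSEditLogAsync fix: it only shows the safe shape, i.e. enqueue without holding a shared monitor, and use a bounded `offer` so the producer backs off instead of parking forever.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative sketch (not Hadoop code): avoid the deadlock pattern of
// calling put() on a bounded queue while holding a lock the consumer needs.
public class SafeEnqueue {
    // Tiny capacity so the "queue full" case is easy to exercise.
    private final ArrayBlockingQueue<String> editPendingQ =
            new ArrayBlockingQueue<>(4);

    // Called without any shared monitor held, so the consumer thread can
    // always make progress; the timed offer caps how long a producer waits.
    public boolean logEdit(String edit) throws InterruptedException {
        return editPendingQ.offer(edit, 100, TimeUnit.MILLISECONDS);
    }

    public String drain() {
        return editPendingQ.poll();  // consumer side; null if the queue is empty
    }
}
```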
[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min
[ https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-13174: - Target Version/s: 3.0.1, 2.8.4, 2.7.6, 2.9.2 (was: 2.9.1, 3.0.1, 2.8.4, 2.7.6) > hdfs mover -p /path times out after 20 min > -- > > Key: HDFS-13174 > URL: https://issues.apache.org/jira/browse/HDFS-13174 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2 >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Major > > HDFS-11015 introduced an iteration timeout in the Dispatcher.Source class that is checked while dispatching the moves that the Balancer and the Mover perform. This timeout is hardwired to 20 minutes. > The Balancer works in iterations: even if an iteration times out, the Balancer keeps running and starts another iteration, and it fails only if no moves happened in a few iterations. > The Mover, on the other hand, does not have iterations, so if moving a path runs for more than 20 minutes, the Mover stops with the following exception reported to the console (line numbers might differ, as this exception came from a CDH 5.12.1 installation): > java.io.IOException: Block move timed out > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186) > at > org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
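Since the report says the per-move timeout is hardwired to 20 minutes, one natural fix direction is to make the deadline configurable. `MoveDeadline` below is purely an illustrative helper, not the actual Dispatcher change: it shows a deadline the Mover could set higher than the Balancer's iteration-oriented default.

```java
// Illustrative sketch (invented class, not the Hadoop Dispatcher): a
// configurable per-move deadline instead of a hardwired 20 minutes.
public class MoveDeadline {
    public static final long DEFAULT_TIMEOUT_MILLIS = 20 * 60 * 1000L;

    private final long timeoutMillis;
    private final long startMillis;

    public MoveDeadline(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.startMillis = startMillis;
    }

    // The dispatch loop would poll this instead of comparing against a constant.
    public boolean expired(long nowMillis) {
        return nowMillis - startMillis > timeoutMillis;
    }
}
```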
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Fix Version/s: (was: 3.2.0) > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Assignee: SammiChen >Priority: Minor > Fix For: 3.1.0, 3.0.2 > > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, > HDFS-11600.006.patch, HDFS-11600.007.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Target Version/s: 3.1.0, 3.0.2 (was: 3.1.0) > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Assignee: SammiChen >Priority: Minor > Fix For: 3.1.0, 3.0.2 > > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, > HDFS-11600.006.patch, HDFS-11600.007.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Resolution: Fixed Fix Version/s: 3.2.0 3.0.2 3.1.0 Status: Resolved (was: Patch Available) > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Assignee: SammiChen >Priority: Minor > Fix For: 3.1.0, 3.0.2, 3.2.0 > > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, > HDFS-11600.006.patch, HDFS-11600.007.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen reassigned HDFS-11600: Assignee: SammiChen > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Assignee: SammiChen >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, > HDFS-11600.006.patch, HDFS-11600.007.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396772#comment-16396772 ] SammiChen commented on HDFS-11600: -- Thanks [~xiaochen] for the review. I uploaded 007 patch to address the line length checkstyle issue. Will commit after the pre-commit build comes out. Also thanks [~andrew.wang] for the initial patches and [~rakeshr] for the review. > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, > HDFS-11600.006.patch, HDFS-11600.007.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Attachment: HDFS-11600.007.patch > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, > HDFS-11600.006.patch, HDFS-11600.007.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396493#comment-16396493 ] SammiChen commented on HDFS-11600: -- Thanks [~xiaochen] for the comments. bq. Do you know why this range was chosen? I don't know the initial reason. Going through the code, my guess is that when TestDFSStripedOutputStreamWithFailure was introduced, the only supported EC policy was RS-6-3-64k. The intent is to test files whose length varies over [0, 1, 2] block groups, with the block group's cell number varying from 0 to (6 (data block number) * 4 (cells per block)) - 1, plus a [-1, 0, 1] delta length. So there are approximately 3 * ((6 * 4) - 1) * 3 = 207 length variants in total. Now that we support more EC policies, especially RS-10-4, the previous ~210 variants no longer stand. The variants should actually vary with the EC policy used in TestDFSStripedOutputStreamWithFailureWithRandomECPolicy. bq. This is from existing code, but now may be a good chance to change - could you do tearDown with a @After annotation? This way, each test doesn't have to try-finally. Agreed. However, there is a loop in testBlockTokenExpired that requires setting up and tearing down the cluster on every iteration, so it seems better to keep it as is. > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, > HDFS-11600.006.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. 
> Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
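The length-variant scheme worked out in the comment above can be written as a generator parameterized by the EC policy's shape, which is the direction the refactoring argues for. `LengthVariants` is a hypothetical helper, not the actual test code: for each of 0–2 block groups it walks every cell boundary and adds a -1/0/+1 byte delta.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the actual test code): generate file-length
// variants from the EC policy's shape instead of hard-coding RS-6-3-64k.
public class LengthVariants {
    static List<Integer> variants(int dataBlocks, int cellsPerBlock, int cellSize) {
        List<Integer> lengths = new ArrayList<>();
        int cellsPerGroup = dataBlocks * cellsPerBlock;
        // File lengths spanning 0, 1, or 2 full block groups...
        for (int group = 0; group < 3; group++) {
            // ...at every cell boundary within the group...
            for (int cell = 0; cell < cellsPerGroup; cell++) {
                int base = (group * cellsPerGroup + cell) * cellSize;
                // ...plus a -1/0/+1 byte delta around each boundary.
                for (int delta = -1; delta <= 1; delta++) {
                    if (base + delta >= 0) {
                        lengths.add(base + delta);
                    }
                }
            }
        }
        return lengths;
    }
}
```

With dataBlocks=6 and cellsPerBlock=4 this yields 3 * 24 * 3 - 1 = 215 lengths, in the same ballpark as the ~207 counted in the comment; with RS-10-4 the set grows automatically, which is exactly why a hard-coded variant count no longer stands.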
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Attachment: HDFS-11600.006.patch > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, > HDFS-11600.006.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Attachment: HDFS-11600.005.patch > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394916#comment-16394916 ] SammiChen edited comment on HDFS-11600 at 3/12/18 8:37 AM: --- Hi [~rakeshr], thanks for the comments. bq. 2. I hope you have named the class with "P" to represent parameterized class. Can we give a meaningful name instead of appending with letter "P" - TestDFSStripedOutputStreamWithFailureP, TestDFSStripedOutputStreamWithFailurePWithRandomECPolicy. Regarding the class name with "P", you are right, it stands for "parameterized". I originally used the full word, but found that the class names become very long, especially "TestDFSStripedOutputStreamWithFailureWithRandomECPolicy". bq. 4. TestDFSStripedOutputStreamWithFailureBase#testCloseWithExceptionsInStreamer function is not used anywhere. Whats the purpose of this? testCloseWithExceptionsInStreamer is in both TestDFSStripedOutputStreamWithFailureBase and TestDFSStripedOutputStreamWithFailure. I will remove it from TestDFSStripedOutputStreamWithFailureBase. I will upload a new patch soon. was (Author: sammi): Hi [~rakeshr], thanks for the comments. I will upload a new patch to address all the issues. Regarding the class name with "P", you are right, it stands for "parameterized". I originally used the full word, but found that the class names become very long, especially "TestDFSStripedOutputStreamWithFailureWithRandomECPolicy". > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. 
> Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390803#comment-16390803 ] SammiChen commented on HDFS-11600: -- Hi [~rakeshr], do you have time to help review the patch? > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389328#comment-16389328 ] SammiChen commented on HDFS-11600: -- Handled javac issues and improved {{testMultipleDatanodeFailure56}} to make sure it will not run out of heap space. > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Attachment: HDFS-11600.004.patch > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch, HDFS-11600.004.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12654) APPEND API call is different in HTTPFS and NameNode REST
[ https://issues.apache.org/jira/browse/HDFS-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387350#comment-16387350 ] SammiChen commented on HDFS-12654: -- Hi [~Nuke] and [~iwasakims], it seems this is not an issue after further investigation. Can it be closed? > APPEND API call is different in HTTPFS and NameNode REST > > > Key: HDFS-12654 > URL: https://issues.apache.org/jira/browse/HDFS-12654 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, httpfs, namenode >Affects Versions: 2.6.0, 2.7.0, 2.8.0, 3.0.0-beta1 >Reporter: Andras Czesznak >Priority: Major > > The APPEND REST API call behaves differently in the NameNode REST and the > HTTPFS codes. The NameNode version creates the target file the new data is being > appended to if it does not exist at the time the call is issued. The HTTPFS > version assumes the target file exists when APPEND is called and can append > only the new data but does not create the target file if it doesn't exist. > The two implementations should be standardized; preferably the HTTPFS version > should be modified to execute an implicit CREATE if the target file does not > exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
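The standardization proposed in the description — APPEND implicitly creating a missing target — can be sketched against the local filesystem with plain {{java.nio.file}} calls (an illustrative analogy only, not the actual HTTPFS code; the class and method names here are made up):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendOrCreate {
    // Append data to 'target'; create the file first if it does not exist,
    // mirroring the NameNode REST behavior described above.
    static void appendOrCreate(Path target, byte[] data) {
        try {
            Files.write(target, data,
                    StandardOpenOption.CREATE,   // implicit CREATE if missing
                    StandardOpenOption.APPEND);  // otherwise append to existing content
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helpers so callers need not handle checked IOException.
    static Path tempTarget() {
        try {
            return Files.createTempDirectory("append-demo").resolve("target.txt");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String read(Path p) {
        try {
            return Files.readString(p);  // requires Java 11+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = tempTarget();
        appendOrCreate(p, "hello ".getBytes());  // file did not exist: created
        appendOrCreate(p, "world".getBytes());   // file exists: appended
        System.out.println(read(p));             // hello world
    }
}
```

The two calls behave like the NameNode REST version of APPEND: the first one creates, the second one appends, with no separate existence check needed.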
[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387171#comment-16387171 ] SammiChen commented on HDFS-11600: -- The output links of the last build have expired. Triggering the build again. > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11885) createEncryptionZone should not block on initializing EDEK cache
[ https://issues.apache.org/jira/browse/HDFS-11885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383049#comment-16383049 ] SammiChen commented on HDFS-11885: -- Is it still on target for 2.9.1? If not, can we push this out from 2.9.1 to next release? > createEncryptionZone should not block on initializing EDEK cache > > > Key: HDFS-11885 > URL: https://issues.apache.org/jira/browse/HDFS-11885 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Major > Attachments: HDFS-11885.001.patch, HDFS-11885.002.patch, > HDFS-11885.003.patch, HDFS-11885.004.patch > > > When creating an encryption zone, we call {{ensureKeyIsInitialized}}, which > calls {{provider.warmUpEncryptedKeys(keyName)}}. This is a blocking call, > which attempts to fill the key cache up to the low watermark. > If the KMS is down or slow, this can take a very long time, and cause the > createZone RPC to fail with a timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383028#comment-16383028 ] SammiChen edited comment on HDFS-12257 at 3/2/18 2:29 AM: -- Hi [~HuafengWang], does this target for 2.9.1? If not, can we push this out to next 2.9.2 release? was (Author: sammi): Hi [~HuafengWang], does this target for 2.9.1? If not, can we push this out to next 2.9 release? > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang >Priority: Major > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, > HDFS-12257.003.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383028#comment-16383028 ] SammiChen commented on HDFS-12257: -- Hi [~HuafengWang], is this targeted for 2.9.1? If not, can we push this out to the next 2.9 release? > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang >Priority: Major > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, > HDFS-12257.003.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359850#comment-16359850 ] SammiChen commented on HDFS-11600: -- I uploaded a new patch based on Andrew's 002 patch. The idea is to separate {{TestDFSStripedOutputStreamWithFailure}} into {{TestDFSStripedOutputStreamWithFailureBase}}, which carries common routines and variable definitions; {{TestDFSStripedOutputStreamWithFailure}}, which carries fixed-parameter test cases; and {{TestDFSStripedOutputStreamWithFailureP}}, which carries the parameterized test cases. In {{TestDFSStripedOutputStreamWithFailureP}}, I refined the current test case: each run randomly chooses 10 file lengths to run the test case with. Given that the largest built-in EC policy currently supported is RS-10-4-1MB, 10 rounds of the same test case with a random single DataNode failure seems enough. [~andrew.wang], would you take a look at the new patch at your convenience? > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
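The random file-length selection described above can be sketched roughly as follows (a minimal illustration, not the patch itself; the class name, the choice of boundary offsets of ±1 byte, and the stripe counts are assumptions):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomFileLengths {
    // Lengths just below, at, and just above each stripe boundary are where
    // striped writes have historically broken, so candidates cluster there.
    static List<Integer> candidateLengths(int cellSize, int dataBlocks, int stripes) {
        List<Integer> lengths = new ArrayList<>();
        int stripeSize = cellSize * dataBlocks;  // one full data stripe
        for (int s = 0; s <= stripes; s++) {
            for (int delta : new int[] {-1, 0, 1}) {
                int len = s * stripeSize + delta;
                if (len >= 0) {
                    lengths.add(len);
                }
            }
        }
        return lengths;
    }

    // Pick 'count' distinct lengths at random, as the parameterized test does
    // per run, instead of exhaustively testing every candidate length.
    static List<Integer> pickRandom(List<Integer> candidates, int count, Random rnd) {
        List<Integer> copy = new ArrayList<>(candidates);
        Collections.shuffle(copy, rnd);
        return copy.subList(0, Math.min(count, copy.size()));
    }
}
```

Seeding the {{Random}} (or logging the chosen lengths) keeps a failing run reproducible, which matters when only some lengths trigger a bug.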
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Attachment: HDFS-11600.003.patch > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-11600: - Status: Patch Available (was: Open) > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, > HDFS-11600.003.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes
[ https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358077#comment-16358077 ] SammiChen commented on HDFS-11600: -- Hi [~andrew.wang], the idea of using JUnit parameterization is really good; it helps clean up the messy test cases. I took a further look into TestDFSStripedOutputStreamWithFailure. Many test cases are constant, not related to the parameters. So I think we can further split TestDFSStripedOutputStreamWithFailure into 2 files, one with the constant test cases, the other parameterized. What do you think? By the way, if you don't have much time lately, I can take it over. > Refactor TestDFSStripedOutputStreamWithFailure test classes > --- > > Key: HDFS-11600 > URL: https://issues.apache.org/jira/browse/HDFS-11600 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Affects Versions: 3.0.0-alpha2 >Reporter: Andrew Wang >Priority: Minor > Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch > > > TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The > tests are parameterized based on the name of these subclasses. > Seems like we could parameterize these tests with JUnit and then not need all > these separate test classes. > Another note, the tests will randomly return instead of running the test. > Using {{Assume}} instead would make it more clear in the test output that > these tests were skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12462) Erasure coding policy extra options should be sorted by key value
[ https://issues.apache.org/jira/browse/HDFS-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16315629#comment-16315629 ] SammiChen commented on HDFS-12462: -- Sure. Thanks [~liaoyuxiangqin] for being interested in this task. Currently, each erasure coding policy schema has extra options in a {{Map}} that doesn't order its elements by key. When the in-memory EC policies are saved into the fsImage twice, the two on-disk fsImages are not identical if you compare them byte by byte, which can potentially cause problems in some cases. This task is to make sure the serialized part of EC policies in the fsimage and editlog stays the same no matter how many times they have been saved. > Erasure coding policy extra options should be sorted by key value > - > > Key: HDFS-12462 > URL: https://issues.apache.org/jira/browse/HDFS-12462 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen > Labels: hdfs-ec-3.0-nice-to-have > > To make sure the serialized fsimage and editlog binary equal, Erasure coding > policy extra options should be sorted by key value. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
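The determinism issue can be illustrated with plain Java maps: iterating a hash-based map gives an unspecified entry order, so serializing the same options from differently-populated maps can yield different bytes, while sorting by key (e.g. via a {{TreeMap}}) makes the output identical every time. A minimal sketch (illustrative names only, not the actual fsImage serialization code):

```java
import java.util.Map;
import java.util.TreeMap;

public class DeterministicOptions {
    // Serialize key=value pairs in sorted-key order so the byte output is
    // identical regardless of how (or in what order) the map was populated.
    static String serialize(Map<String, String> extraOptions) {
        StringBuilder sb = new StringBuilder();
        // TreeMap iterates entries in natural key order, giving a stable layout.
        for (Map.Entry<String, String> e : new TreeMap<>(extraOptions).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }
}
```

With this, saving the same policies twice produces byte-identical output, which is exactly the property the fsImage comparison needs.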
[jira] [Commented] (HDFS-12860) StripedBlockUtil#getRangesInternalBlocks throws exception for the block group size larger than 2GB
[ https://issues.apache.org/jira/browse/HDFS-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311064#comment-16311064 ] SammiChen commented on HDFS-12860: -- Just came back from a long vacation. Sorry for the late response. Thanks [~eddyxu] for refining the test case. It's clearer and more readable now. For end-to-end tests, I worry whether this is the only 2GB-related bug in the EC code. Anyway, within the scope of the current title, I'm good with the current test coverage. My +1 for the patch. > StripedBlockUtil#getRangesInternalBlocks throws exception for the block group > size larger than 2GB > -- > > Key: HDFS-12860 > URL: https://issues.apache.org/jira/browse/HDFS-12860 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-12860.00.patch, HDFS-12860.01.patch > > > Running terasort on a cluster with 8 datanodes, 256GB data, using > RS-3-2-1024k. > The test data was generated by {{teragen}} with 32 mappers. 
> The terasort benchmark fails with the following stack trace: > {code} > 17/11/27 14:44:31 INFO mapreduce.Job: map 45% reduce 0% > 17/11/27 14:44:33 INFO mapreduce.Job: Task Id : > attempt_1510080297865_0160_m_08_0, Status : FAILED > Error: java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at > org.apache.hadoop.hdfs.util.StripedBlockUtil$VerticalRange.(StripedBlockUtil.java:701) > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getRangesForInternalBlocks(StripedBlockUtil.java:442) > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.divideOneStripe(StripedBlockUtil.java:311) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:308) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:391) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:813) > at java.io.DataInputStream.read(DataInputStream.java:149) > at > org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:257) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562) > at > org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To 
unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12860) StripedBlockUtil#getRangesInternalBlocks throws exception for the block group size larger than 2GB
[ https://issues.apache.org/jira/browse/HDFS-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300987#comment-16300987 ] SammiChen commented on HDFS-12860: -- Hi [~eddyxu], thanks for reporting the issue and working on it. 1. It's great to add an error message to provide more information when the Precondition check fails. There are "%d" used in String.format and "%s" used in Preconditions. Is it because Preconditions doesn't support "%d"? 2. A ")" is missing in {{AlignedStripe.toString}} and {{StripingCell.toString}}. 3. Can you add some javadoc or comments in {{testDivideOneStripeLargeBlockSize}}? If we want to test a block group larger than 2GB, using RS-6-3-1024k as an example, the {{stripSize}} is 9 * 1MB, {{stripesPerBlock}} will be > (2 * 1024) / 9M, and {{blockSize}} is {{cellSize * stripesPerBlock}}. Also, I would suggest adding an end-to-end test case in {{TestErasureCodingPolicies}}. > StripedBlockUtil#getRangesInternalBlocks throws exception for the block group > size larger than 2GB > -- > > Key: HDFS-12860 > URL: https://issues.apache.org/jira/browse/HDFS-12860 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-12860.00.patch > > > Running terasort on a cluster with 8 datanodes, 256GB data, using > RS-3-2-1024k. > The test data was generated by {{teragen}} with 32 mappers. 
> The terasort benchmark fails with the following stack trace: > {code} > 17/11/27 14:44:31 INFO mapreduce.Job: map 45% reduce 0% > 17/11/27 14:44:33 INFO mapreduce.Job: Task Id : > attempt_1510080297865_0160_m_08_0, Status : FAILED > Error: java.lang.IllegalArgumentException > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) > at > org.apache.hadoop.hdfs.util.StripedBlockUtil$VerticalRange.(StripedBlockUtil.java:701) > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getRangesForInternalBlocks(StripedBlockUtil.java:442) > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.divideOneStripe(StripedBlockUtil.java:311) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:308) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:391) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:813) > at java.io.DataInputStream.read(DataInputStream.java:149) > at > org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:257) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562) > at > org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To 
unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
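The 2GB threshold in the report above is the classic int-overflow symptom: once a block group grows past Integer.MAX_VALUE bytes, any offset computed in 32-bit int wraps negative and then fails a range precondition like the one in the stack trace. A minimal illustration (the numbers are chosen for the example; this is not the HDFS code):

```java
public class OffsetOverflow {
    // int arithmetic wraps past 2^31 - 1 and produces a negative "size".
    static int intGroupSize(int cellSize, int stripesPerBlock, int dataBlocks) {
        return cellSize * stripesPerBlock * dataBlocks;
    }

    // Widening to long BEFORE multiplying gives the true size.
    static long longGroupSize(int cellSize, int stripesPerBlock, int dataBlocks) {
        return (long) cellSize * stripesPerBlock * dataBlocks;
    }

    public static void main(String[] args) {
        int cellSize = 1024 * 1024; // 1 MB cells
        int stripesPerBlock = 350;  // 350 MB per internal block
        int dataBlocks = 6;         // RS-6-3 style: group data spans 6 * 350 MB = 2100 MB

        System.out.println(intGroupSize(cellSize, stripesPerBlock, dataBlocks));  // negative: wrapped
        System.out.println(longGroupSize(cellSize, stripesPerBlock, dataBlocks)); // 2202009600
    }
}
```

Note the cast must come before the multiplication; `(long) (a * b * c)` still overflows in int and only widens the already-wrapped result.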
[jira] [Commented] (HDFS-12915) Fix findbugs warning in INodeFile$HeaderFormat.getBlockLayoutRedundancy
[ https://issues.apache.org/jira/browse/HDFS-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300908#comment-16300908 ] SammiChen commented on HDFS-12915: -- bq. IIRC the unit test failed because %02x was printing as a literal, not as a 2-digit hex string using the passed parameter. Replacing an if/throw statement with a call to a third-party library seems unnecessary. If it's not cleaner in this case then its appeal, even aesthetically, is limited... [~chris.douglas], thanks for the further explanation; it's clear to me now. Also thanks for the patch! I didn't realize it before. bq. On a second thought, I think using ecPolicyID alone is sufficient, so that we can eliminate blockType as parameter. [~eddyxu], functionally I agree. However, I would suggest keeping the {{blockType}} for code readability. A "0" {{ecPolicyID}} meaning a contiguous file may confuse someone who doesn't know the background. > Fix findbugs warning in INodeFile$HeaderFormat.getBlockLayoutRedundancy > --- > > Key: HDFS-12915 > URL: https://issues.apache.org/jira/browse/HDFS-12915 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang > Attachments: HDFS-12915.00.patch, HDFS-12915.01.patch > > > It seems HDFS-12840 creates a new findbugs warning. > Possible null pointer dereference of replication in > org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(BlockType, > Short, Byte) > Bug type NP_NULL_ON_SOME_PATH (click for details) > In class org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat > In method > org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(BlockType, > Short, Byte) > Value loaded from replication > Dereferenced at INodeFile.java:[line 210] > Known null at INodeFile.java:[line 207] > From a quick look at the patch, it seems bogus though. [~eddyxu][~Sammi] > would you please double check? 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12915) Fix findbugs warning in INodeFile$HeaderFormat.getBlockLayoutRedundancy
[ https://issues.apache.org/jira/browse/HDFS-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294575#comment-16294575 ] SammiChen commented on HDFS-12915: -- Thanks [~jojochuang] for reporting and working on it. Back in the review period of HDFS-12840, I did double-check the findbugs report and found nothing suspicious; it looked like a false alert. I think Eddy did the same check. I dived a little deeper this time. This piece of code triggers the findbugs warning:
{noformat}
Preconditions.checkArgument(replication != null && replication >= 0
    && replication <= MAX_REDUNDANCY,
    "Invalid replication value " + replication);
{noformat}
This piece of code will not trigger the warning:
{noformat}
Preconditions.checkArgument(replication != null && replication >= 0 && replication <= MAX_REDUNDANCY, "Invalid replication value " + replication);
{noformat}
The only difference is that the condition clause is on one line in the second piece of code. When the condition clause is separated into two lines, findbugs cannot handle the case correctly and raises the false alert. And [~chris.douglas], Preconditions does support formatted strings. So basically I think Preconditions is very useful and neat for checking parameters. But if the check condition is complex, to avoid triggering the annoying findbugs warning, a traditional {{if()}} plus {{throw}} statement seems a better fit. By the way, the last update of the findbugs web site was in Mar. 2015; it seems to lack maintenance these days. For the patch, the following statement is not appropriate. If {{erasureCodingPolicyID}} is null and {{blockType}} is striped, it should throw an exception. {{REPLICATION_POLICY_ID}} is a special EC policy. It represents the "3 replica" scheme. A file with this policy is effectively a 3-replica file, not an EC file. 
{noformat} if (null == erasureCodingPolicyID) { erasureCodingPolicyID = REPLICATION_POLICY_ID; } {noformat} > Fix findbugs warning in INodeFile$HeaderFormat.getBlockLayoutRedundancy > --- > > Key: HDFS-12915 > URL: https://issues.apache.org/jira/browse/HDFS-12915 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang > Attachments: HDFS-12915.00.patch, HDFS-12915.01.patch > > > It seems HDFS-12840 creates a new findbugs warning. > Possible null pointer dereference of replication in > org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(BlockType, > Short, Byte) > Bug type NP_NULL_ON_SOME_PATH (click for details) > In class org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat > In method > org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(BlockType, > Short, Byte) > Value loaded from replication > Dereferenced at INodeFile.java:[line 210] > Known null at INodeFile.java:[line 207] > From a quick look at the patch, it seems bogus though. [~eddyxu][~Sammi] > would you please double check? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
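On the formatting point discussed in this thread: Guava's {{Preconditions}} substitutes only {{%s}} placeholders in its message template (printf-style specifiers such as {{%d}} or {{%02x}} are printed literally), and the substitution happens lazily, only on failure. A minimal stdlib re-implementation of that behavior, for illustration only (this is not Guava's actual code):

```java
public class MiniPreconditions {
    // Guava-style checkArgument: only "%s" is substituted, and only on failure.
    static void checkArgument(boolean expression, String template, Object... args) {
        if (!expression) {
            StringBuilder sb = new StringBuilder();
            int argIndex = 0, start = 0, placeholder;
            // Replace each "%s" with the next argument; anything else ("%d",
            // "%02x", ...) is left in the message verbatim.
            while ((placeholder = template.indexOf("%s", start)) >= 0
                    && argIndex < args.length) {
                sb.append(template, start, placeholder).append(args[argIndex++]);
                start = placeholder + 2;
            }
            sb.append(template.substring(start));
            throw new IllegalArgumentException(sb.toString());
        }
    }

    public static void main(String[] args) {
        try {
            checkArgument(false, "Invalid replication value %s", -1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Invalid replication value -1
        }
    }
}
```

Because the message is built only when the check fails, the success path costs nothing, which is the main argument for this style over eagerly concatenating strings into an if/throw.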
[jira] [Commented] (HDFS-12840) Creating a file with non-default EC policy in a EC zone is not correctly serialized in the editlog
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281379#comment-16281379 ] SammiChen commented on HDFS-12840: -- Thanks [~eddyxu] for the contribution! The latest patch LGTM and +1. Please double check the style issues before check-in. > Creating a file with non-default EC policy in a EC zone is not correctly > serialized in the editlog > -- > > Key: HDFS-12840 > URL: https://issues.apache.org/jira/browse/HDFS-12840 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, > HDFS-12840.02.patch, HDFS-12840.03.patch, HDFS-12840.04.patch, > HDFS-12840.05.patch, HDFS-12840.reprod.patch, editsStored, editsStored, > editsStored.03, editsStored.05 > > > When create a replicated file in an existing EC zone, the edit logs does not > differentiate it from an EC file. When {{FSEditLogLoader}} to replay edits, > this file is treated as EC file, as a results, it crashes the NN because the > blocks of this file are replicated, which does not match with {{INode}}. 
> {noformat} > ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered > exception on operation AddBlockOp [path=/system/balancer.id, > penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, > RpcCallId=-2] > java.lang.IllegalArgumentException: reportedBlock is not striped > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12896) when set replicate EC policy for a directory or file,it's EC policy cannot be querying by getPolicy command.
[ https://issues.apache.org/jira/browse/HDFS-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281229#comment-16281229 ] SammiChen commented on HDFS-12896: -- Hi [~candychencan], HDFS-12308 aims to provide a new API, separate from {{getErasureCodingPolicy}}, that returns the effective EC policy. > when set replicate EC policy for a directory or file,it's EC policy cannot be > querying by getPolicy command. > > > Key: HDFS-12896 > URL: https://issues.apache.org/jira/browse/HDFS-12896 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: chencan > > When i set replicate EC policy for ecDir,then query it by getPolicy,it return > ‘The erasure coding policy of /ecDir is unspecified', as follow. > [root@master bin]# hdfs dfs -mkdir /ecDir > [root@master bin]# hdfs ec -setPolicy -path /ecDir -replicate > Set erasure coding policy replication on /ecDir > [root@master bin]# hdfs ec -getPolicy -path /ecDir > The erasure coding policy of /ecDir is unspecified -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12896) when set replicate EC policy for a directory or file,it's EC policy cannot be querying by getPolicy command.
[ https://issues.apache.org/jira/browse/HDFS-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279723#comment-16279723 ] SammiChen commented on HDFS-12896: -- Hi [~candychencan], sure, welcome to contribute to the community! Besides, can you resolve this JIRA? > when set replicate EC policy for a directory or file,it's EC policy cannot be > querying by getPolicy command. > > > Key: HDFS-12896 > URL: https://issues.apache.org/jira/browse/HDFS-12896 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: chencan > > When i set replicate EC policy for ecDir,then query it by getPolicy,it return > ‘The erasure coding policy of /ecDir is unspecified', as follow. > [root@master bin]# hdfs dfs -mkdir /ecDir > [root@master bin]# hdfs ec -setPolicy -path /ecDir -replicate > Set erasure coding policy replication on /ecDir > [root@master bin]# hdfs ec -getPolicy -path /ecDir > The erasure coding policy of /ecDir is unspecified -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12840) Creating a file with non-default EC policy in a EC zone is not correctly serialized in the editlog
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279711#comment-16279711 ] SammiChen commented on HDFS-12840: -- Hi [~eddyxu], TestOfflineEditsViewer with editsStored.03 passes when I run each test function manually. bq. So that it can handle the editslog which has no ERASURE_CODING_POLICY_ID field. I don't quite understand the proposal. For policy IDs, currently 1~63 is reserved for system built-in policies and 64~127 is allocated for user-defined policies; 0 is not used now. Let's focus on the fix itself and get it committed in a timely manner. We can discuss the desired replication policy ID later. > Creating a file with non-default EC policy in a EC zone is not correctly > serialized in the editlog > -- > > Key: HDFS-12840 > URL: https://issues.apache.org/jira/browse/HDFS-12840 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, > HDFS-12840.02.patch, HDFS-12840.03.patch, HDFS-12840.04.patch, > HDFS-12840.reprod.patch, editsStored, editsStored, editsStored.03 > > > When create a replicated file in an existing EC zone, the edit logs does not > differentiate it from an EC file. When {{FSEditLogLoader}} to replay edits, > this file is treated as EC file, as a results, it crashes the NN because the > blocks of this file are replicated, which does not match with {{INode}}. 
> {noformat} > ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered > exception on operation AddBlockOp [path=/system/balancer.id, > penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, > RpcCallId=-2] > java.lang.IllegalArgumentException: reportedBlock is not striped > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
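The ID allocation discussed in the comment above (1~63 reserved for system built-in policies, with 63 as the special replication policy; 64~127 for user-defined policies; 0 unused) can be captured in a small helper. This is an illustrative sketch, not an actual HDFS API:

```java
// Sketch of the erasure coding policy ID ranges from the discussion:
// 0 is unused today, 1..63 are reserved for system built-in policies
// (63 being the replication policy), and 64..127 for user-defined ones.
public class PolicyIdRanges {
    static final byte REPLICATION_POLICY_ID = 63; // mirrors ErasureCodeConstants

    static boolean isSystemPolicy(int id) {
        return id >= 1 && id <= 63;
    }

    static boolean isUserDefinedPolicy(int id) {
        return id >= 64 && id <= 127;
    }

    public static void main(String[] args) {
        System.out.println(isSystemPolicy(REPLICATION_POLICY_ID)); // true
        System.out.println(isUserDefinedPolicy(64));               // true
        System.out.println(isSystemPolicy(0));                     // false: 0 is unused
    }
}
```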
[jira] [Commented] (HDFS-12896) when set replicate EC policy for a directory or file,it's EC policy cannot be querying by getPolicy command.
[ https://issues.apache.org/jira/browse/HDFS-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279618#comment-16279618 ] SammiChen commented on HDFS-12896: -- Hi [~candychencan], thanks for reporting this! Currently it's the designed behavior to not return the special replicate EC policy when query. HDFS-12308 is tracked to implement the function to return effective EC policy. > when set replicate EC policy for a directory or file,it's EC policy cannot be > querying by getPolicy command. > > > Key: HDFS-12896 > URL: https://issues.apache.org/jira/browse/HDFS-12896 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: chencan > > When i set replicate EC policy for ecDir,then query it by getPolicy,it return > ‘The erasure coding policy of /ecDir is unspecified', as follow. > [root@master bin]# hdfs dfs -mkdir /ecDir > [root@master bin]# hdfs ec -setPolicy -path /ecDir -replicate > Set erasure coding policy replication on /ecDir > [root@master bin]# hdfs ec -getPolicy -path /ecDir > The erasure coding policy of /ecDir is unspecified -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12840) Creating a file with non-default EC policy in a EC zone is not correctly serialized in the editlog
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278150#comment-16278150 ] SammiChen commented on HDFS-12840: -- Thanks [~eddyxu]! The latest patch looks good overall. 1. {{addFileForEditLog}} in {{FsDirWriteFileOp}} bq. ErasureCodingPolicy ecPolicy = null; the variable declaration can be moved into the scope of {{isStriped}}. 2. TestOfflineEditsViewer fails locally with editsStored.03. The current solution appends an "ERASURE_CODING_POLICY_ID" with value "63" to each "OP_ADD" operation. Do you think a "0" value for the "replication policy ID" would be more appropriate in this case? > Creating a file with non-default EC policy in a EC zone is not correctly > serialized in the editlog > -- > > Key: HDFS-12840 > URL: https://issues.apache.org/jira/browse/HDFS-12840 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, > HDFS-12840.02.patch, HDFS-12840.03.patch, HDFS-12840.04.patch, > HDFS-12840.reprod.patch, editsStored, editsStored, editsStored.03 > > > When create a replicated file in an existing EC zone, the edit logs does not > differentiate it from an EC file. When {{FSEditLogLoader}} to replay edits, > this file is treated as EC file, as a results, it crashes the NN because the > blocks of this file are replicated, which does not match with {{INode}}. 
> {noformat} > ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered > exception on operation AddBlockOp [path=/system/balancer.id, > penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, > RpcCallId=-2] > java.lang.IllegalArgumentException: reportedBlock is not striped > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12840) Creating a replicated file in a EC zone does not correctly serialized in EditLogs
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272270#comment-16272270 ] SammiChen commented on HDFS-12840: -- Hi Eddy, thanks for working on it. Some comments: 1. {{REPLICATION_POLICY_ID}} is already defined in {{ErasureCodeConstants}} with value 63. I suggest reusing it. 2. {{TestRetryCacheWithHA}}: 40 instead of 41. bq. assertEquals("Retry cache size is wrong", 41, cacheSet.size()); > Creating a replicated file in a EC zone does not correctly serialized in > EditLogs > - > > Key: HDFS-12840 > URL: https://issues.apache.org/jira/browse/HDFS-12840 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, > HDFS-12840.02.patch, HDFS-12840.reprod.patch, editsStored, editsStored > > > When create a replicated file in an existing EC zone, the edit logs does not > differentiate it from an EC file. When {{FSEditLogLoader}} to replay edits, > this file is treated as EC file, as a results, it crashes the NN because the > blocks of this file are replicated, which does not match with {{INode}}. 
> {noformat} > ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered > exception on operation AddBlockOp [path=/system/balancer.id, > penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, > RpcCallId=-2] > java.lang.IllegalArgumentException: reportedBlock is not striped > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12840) Creating a replicated file in a EC zone does not correctly serialized in EditLogs
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264935#comment-16264935 ] SammiChen commented on HDFS-12840: -- Hi [~eddyxu], thanks for reporting and fixing this. I'm not able to apply the 01 patch locally against the latest trunk code, while the 00 patch applies cleanly. Can you double-check that the patch format is correct? > Creating a replicated file in a EC zone does not correctly serialized in > EditLogs > - > > Key: HDFS-12840 > URL: https://issues.apache.org/jira/browse/HDFS-12840 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, > HDFS-12840.reprod.patch, editsStored > > > When create a replicated file in an existing EC zone, the edit logs does not > differentiate it from an EC file. When {{FSEditLogLoader}} to replay edits, > this file is treated as EC file, as a results, it crashes the NN because the > blocks of this file are replicated, which does not match with {{INode}}. 
> {noformat} > ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered > exception on operation AddBlockOp [path=/system/balancer.id, > penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, > RpcCallId=-2] > java.lang.IllegalArgumentException: reportedBlock is not striped > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226598#comment-16226598 ] SammiChen commented on HDFS-12682: -- Hi [~xiaochen], thanks for the update. The patch looks good overall. Minor nits: {{toStringWithState}} in {{ErasureCodingPolicy}} duplicates {{toString}} and is not used. I'm also thinking: besides {{getErasureCodingPolicies}}, do we need to return {{ErasureCodingPolicyInfo}} in the response of {{addErasureCodingPolicies}} as well? Semantically, returning {{ErasureCodingPolicyInfo}} seems a better fit, but I wonder whether it would provide any more benefit to end users of this API; maybe the current {{ErasureCodingPolicy}} is already enough. > ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as > DISABLED > > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch, > HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch > > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. 
> {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], > the static instance of [SystemErasureCodingPolicies > class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is first checked, and always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass, because that static instance that the > client (e.g. ECAdmin) reads in unit test is updated by NN. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
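The caching pitfall described in this report can be reproduced in miniature: a deserializer that consults a static cache of policy objects hands back the cached instance and silently drops any state carried on the wire. The classes below are simplified stand-ins for the real PBHelperClient/SystemErasureCodingPolicies code, for illustration only:

```java
import java.util.HashMap;
import java.util.Map;

// Miniature reproduction of the bug: convert() returns the cached policy
// object as-is, so the state decoded from the wire is ignored.
public class CachedPolicyBug {
    enum State { DISABLED, ENABLED }

    static class Policy {
        final int id;
        State state = State.DISABLED; // default on construction
        Policy(int id) { this.id = id; }
    }

    // Static cache, analogous to SystemErasureCodingPolicies' policy map.
    static final Map<Integer, Policy> CACHE = new HashMap<>();
    static { CACHE.put(1, new Policy(1)); }

    // Buggy convert: checks the cache first and never applies wireState.
    static Policy convert(int id, State wireState) {
        Policy cached = CACHE.get(id);
        if (cached != null) {
            return cached; // wireState is lost here
        }
        Policy p = new Policy(id);
        p.state = wireState;
        return p;
    }

    public static void main(String[] args) {
        // The NN sent ENABLED, but the client sees the cached DISABLED instance.
        System.out.println(convert(1, State.ENABLED).state); // DISABLED
    }
}
```

In a MiniCluster test the NN and client share the very same static objects, so the dropped wire state goes unnoticed; only on a real cluster do the two sides diverge.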
[jira] [Commented] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart
[ https://issues.apache.org/jira/browse/HDFS-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217973#comment-16217973 ] SammiChen commented on HDFS-12686: -- Hi [~jojochuang], this JIRA is closely related to HDFS-12682. I was planning to work on it after HDFS-12682 was committed. Thanks [~xiaochen] for taking care of it together with HDFS-12682. > Erasure coding system policy state is not correctly saved and loaded during > real cluster restart > > > Key: HDFS-12686 > URL: https://issues.apache.org/jira/browse/HDFS-12686 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: SammiChen >Assignee: SammiChen >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > > Inspired by HDFS-12682, I found the system erasure coding policy state will > not be correctly saved and loaded in a real cluster. Though there are unit > tests for this and they all pass with MiniCluster, that is because the > MiniCluster keeps the same static system erasure coding policy object after > the NN restart operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-12046) Hadoop CRC implementation using Intel ISA-L library
[ https://issues.apache.org/jira/browse/HDFS-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen reassigned HDFS-12046: Assignee: SammiChen (was: luhuichun) > Hadoop CRC implementation using Intel ISA-L library > --- > > Key: HDFS-12046 > URL: https://issues.apache.org/jira/browse/HDFS-12046 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: luhuichun >Assignee: SammiChen > Attachments: HDFS-12046-001.patch, ISA-L CRC Performance Report using > intel ISA-L.pdf > > > Intel ISA-L open source library provides set of highly optimized functions > for RAID, erasure code, CRC, cryptographic hash, encryption, and compression. > Ref. https://github.com/01org/isa-l. HDFS-EC has already integrated ISA-L and > added the necessary building options support for Hadoop. For Hadoop CRC, we > recently explored more, developing a Hadoop CRC using Intel ISA-L, performing > a test on Broadwell and Skylake servers, comparing the performance against > Hadoop native CRC. On Broadwell/Skylake, ISA-L CRC has about 8%~ performance > gain over Hadoop native CRC. We suggest adding a new Hadoop native CRC using > the ISA-L library, the extra advantage is it’s already optimized when we > upgrade to new servers and Hadoop developers don’t have to maintain their own > bunch of ASM codes. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210790#comment-16210790 ] SammiChen commented on HDFS-12682: -- Hi [~xiaochen], thanks for reporting this issue. Inspired by your discovery, I found the same issue exists when system EC policies are persisted into and loaded from the fsImage (HDFS-12686). The current convertErasureCodingPolicy function works well in most cases. For special cases, like getting all erasure coding policies and persisting a policy into the fsImage, I think we need a new variant that does a full conversion. {quote} The problem I see from HDFS-12258's implementation though, is the mutable ECPS is saved on the immutable ECP, breaking assumptions such as shared single instance policy. At the same time the policy is still not persisted independently. I think ECPS is highly dependent on the missing piece from HDFS-7337: policies are not persisted to NN metadata. The state of whether a policy is enabled could be persisted together with the policy, without impacting HDFSFileStatus. {quote} Persisting EC policies is implemented in HDFS-7337. {quote} I think this bug (HDFS-12682) and HDFS-12258 would make more sense if we could first persist policies to NN metadata. Would also be helpful to separate out something like ErasureCodingPolicyAndState for the policy-specific APIs, so the state isn't deserialized onto HDFSFileStatus. {quote} For HDFS-12258, [~zhouwei], [~drankye] and I discussed two different approaches when we first thought about how to implement it. One is the currently implemented approach, which adds one extra "state" field to the existing ECP definition. The other is to define a new class, something like {{ErasureCodingPolicyWithState}}, to hold the ECP and the new policy state field. They are almost equally good. The only concern is that introducing the new {{ErasureCodingPolicyWithState}} may add complexity to the API interfaces, and to end users. There are multiple EC-related APIs. 
If we return {{ErasureCodingPolicyWithState}} from {{getAllErasureCodingPolicies}}, should we then return {{ErasureCodingPolicyWithState}} or {{ErasureCodingPolicy}} from {{getErasureCodingPolicy}}? And so on. Also, is it worth introducing a new class definition in Hadoop that has only one extra field? After all these considerations, the current approach was chosen to leverage the existing ECP. Please let me know if you have other concerns. Thanks! > ECAdmin -listPolicies will always show policy state as DISABLED > --- > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen > Labels: hdfs-ec-3.0-must-do > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. > {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], > 
the static instance of the [SystemErasureCodingPolicies > class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is checked first, and it always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass because, in a unit test, the static instance that the > client (e.g. ECAdmin) reads is the same in-process object that the NN updates. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
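The caching pitfall described above can be sketched in a few lines of plain Java. The class and method names below are hypothetical stand-ins for illustration only, not the real {{SystemErasureCodingPolicies}} or {{PBHelperClient}} code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for ErasureCodingPolicy and the static
// system-policy cache; not the real Hadoop classes.
class Policy {
    final int id;
    String state = "DISABLED"; // default state at construction time
    Policy(int id) { this.id = id; }
}

class PolicyCache {
    private static final Map<Integer, Policy> CACHE = new HashMap<>();
    static { CACHE.put(1, new Policy(1)); }

    // Deserialization that consults the static cache first always hands
    // back the cached object, silently dropping the state carried over
    // the wire in the protobuf message.
    static Policy convertFromProto(int id, String protoState) {
        Policy cached = CACHE.get(id);
        if (cached != null) {
            return cached; // bug: protoState (e.g. "ENABLED") is ignored
        }
        Policy p = new Policy(id);
        p.state = protoState;
        return p;
    }
}

public class CachePitfall {
    public static void main(String[] args) {
        // The NN serialized state=ENABLED for policy 1, but the
        // client-side conversion still reports DISABLED.
        Policy p = PolicyCache.convertFromProto(1, "ENABLED");
        System.out.println(p.state); // prints DISABLED
    }
}
```

This is why a real cluster shows the bug while MiniCluster tests pass: in-process, the "client" reads the very object the NN mutates, so the stale default is never observed.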
[jira] [Created] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart
SammiChen created HDFS-12686: Summary: Erasure coding system policy state is not correctly saved and loaded during real cluster restart Key: HDFS-12686 URL: https://issues.apache.org/jira/browse/HDFS-12686 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0-beta1 Reporter: SammiChen Assignee: SammiChen Inspired by HDFS-12682, I found that the system erasure coding policy state is not correctly saved and loaded in a real cluster, even though there are unit tests for this and they all pass with MiniCluster. That is because the MiniCluster keeps the same static system erasure coding policy objects across the NN restart operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7337) Configurable and pluggable erasure codec and policy
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210640#comment-16210640 ] SammiChen commented on HDFS-7337: - Hi [~xiaochen], HDFS-7859 is for persisting EC policies in the protobuf fsImage, and HDFS-12395 is for supporting the EC policy APIs in the edit log. Thanks [~rakeshr]! > Configurable and pluggable erasure codec and policy > --- > > Key: HDFS-7337 > URL: https://issues.apache.org/jira/browse/HDFS-7337 > Project: Hadoop HDFS > Issue Type: New Feature > Components: erasure-coding >Reporter: Zhe Zhang >Assignee: SammiChen >Priority: Critical > Labels: hdfs-ec-3.0-nice-to-have > Fix For: 3.0.0-beta1 > > Attachments: HDFS-7337-prototype-v1.patch, > HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec > v4.pdf, PluggableErasureCodec-v2.pdf, PluggableErasureCodec-v3.pdf, > PluggableErasureCodec.pdf > > > According to HDFS-7285 and the design, this considers to support multiple > Erasure Codecs via pluggable approach. It allows to define and configure > multiple codec schemas with different coding algorithms and parameters. The > resultant codec schemas can be utilized and specified via command tool for > different file folders. While design and implement such pluggable framework, > it’s also to implement a concrete codec by default (Reed Solomon) to prove > the framework is useful and workable. Separate JIRA could be opened for the > RS codec implementation. > Note HDFS-7353 will focus on the very low level codec API and implementation > to make concrete vendor libraries transparent to the upper layer. This JIRA > focuses on high level stuffs that interact with configuration, schema and etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12613) Native EC coder should implement release() as idempotent function.
[ https://issues.apache.org/jira/browse/HDFS-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206917#comment-16206917 ] SammiChen commented on HDFS-12613: -- HDFS-12672 is filed for tracking. Will prepare a Windows environment later to verify it. > Native EC coder should implement release() as idempotent function. > -- > > Key: HDFS-12613 > URL: https://issues.apache.org/jira/browse/HDFS-12613 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-12613.00.patch, HDFS-12613.01.patch, > HDFS-12613.02.patch, HDFS-12613.03.patch, HDFS-12613.04.patch > > > Recently, we found native EC coder crashes JVM because > {{NativeRSDecoder#release()}} being called multiple times (HDFS-12612 and > HDFS-12606). > We should strength the implement the native code to make {{release()}} > idempotent as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-12672) Verify erasure coding native code on Windows platform
[ https://issues.apache.org/jira/browse/HDFS-12672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen reassigned HDFS-12672: Assignee: SammiChen > Verify erasure coding native code on Windows platform > - > > Key: HDFS-12672 > URL: https://issues.apache.org/jira/browse/HDFS-12672 > Project: Hadoop HDFS > Issue Type: Task >Reporter: SammiChen >Assignee: SammiChen > Labels: hdfs-ec-3.0-nice-to-have > > Recently there is some change in erasure coding native code to fix some known > issues. It's better to verify the code change on Window platform also. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12672) Verify erasure coding native code on Windows platform
SammiChen created HDFS-12672: Summary: Verify erasure coding native code on Windows platform Key: HDFS-12672 URL: https://issues.apache.org/jira/browse/HDFS-12672 Project: Hadoop HDFS Issue Type: Task Reporter: SammiChen Recently there have been some changes in the erasure coding native code to fix some known issues. It's better to also verify the code changes on the Windows platform. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205451#comment-16205451 ] SammiChen commented on HDFS-11467: -- Thanks [~HuafengWang] for working on it! 1. Suggest moving the new private {{convertErasureCodingPolicy}} function to {{PBHelperClient}}, and renaming it to distinguish it from the current {{convertErasureCodingPolicy}} function. 2. A schema is a must-have for a policy, so the case when the schema is null should be handled; Preconditions can be used for the null check. The case when no policy is found in the section should also be handled. 3. If "extraOptions" is not null, persist it. 4. An extra unit test is preferred. > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12575) Improve test coverage for EC related edit logs ops
[ https://issues.apache.org/jira/browse/HDFS-12575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201668#comment-16201668 ] SammiChen commented on HDFS-12575: -- Hi [~eddyxu], sure, I will work on it. I'm not clear about the detailed steps to carry out the "Replay edits after checkpoint" and "Apply edits on SNN" cases. Can you help to give some hints? Also, by SNN, do you mean both the secondary namenode and the standby namenode? > Improve test coverage for EC related edit logs ops > -- > > Key: HDFS-12575 > URL: https://issues.apache.org/jira/browse/HDFS-12575 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: SammiChen > Labels: hdfs-ec-3.0-nice-to-have > > HDFS-12569 found that we have little test coverage for edit logs ops of > erasure coding. > And we've seen the following bug bring down SNN in our test environments: > {code} > 6:42:18.177 AMERROR FSEditLogLoader > Encountered exception on operation AddBlockOp [path=/tmp/foo/bar, > penultimateBlock=NULL, lastBlock=blk_1073743386_69322, RpcClientId=, > RpcCallId=-2] > java.lang.IllegalArgumentException: reportedBlock is not striped > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at > > 6:42:18.190 AMFATAL EditLogTailer > Unknown error encountered while tailing edits. Shutting down standby NN. 
> java.io.IOException: java.lang.IllegalArgumentException: reportedBlock is not > striped > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:251) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:150) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > {code} > We should add coverage for these important edit logs, i.e., set/unset policy, > enable/remove policies and etc are correctly persisted in edit logs, and test > the scenarios like: > * Restart NN > * Replay edits after checkpoint > * Apply edits on SNN. > * and etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12613) Native EC coder should implement release() as idempotent function.
[ https://issues.apache.org/jira/browse/HDFS-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201395#comment-16201395 ] SammiChen commented on HDFS-12613: -- Hi [~eddyxu], agreed. Checking the NULL pointer in native code is a must-have. Checking the NULL pointer at the Java level is a nice-to-have, to avoid one JNI call. > Native EC coder should implement release() as idempotent function. > -- > > Key: HDFS-12613 > URL: https://issues.apache.org/jira/browse/HDFS-12613 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-12613.00.patch, HDFS-12613.01.patch, > HDFS-12613.02.patch > > > Recently, we found native EC coder crashes JVM because > {{NativeRSDecoder#release()}} being called multiple times (HDFS-12612 and > HDFS-12606). > We should strength the implement the native code to make {{release()}} > idempotent as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12613) Native EC coder should implement release() as idempotent function.
[ https://issues.apache.org/jira/browse/HDFS-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199928#comment-16199928 ] SammiChen commented on HDFS-12613: -- Hi [~eddyxu], thanks for reporting and working on it. 1. Apart from adding {{synchronized}} on {{release}}, {{performEncodeImpl}} and {{performDecodeImpl}} can also take the {{synchronized}} keyword. 2. I see you added the NULL check of {{nativeCoder}} in native code. We can also check its status in Java code: if it's already null, we don't need to call into the native code through JNI. > Native EC coder should implement release() as idempotent function. > -- > > Key: HDFS-12613 > URL: https://issues.apache.org/jira/browse/HDFS-12613 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-12613.00.patch, HDFS-12613.01.patch > > > Recently, we found native EC coder crashes JVM because > {{NativeRSDecoder#release()}} being called multiple times (HDFS-12612 and > HDFS-12606). > We should strength the implement the native code to make {{release()}} > idempotent as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
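The pattern discussed in this thread — synchronizing the entry points and short-circuiting at the Java level when the native handle is already released — can be sketched as below. This is a minimal illustrative sketch with hypothetical names, not the real {{NativeRSDecoder}} code; {{destroyNative()}} stands in for the actual JNI call:

```java
// Hypothetical sketch of an idempotent release() for a coder that owns
// a native resource. The real implementation would cross into JNI.
class NativeCoderSketch {
    private long nativeCoder = 42L;   // non-zero means "allocated"
    private int nativeReleaseCalls = 0;

    // Simulated JNI teardown; must run at most once per allocation,
    // otherwise a double-free would crash the JVM.
    private void destroyNative() {
        nativeReleaseCalls++;
        nativeCoder = 0L;
    }

    // Idempotent and thread-safe: safe to call any number of times.
    public synchronized void release() {
        if (nativeCoder == 0L) {
            return; // already released; skip the JNI round-trip entirely
        }
        destroyNative();
    }

    public synchronized boolean isReleased() { return nativeCoder == 0L; }
    public int releaseCallCount() { return nativeReleaseCalls; }
}

public class IdempotentRelease {
    public static void main(String[] args) {
        NativeCoderSketch coder = new NativeCoderSketch();
        coder.release();
        coder.release(); // second call is a no-op instead of a crash
        System.out.println(coder.releaseCallCount()); // prints 1
    }
}
```

The Java-level guard is an optimization; the native side still needs its own NULL check, since nothing stops a caller from reaching the native layer through another path.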
[jira] [Commented] (HDFS-12569) Unset EC policy logs empty payload in edit log
[ https://issues.apache.org/jira/browse/HDFS-12569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186826#comment-16186826 ] SammiChen commented on HDFS-12569: -- Hi Andrew, Next week is PRC National Day. We will all take at least one week of vacation (1st Oct. ~ 8th Oct.), so please expect delayed email responses during this period. If any task fits us or we can help, just ping us or assign it to us, and we will take it over after the vacation. Bests, Sammi > Unset EC policy logs empty payload in edit log > -- > > Key: HDFS-12569 > URL: https://issues.apache.org/jira/browse/HDFS-12569 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu >Priority: Blocker > Attachments: HDFS-12569.00.patch > > > The edit log generated by {{hdfs ec -unsetPolicy}} generates an > {{OP_REMOVE_XATTR}} entry in edit logs, but the payload like xattr namespace > / name / vaue are missing: > {code} > > OP_REMOVE_XATTR > > 420481 > / > b098e758-9d7f-48b7-aa91-80ca52133b09 > 0 > > > {code} > As a result, when Active NN restarts, or the Standby NN replay edits, this op > has not effect. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7337) Configurable and pluggable erasure codec and policy
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185280#comment-16185280 ] SammiChen commented on HDFS-7337: - Hi [~andrew.wang], the release note is ready. Is there anything I need to do besides that? > Configurable and pluggable erasure codec and policy > --- > > Key: HDFS-7337 > URL: https://issues.apache.org/jira/browse/HDFS-7337 > Project: Hadoop HDFS > Issue Type: New Feature > Components: erasure-coding >Reporter: Zhe Zhang >Assignee: SammiChen >Priority: Critical > Labels: hdfs-ec-3.0-nice-to-have > Fix For: 3.0.0-beta1 > > Attachments: HDFS-7337-prototype-v1.patch, > HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, > PluggableErasureCodec.pdf, PluggableErasureCodec-v2.pdf, > PluggableErasureCodec-v3.pdf, PluggableErasureCodec v4.pdf > > > According to HDFS-7285 and the design, this considers to support multiple > Erasure Codecs via pluggable approach. It allows to define and configure > multiple codec schemas with different coding algorithms and parameters. The > resultant codec schemas can be utilized and specified via command tool for > different file folders. While design and implement such pluggable framework, > it’s also to implement a concrete codec by default (Reed Solomon) to prove > the framework is useful and workable. Separate JIRA could be opened for the > RS codec implementation. > Note HDFS-7353 will focus on the very low level codec API and implementation > to make concrete vendor libraries transparent to the upper layer. This JIRA > focuses on high level stuffs that interact with configuration, schema and etc. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183560#comment-16183560 ] SammiChen commented on HDFS-12497: -- Thanks Huafeng for taking over the task! > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12399) Improve erasure coding codec framework adding more unit tests
[ https://issues.apache.org/jira/browse/HDFS-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182027#comment-16182027 ] SammiChen commented on HDFS-12399: -- I double checked test case failures. They are not relevant. Ping [~drankye] and [~eddyxu], can you help to review the patch? > Improve erasure coding codec framework adding more unit tests > -- > > Key: HDFS-12399 > URL: https://issues.apache.org/jira/browse/HDFS-12399 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-alpha3 >Reporter: SammiChen >Assignee: SammiChen > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12399.000.patch > > > Improve erasure coding codec through add more unit tests -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174379#comment-16174379 ] SammiChen commented on HDFS-12497: -- I see the {{testMultipleDatanodeFailure56}} failure too. By reducing the {{stripesPerBlock}} from 4 to 2, all test*() and testMultipleDatanodeFailure56 run very well locally. But, {{testBlockTokenExpired}} will always fail in this case no matter how long the token lifetime is. If I set the "tokenExpire" to false, then everything is OK. So the failure is token related but not token lifetime related. Need more time to find the root cause. > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: SammiChen > Labels: flaky-test, hdfs-ec-3.0-must-do > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s
[ https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172611#comment-16172611 ] SammiChen commented on HDFS-12449: -- Thanks [~eddyxu] for review and commit the patch! > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > -- > > Key: HDFS-12449 > URL: https://issues.apache.org/jira/browse/HDFS-12449 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: SammiChen >Assignee: SammiChen > Labels: flaky-test > Fix For: 3.0.0-beta1 > > Attachments: HDFS-12449.001.patch > > > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > reduce the file size and loop count -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen reassigned HDFS-12497: Assignee: SammiChen > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: SammiChen > Labels: flaky-test, hdfs-ec-3.0-must-do > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12447) Refactor addErasureCodingPolicy
[ https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171525#comment-16171525 ] SammiChen commented on HDFS-12447: -- Build failed at step "Apache Hadoop Client Packaging Invariants" and "Apache Hadoop Client Packaging Invariants for Test". Not sure why these two module failed. {quote} [INFO] Apache Hadoop Scheduler Load Simulator . SUCCESS [ 6.028 s] [INFO] Apache Hadoop Azure Data Lake support .. SUCCESS [ 4.058 s] [INFO] Apache Hadoop Tools Dist ... SUCCESS [ 1.229 s] [INFO] Apache Hadoop Tools SUCCESS [ 0.029 s] [INFO] Apache Hadoop Client API ... SUCCESS [01:59 min] [INFO] Apache Hadoop Client Runtime ... SUCCESS [01:50 min] [INFO] Apache Hadoop Client Packaging Invariants .. FAILURE [ 1.081 s] [INFO] Apache Hadoop Client Test Minicluster .. SUCCESS [02:24 min] [INFO] Apache Hadoop Client Packaging Invariants for Test . FAILURE [ 0.120 s] [INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [ 1.231 s] [INFO] Apache Hadoop Distribution . SKIPPED [INFO] Apache Hadoop Client Modules ... SUCCESS [ 0.075 s] [INFO] Apache Hadoop Cloud Storage SUCCESS [ 1.091 s] [INFO] Apache Hadoop Cloud Storage Project SUCCESS [ 0.044 s] [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 15:37 min [INFO] Finished at: 2017-09-19T07:49:44+00:00 [INFO] Final Memory: 121M/497M [INFO] [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec (check-jar-contents) on project hadoop-client-check-invariants: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) on project hadoop-client-check-test-invariants: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed. -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. 
[ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-client-check-invariants {quote} > Refactor addErasureCodingPolicy > --- > > Key: HDFS-12447 > URL: https://issues.apache.org/jira/browse/HDFS-12447 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12447.001.patch, HDFS-12447.002.patch, > HDFS-12447.003.patch, HDFS-12447.004.patch > > > As a follow on to handle some issues discussed in HDFS-12395, this is to > majorly refactor addErasureCodingPoliy API, change AddECPolicyResponse => > AddErasureCodingPolicyResponse -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12399) Improve erasure coding codec framework adding more unit tests
[ https://issues.apache.org/jira/browse/HDFS-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12399: - Attachment: HDFS-12399.000.patch Initial patch. > Improve erasure coding codec framework adding more unit tests > -- > > Key: HDFS-12399 > URL: https://issues.apache.org/jira/browse/HDFS-12399 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.0.0-alpha3 >Reporter: SammiChen >Assignee: SammiChen > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12399.000.patch > > > Improve erasure coding codec through add more unit tests -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s
[ https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171236#comment-16171236 ] SammiChen commented on HDFS-12449: -- Hi [~eddyxu], do you have time to take a look at the patch? I double checked failed unit tests, not relevant. > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > -- > > Key: HDFS-12449 > URL: https://issues.apache.org/jira/browse/HDFS-12449 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: SammiChen >Assignee: SammiChen > Labels: flaky-test > Attachments: HDFS-12449.001.patch > > > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > reduce the file size and loop count -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12447) Refactor addErasureCodingPolicy
[ https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12447: - Attachment: HDFS-12447.004.patch Rebase the patch against trunk. > Refactor addErasureCodingPolicy > --- > > Key: HDFS-12447 > URL: https://issues.apache.org/jira/browse/HDFS-12447 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12447.001.patch, HDFS-12447.002.patch, > HDFS-12447.003.patch, HDFS-12447.004.patch > > > As a follow on to handle some issues discussed in HDFS-12395, this is to > majorly refactor addErasureCodingPoliy API, change AddECPolicyResponse => > AddErasureCodingPolicyResponse -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12395) Support erasure coding policy operations in namenode edit log
[ https://issues.apache.org/jira/browse/HDFS-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171232#comment-16171232 ] SammiChen commented on HDFS-12395: -- Thanks [~kihwal] for the reminder about taking care of the NN layout version; I have the same opinion as [~andrew.wang]. And [~brahmareddy], thanks for your advice. I will file a separate JIRA next time in this kind of case. > Support erasure coding policy operations in namenode edit log > - > > Key: HDFS-12395 > URL: https://issues.apache.org/jira/browse/HDFS-12395 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Fix For: 3.0.0-beta1 > > Attachments: editsStored, HDFS-12395.001.patch, HDFS-12395.002.patch, > HDFS-12395.003.patch, HDFS-12395.004.patch > > > Support add, remove, disable, enable erasure coding policy operation in edit > log. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12395) Support erasure coding policy operations in namenode edit log
[ https://issues.apache.org/jira/browse/HDFS-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169666#comment-16169666 ] SammiChen commented on HDFS-12395: -- Hi, [~brahmareddy], thanks for the reminder. I will address these two failed cases in HDFS-12460. > Support erasure coding policy operations in namenode edit log > - > > Key: HDFS-12395 > URL: https://issues.apache.org/jira/browse/HDFS-12395 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Fix For: 3.0.0-beta1 > > Attachments: editsStored, HDFS-12395.001.patch, HDFS-12395.002.patch, > HDFS-12395.003.patch, HDFS-12395.004.patch > > > Support add, remove, disable, enable erasure coding policy operation in edit > log. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12447) Refactor addErasureCodingPolicy
[ https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12447: - Attachment: HDFS-12447.002.patch Improve the patch after offline discussion with Kai. > Refactor addErasureCodingPolicy > --- > > Key: HDFS-12447 > URL: https://issues.apache.org/jira/browse/HDFS-12447 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: SammiChen > Attachments: HDFS-12447.001.patch, HDFS-12447.002.patch > > > As a follow on to handle some issues discussed in HDFS-12395, this is to > majorly refactor addErasureCodingPoliy API, change AddECPolicyResponse => > AddErasureCodingPolicyResponse -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation
[ https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12460: - Attachment: HDFS-12460.001.patch Initial patch. > make addErasureCodingPolicy an idempotent operation > --- > > Key: HDFS-12460 > URL: https://issues.apache.org/jira/browse/HDFS-12460 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching >Reporter: SammiChen >Assignee: SammiChen > Attachments: HDFS-12460.001.patch > > > Make addErasureCodingPolicy an idempotent operation to guarantee after HA > switch, addErasureCodingPolicy edit log can be applied smoothly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
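The idempotency goal of HDFS-12460 — replaying the same add-policy edit-log entry (e.g. after an HA switch) must be a no-op rather than an error — can be sketched as follows. All names here are hypothetical stand-ins for illustration, not the real Hadoop {{ErasureCodingPolicyManager}} API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an idempotent "add policy" operation. Re-adding
// a policy with the same name returns the existing id instead of
// throwing, so applying the same edit-log record twice is harmless.
class PolicyManagerSketch {
    private final Map<String, Integer> policiesByName = new HashMap<>();
    private int nextId = 64; // assume user-defined ids start above system ids

    public synchronized int addPolicy(String name) {
        Integer existing = policiesByName.get(name);
        if (existing != null) {
            return existing; // idempotent path: already applied
        }
        int id = nextId++;
        policiesByName.put(name, id);
        return id;
    }
}

public class IdempotentAdd {
    public static void main(String[] args) {
        PolicyManagerSketch mgr = new PolicyManagerSketch();
        int first = mgr.addPolicy("RS-12-4-1024k");
        int replay = mgr.addPolicy("RS-12-4-1024k"); // edit-log replay
        System.out.println(first == replay); // prints true
    }
}
```

Without the idempotent path, a standby NN replaying edits that the active NN had already applied would hit a duplicate-policy error and fail the transition.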
[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation
[ https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12460: - Description: Make addErasureCodingPolicy an idempotent operation to guarantee after HA switch, addErasureCodingPolicy edit log can be applied smoothly. (was: Make addErasureCodingPolicy an idempotent operation to guarantee after HA switch, all edit log can be applied smoothly ) > make addErasureCodingPolicy an idempotent operation > --- > > Key: HDFS-12460 > URL: https://issues.apache.org/jira/browse/HDFS-12460 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching >Reporter: SammiChen >Assignee: SammiChen > > Make addErasureCodingPolicy an idempotent operation to guarantee after HA > switch, addErasureCodingPolicy edit log can be applied smoothly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation
[ https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12460: - Description: Make addErasureCodingPolicy an idempotent operation to guarantee after HA switch, all edit log can be applied smoothly (was: TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure due to edit log opcode number increase.) > make addErasureCodingPolicy an idempotent operation > --- > > Key: HDFS-12460 > URL: https://issues.apache.org/jira/browse/HDFS-12460 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching >Reporter: SammiChen >Assignee: SammiChen > > Make addErasureCodingPolicy an idempotent operation to guarantee after HA > switch, all edit log can be applied smoothly -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation
[ https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12460: - Issue Type: Improvement (was: Bug) > make addErasureCodingPolicy an idempotent operation > --- > > Key: HDFS-12460 > URL: https://issues.apache.org/jira/browse/HDFS-12460 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching >Reporter: SammiChen >Assignee: SammiChen > > TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure due to > edit log opcode number increase. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation
[ https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12460: - Summary: make addErasureCodingPolicy an idempotent operation (was: addErasureCodingPolicy should) > make addErasureCodingPolicy an idempotent operation > --- > > Key: HDFS-12460 > URL: https://issues.apache.org/jira/browse/HDFS-12460 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Reporter: SammiChen >Assignee: SammiChen > > TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure due to > edit log opcode number increase. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12460) addErasureCodingPolicy should
[ https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12460: - Summary: addErasureCodingPolicy should (was: TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure) > addErasureCodingPolicy should > - > > Key: HDFS-12460 > URL: https://issues.apache.org/jira/browse/HDFS-12460 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Reporter: SammiChen >Assignee: SammiChen > > TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure due to > edit log opcode number increase. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12462) Erasure coding policy extra options should be sorted by key value
[ https://issues.apache.org/jira/browse/HDFS-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12462: - Labels: hdfs-ec-3.0-nice-to-have (was: ) > Erasure coding policy extra options should be sorted by key value > - > > Key: HDFS-12462 > URL: https://issues.apache.org/jira/browse/HDFS-12462 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen > Labels: hdfs-ec-3.0-nice-to-have > > To make sure the serialized fsimage and editlog are binary-equal, erasure coding > policy extra options should be sorted by key value. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12462) Erasure coding policy extra options should be sorted by key value
[ https://issues.apache.org/jira/browse/HDFS-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12462: - Component/s: erasure-coding > Erasure coding policy extra options should be sorted by key value > - > > Key: HDFS-12462 > URL: https://issues.apache.org/jira/browse/HDFS-12462 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen > > To make sure the serialized fsimage and editlog are binary-equal, erasure coding > policy extra options should be sorted by key value. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12462) Erasure coding policy extra options should be sorted by key value
SammiChen created HDFS-12462: Summary: Erasure coding policy extra options should be sorted by key value Key: HDFS-12462 URL: https://issues.apache.org/jira/browse/HDFS-12462 Project: Hadoop HDFS Issue Type: Improvement Reporter: SammiChen To make sure the serialized fsimage and editlog are binary-equal, erasure coding policy extra options should be sorted by key value. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
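The reasoning behind HDFS-12462 is that a hash map iterates its entries in an unspecified order, so two NameNodes serializing the same extra options could emit different bytes; sorting the keys makes serialization deterministic and therefore byte-comparable. A minimal sketch, where the `key=value;` format is purely illustrative and not the real fsimage/editlog layout:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: serializing extra options through a TreeMap guarantees the
// same byte output regardless of the map's insertion order, because
// TreeMap iterates keys in sorted (natural) order.
class SortedOptions {
    static String serialize(Map<String, String> extraOptions) {
        StringBuilder sb = new StringBuilder();
        // Copying into a TreeMap fixes the iteration order to sorted keys.
        for (Map.Entry<String, String> e : new TreeMap<>(extraOptions).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> opts = new java.util.HashMap<>();
        opts.put("codec", "rs");
        opts.put("cellsize", "1048576");
        // Output order is sorted by key, independent of insertion order.
        System.out.println(serialize(opts));
    }
}
```

Two maps with the same entries but different insertion orders then serialize identically, which is exactly the binary-equality property the fsimage and editlog comparison needs.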
[jira] [Updated] (HDFS-12447) Refactor addErasureCodingPolicy
[ https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12447: - Attachment: HDFS-12447.001.patch > Refactor addErasureCodingPolicy > --- > > Key: HDFS-12447 > URL: https://issues.apache.org/jira/browse/HDFS-12447 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: SammiChen > Attachments: HDFS-12447.001.patch > > > As a follow-on to handle some issues discussed in HDFS-12395, this is to > substantially refactor the addErasureCodingPolicy API, change AddECPolicyResponse => > AddErasureCodingPolicyResponse -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12447) Refactor addErasureCodingPolicy
[ https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12447: - Status: Patch Available (was: Open) > Refactor addErasureCodingPolicy > --- > > Key: HDFS-12447 > URL: https://issues.apache.org/jira/browse/HDFS-12447 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: SammiChen > Attachments: HDFS-12447.001.patch > > > As a follow-on to handle some issues discussed in HDFS-12395, this is to > substantially refactor the addErasureCodingPolicy API, change AddECPolicyResponse => > AddErasureCodingPolicyResponse -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12460) TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure
SammiChen created HDFS-12460: Summary: TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure Key: HDFS-12460 URL: https://issues.apache.org/jira/browse/HDFS-12460 Project: Hadoop HDFS Issue Type: Bug Components: caching Reporter: SammiChen Assignee: SammiChen TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure due to edit log opcode number increase. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167239#comment-16167239 ] SammiChen commented on HDFS-7859: - Sure. Release note is ready. Thanks [~drankye], [~eddyxu], [~andrew.wang], [~xinwei], [~szetszwo], [~zhz] and [~jingzhao] for all your contribution and effort! > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Fix For: 3.0.0-beta1 > > Attachments: HDFS-7859.001.patch, HDFS-7859.002.patch, > HDFS-7859.004.patch, HDFS-7859.005.patch, HDFS-7859.006.patch, > HDFS-7859.007.patch, HDFS-7859.008.patch, HDFS-7859.009.patch, > HDFS-7859.010.patch, HDFS-7859.011.patch, HDFS-7859.012.patch, > HDFS-7859.013.patch, HDFS-7859.014.patch, HDFS-7859.015.patch, > HDFS-7859.016.patch, HDFS-7859.017.patch, HDFS-7859.018.patch, > HDFS-7859.019.patch, HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-7859: Release Note: Persist all built-in erasure coding policies and user defined erasure coding policies into NameNode fsImage and editlog reliably, so that all erasure coding policies remain consistent after NameNode restart. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: SammiChen > Labels: hdfs-ec-3.0-must-do > Fix For: 3.0.0-beta1 > > Attachments: HDFS-7859.001.patch, HDFS-7859.002.patch, > HDFS-7859.004.patch, HDFS-7859.005.patch, HDFS-7859.006.patch, > HDFS-7859.007.patch, HDFS-7859.008.patch, HDFS-7859.009.patch, > HDFS-7859.010.patch, HDFS-7859.011.patch, HDFS-7859.012.patch, > HDFS-7859.013.patch, HDFS-7859.014.patch, HDFS-7859.015.patch, > HDFS-7859.016.patch, HDFS-7859.017.patch, HDFS-7859.018.patch, > HDFS-7859.019.patch, HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s
[ https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12449: - Description: TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s reduce the file size and loop count was: TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s reduce the file size and loop account > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > -- > > Key: HDFS-12449 > URL: https://issues.apache.org/jira/browse/HDFS-12449 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: SammiChen >Assignee: SammiChen > Labels: flaky-test > Attachments: HDFS-12449.001.patch > > > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > reduce the file size and loop count -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s
[ https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12449: - Status: Patch Available (was: Open) > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > -- > > Key: HDFS-12449 > URL: https://issues.apache.org/jira/browse/HDFS-12449 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: SammiChen > Labels: flaky-test > Attachments: HDFS-12449.001.patch > > > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > reduce the file size and loop account -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s
[ https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-12449: - Attachment: HDFS-12449.001.patch > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > -- > > Key: HDFS-12449 > URL: https://issues.apache.org/jira/browse/HDFS-12449 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: SammiChen > Labels: flaky-test > Attachments: HDFS-12449.001.patch > > > TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot > finish in 60s > reduce the file size and loop account -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org