[jira] [Created] (HDFS-13762) Support non-volatile memory or storage class memory(SCM) in HDFS cache

2018-07-24 Thread SammiChen (JIRA)
SammiChen created HDFS-13762:


 Summary: Support non-volatile memory or storage class memory(SCM) 
in HDFS cache
 Key: HDFS-13762
 URL: https://issues.apache.org/jira/browse/HDFS-13762
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: caching, datanode
Reporter: SammiChen
Assignee: SammiChen


Non-volatile memory is a type of memory that retains its content across power 
failures and power cycles. A non-volatile memory device usually has access 
speed close to that of a memory DIMM while costing less than DRAM. So today it 
is typically used as a supplement to memory, holding long-term persistent 
data, such as data in a cache. 

Currently in HDFS, we have an OS-page-cache-backed read-only cache and a 
RAMDISK-based lazy write cache.  Non-volatile memory suits both of these functions. 

This Jira aims to enable non-volatile memory first in the read cache, and then 
in the lazy write cache. 
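
For context, the two existing mechanisms can be driven from a client today 
roughly as follows (a minimal sketch; the pool name, paths, and sizes are 
illustrative, and the cast assumes the default filesystem is HDFS):

{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class ExistingCachePaths {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs = (DistributedFileSystem)
        new Path("/").getFileSystem(new Configuration());

    // 1. Centralized read cache: pin a path into the DataNodes' memory.
    dfs.addCachePool(new CachePoolInfo("hot-pool"));
    dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPath(new Path("/data/hot-table"))
        .setPool("hot-pool")
        .build());

    // 2. RAMDISK-based lazy write: request LAZY_PERSIST at create time.
    try (FSDataOutputStream out = dfs.create(new Path("/tmp/scratch"),
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
        4096, (short) 1, 128 << 20, null)) {
      out.writeBytes("transient data");
    }
  }
}
{code}

The proposal would let non-volatile memory back either path.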

 

 






[jira] [Commented] (HDFS-13642) Creating a file with block size smaller than EC policy's cell size should throw

2018-06-08 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505749#comment-16505749
 ] 

SammiChen commented on HDFS-13642:
--

[~xiaochen], sorry I didn't notice the second 
blockManager.verifyReplication. My +1 for the last patch. Thanks for 
the contribution. 

> Creating a file with block size smaller than EC policy's cell size should 
> throw
> ---
>
> Key: HDFS-13642
> URL: https://issues.apache.org/jira/browse/HDFS-13642
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-13642.01.patch, HDFS-13642.02.patch, 
> HDFS-13642.03.patch, editsStored
>
>
> The following command causes an exception:
> {noformat}
> hadoop fs -Ddfs.block.size=349696 -put -f lineitem_sixblocks.parquet 
> /test-warehouse/tmp123ec
> {noformat}
> {noformat}
> 18/05/25 16:00:59 WARN hdfs.DataStreamer: DataStreamer Exception
> java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
> blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
> lastPacketInBlock: false lastByteOffsetInBlock: 350208
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> 18/05/25 16:00:59 WARN hdfs.DFSOutputStream: Failed: offset=4096, length=512, 
> DFSStripedOutputStream:#0: failed, blk_-9223372036854574256_14634
> java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
> blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
> lastPacketInBlock: false lastByteOffsetInBlock: 350208
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> {noformat}
> Then the streamer is confused and hangs.
> The local file is under 6MB, the hdfs file has a RS-3-2-1024k EC policy.
>  
> Credit to [~tarasbob] for reporting this issue.






[jira] [Commented] (HDFS-13642) Creating a file with block size smaller than EC policy's cell size should throw

2018-06-06 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502852#comment-16502852
 ] 

SammiChen commented on HDFS-13642:
--

Right, the current {{hasErasureCodingPolicy}} doesn't check whether it's a 
replication EC policy or a normal EC policy. We should improve the function to 
add that check. 

The original check should be kept. 
{quote}
if (shouldReplicate ||
    (org.apache.commons.lang.StringUtils.isEmpty(ecPolicyName) &&
    !FSDirErasureCodingOp.hasErasureCodingPolicy(this, iip))) {
  blockManager.verifyReplication(src, replication, clientMachine);
}
{quote}

When the file is a 3-replica file, {{blockManager.verifyReplication}} should be 
called to verify the replication factor. The value of {{shouldReplicate}} 
doesn't indicate whether the file is a 3-replica file or not; it only reflects 
whether {{CreateFlag.SHOULD_REPLICATE}} was explicitly set. A sketch of the 
combined check is below. 
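
Putting the two comments together, the combined shape would look roughly like 
this (a sketch using the identifiers from the snippets in this thread, not the 
committed patch):

{code:java}
if (shouldReplicate ||
    (org.apache.commons.lang.StringUtils.isEmpty(ecPolicyName) &&
        !FSDirErasureCodingOp.hasErasureCodingPolicy(this, iip))) {
  // Replicated file: keep verifying the replication factor.
  blockManager.verifyReplication(src, replication, clientMachine);
} else {
  // An erasure coding policy applies: reject block sizes below the cell size.
  final ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp
      .getErasureCodingPolicy(this, ecPolicyName, iip);
  if (ecPolicy != null && !ecPolicy.isReplicationPolicy()
      && blockSize < ecPolicy.getCellSize()) {
    throw new IOException("Specified block size " + blockSize
        + " is less than the cell size (" + ecPolicy.getCellSize()
        + ") of the erasure coding policy on this file.");
  }
}
{code}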

 

> Creating a file with block size smaller than EC policy's cell size should 
> throw
> ---
>
> Key: HDFS-13642
> URL: https://issues.apache.org/jira/browse/HDFS-13642
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-13642.01.patch, HDFS-13642.02.patch, 
> HDFS-13642.03.patch, editsStored
>
>
> The following command causes an exception:
> {noformat}
> hadoop fs -Ddfs.block.size=349696 -put -f lineitem_sixblocks.parquet 
> /test-warehouse/tmp123ec
> {noformat}
> {noformat}
> 18/05/25 16:00:59 WARN hdfs.DataStreamer: DataStreamer Exception
> java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
> blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
> lastPacketInBlock: false lastByteOffsetInBlock: 350208
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> 18/05/25 16:00:59 WARN hdfs.DFSOutputStream: Failed: offset=4096, length=512, 
> DFSStripedOutputStream:#0: failed, blk_-9223372036854574256_14634
> java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
> blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
> lastPacketInBlock: false lastByteOffsetInBlock: 350208
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> {noformat}
> Then the streamer is confused and hangs.
> The local file is under 6MB, the hdfs file has a RS-3-2-1024k EC policy.
>  
> Credit to [~tarasbob] for reporting this issue.






[jira] [Commented] (HDFS-13642) Creating a file with block size smaller than EC policy's cell size should throw

2018-06-04 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499852#comment-16499852
 ] 

SammiChen commented on HDFS-13642:
--


Some comments:
1. In *private static final int BLOCK_SIZE = 1 << 20; // 16k*, change the 
comment from 16k to 1MB, since 1 << 20 is 1 MB.
2. {quote}
if (!shouldReplicate) {
  final ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp
      .getErasureCodingPolicy(this, ecPolicyName, iip);
  if (ecPolicy != null && (!ecPolicy.isReplicationPolicy())) {
    if (blockSize < ecPolicy.getCellSize()) {
      throw new IOException("Specified block size " + blockSize
          + " is less than the cell" + " size (" + ecPolicy.getCellSize()
          + ") of the erasure coding policy on this file.");
    }
  }
}
{quote}

When creating a normal 3-replica file, the {{shouldReplicate}} value is false. 
It is true only when the user sets {{CreateFlag.SHOULD_REPLICATE}} explicitly 
when calling the create API. One suggestion is to add the block size vs. cell 
size comparison as the else branch of
{quote}
if (shouldReplicate ||
    (org.apache.commons.lang.StringUtils.isEmpty(ecPolicyName) &&
    !FSDirErasureCodingOp.hasErasureCodingPolicy(this, iip))) {
  blockManager.verifyReplication(src, replication, clientMachine);
}
{quote}


Thanks for working on it,  [~xiaochen].





> Creating a file with block size smaller than EC policy's cell size should 
> throw
> ---
>
> Key: HDFS-13642
> URL: https://issues.apache.org/jira/browse/HDFS-13642
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-13642.01.patch, HDFS-13642.02.patch, editsStored
>
>
> The following command causes an exception:
> {noformat}
> hadoop fs -Ddfs.block.size=349696 -put -f lineitem_sixblocks.parquet 
> /test-warehouse/tmp123ec
> {noformat}
> {noformat}
> 18/05/25 16:00:59 WARN hdfs.DataStreamer: DataStreamer Exception
> java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
> blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
> lastPacketInBlock: false lastByteOffsetInBlock: 350208
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> 18/05/25 16:00:59 WARN hdfs.DFSOutputStream: Failed: offset=4096, length=512, 
> DFSStripedOutputStream:#0: failed, blk_-9223372036854574256_14634
> java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
> blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
> lastPacketInBlock: false lastByteOffsetInBlock: 350208
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> {noformat}
> Then the streamer is confused and hangs.
> The local file is under 6MB, the hdfs file has a RS-3-2-1024k EC policy.
>  
> Credit to [~tarasbob] for reporting this issue.






[jira] [Commented] (HDFS-13642) Creating a file with block size smaller than EC policy's cell size should throw

2018-05-31 Thread SammiChen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496266#comment-16496266
 ] 

SammiChen commented on HDFS-13642:
--

[~xiaochen], agree, the NN should reject the request when the block size is 
less than the minimum block size. The NN should also reject it if the EC 
policy's cell size is greater than the block size. I will find time tomorrow 
to review the code. 

> Creating a file with block size smaller than EC policy's cell size should 
> throw
> ---
>
> Key: HDFS-13642
> URL: https://issues.apache.org/jira/browse/HDFS-13642
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-13642.01.patch, HDFS-13642.02.patch, editsStored
>
>
> The following command causes an exception:
> {noformat}
> hadoop fs -Ddfs.block.size=349696 -put -f lineitem_sixblocks.parquet 
> /test-warehouse/tmp123ec
> {noformat}
> {noformat}
> 18/05/25 16:00:59 WARN hdfs.DataStreamer: DataStreamer Exception
> java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
> blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
> lastPacketInBlock: false lastByteOffsetInBlock: 350208
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> 18/05/25 16:00:59 WARN hdfs.DFSOutputStream: Failed: offset=4096, length=512, 
> DFSStripedOutputStream:#0: failed, blk_-9223372036854574256_14634
> java.io.IOException: BlockSize 349696 < lastByteOffsetInBlock, #0: 
> blk_-9223372036854574256_14634, packet seqno: 7 offsetInBlock: 349696 
> lastPacketInBlock: false lastByteOffsetInBlock: 350208
>   at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:729)
>   at 
> org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:46)
> {noformat}
> Then the streamer is confused and hangs.
> The local file is under 6MB, the hdfs file has a RS-3-2-1024k EC policy.
>  
> Credit to [~tarasbob] for reporting this issue.






[jira] [Updated] (HDFS-13540) DFSStripedInputStream should only allocate new buffers when reading

2018-05-23 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-13540:
-
  Resolution: Fixed
Target Version/s: 3.0.3  (was: 3.0.4)
  Status: Resolved  (was: Patch Available)

> DFSStripedInputStream should only allocate new buffers when reading
> ---
>
> Key: HDFS-13540
> URL: https://issues.apache.org/jira/browse/HDFS-13540
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: HDFS-13540.01.patch, HDFS-13540.02.patch, 
> HDFS-13540.03.patch, HDFS-13540.04.patch, HDFS-13540.05.patch, 
> HDFS-13540.06.patch
>
>
> This was found in the same scenario where HDFS-13539 was caught.
> There are 2 OOMs that look interesting:
> {noformat}
> FSDataInputStream#close error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:672)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.close(DFSStripedInputStream.java:181)
> at java.io.FilterInputStream.close(FilterInputStream.java:181)
> {noformat}
> and 
> {noformat}
> org/apache/hadoop/fs/FSDataInputStream#unbuffer failed: error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.unbuffer(DFSInputStream.java:1782)
> at 
> org.apache.hadoop.fs.StreamCapabilitiesPolicy.unbuffer(StreamCapabilitiesPolicy.java:48)
> at 
> org.apache.hadoop.fs.FSDataInputStream.unbuffer(FSDataInputStream.java:230)
> {noformat}
> As the stack trace goes, {{resetCurStripeBuffer}} will get a buffer from the 
> buffer pool. We could save the cost of doing so if it's not for a read (e.g. 
> close, unbuffer, etc.).






[jira] [Commented] (HDFS-13540) DFSStripedInputStream should only allocate new buffers when reading

2018-05-23 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487136#comment-16487136
 ] 

SammiChen commented on HDFS-13540:
--

+1. Thanks [~xiaochen] for the contribution. Committed to trunk, branch-3.0 and 
branch-3.1.

> DFSStripedInputStream should only allocate new buffers when reading
> ---
>
> Key: HDFS-13540
> URL: https://issues.apache.org/jira/browse/HDFS-13540
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: HDFS-13540.01.patch, HDFS-13540.02.patch, 
> HDFS-13540.03.patch, HDFS-13540.04.patch, HDFS-13540.05.patch, 
> HDFS-13540.06.patch
>
>
> This was found in the same scenario where HDFS-13539 was caught.
> There are 2 OOMs that look interesting:
> {noformat}
> FSDataInputStream#close error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:672)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.close(DFSStripedInputStream.java:181)
> at java.io.FilterInputStream.close(FilterInputStream.java:181)
> {noformat}
> and 
> {noformat}
> org/apache/hadoop/fs/FSDataInputStream#unbuffer failed: error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.unbuffer(DFSInputStream.java:1782)
> at 
> org.apache.hadoop.fs.StreamCapabilitiesPolicy.unbuffer(StreamCapabilitiesPolicy.java:48)
> at 
> org.apache.hadoop.fs.FSDataInputStream.unbuffer(FSDataInputStream.java:230)
> {noformat}
> As the stack trace goes, {{resetCurStripeBuffer}} will get a buffer from the 
> buffer pool. We could save the cost of doing so if it's not for a read (e.g. 
> close, unbuffer, etc.).






[jira] [Updated] (HDFS-13540) DFSStripedInputStream should only allocate new buffers when reading

2018-05-23 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-13540:
-
Fix Version/s: 3.0.3
   3.1.1
   3.2.0

> DFSStripedInputStream should only allocate new buffers when reading
> ---
>
> Key: HDFS-13540
> URL: https://issues.apache.org/jira/browse/HDFS-13540
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: HDFS-13540.01.patch, HDFS-13540.02.patch, 
> HDFS-13540.03.patch, HDFS-13540.04.patch, HDFS-13540.05.patch, 
> HDFS-13540.06.patch
>
>
> This was found in the same scenario where HDFS-13539 was caught.
> There are 2 OOMs that look interesting:
> {noformat}
> FSDataInputStream#close error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:672)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.close(DFSStripedInputStream.java:181)
> at java.io.FilterInputStream.close(FilterInputStream.java:181)
> {noformat}
> and 
> {noformat}
> org/apache/hadoop/fs/FSDataInputStream#unbuffer failed: error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.unbuffer(DFSInputStream.java:1782)
> at 
> org.apache.hadoop.fs.StreamCapabilitiesPolicy.unbuffer(StreamCapabilitiesPolicy.java:48)
> at 
> org.apache.hadoop.fs.FSDataInputStream.unbuffer(FSDataInputStream.java:230)
> {noformat}
> As the stack trace goes, {{resetCurStripeBuffer}} will get a buffer from the 
> buffer pool. We could save the cost of doing so if it's not for a read (e.g. 
> close, unbuffer, etc.).






[jira] [Commented] (HDFS-13540) DFSStripedInputStream should only allocate new buffers when reading

2018-05-22 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483581#comment-16483581
 ] 

SammiChen commented on HDFS-13540:
--

Hi Xiao, the overall idea looks good to me.

1. There are two relevant unit test failures. The error message is 
"expected:<0> but was:<2>". Maybe we can dig into why 2 buffers are allocated 
for an open stream which hasn't read any content yet. 

2. The @VisibleForTesting annotation ahead of resetCurStripeBuffer is no 
longer necessary. 

 

> DFSStripedInputStream should only allocate new buffers when reading
> ---
>
> Key: HDFS-13540
> URL: https://issues.apache.org/jira/browse/HDFS-13540
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-13540.01.patch, HDFS-13540.02.patch, 
> HDFS-13540.03.patch, HDFS-13540.04.patch, HDFS-13540.05.patch
>
>
> This was found in the same scenario where HDFS-13539 was caught.
> There are 2 OOMs that look interesting:
> {noformat}
> FSDataInputStream#close error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:672)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.close(DFSStripedInputStream.java:181)
> at java.io.FilterInputStream.close(FilterInputStream.java:181)
> {noformat}
> and 
> {noformat}
> org/apache/hadoop/fs/FSDataInputStream#unbuffer failed: error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.unbuffer(DFSInputStream.java:1782)
> at 
> org.apache.hadoop.fs.StreamCapabilitiesPolicy.unbuffer(StreamCapabilitiesPolicy.java:48)
> at 
> org.apache.hadoop.fs.FSDataInputStream.unbuffer(FSDataInputStream.java:230)
> {noformat}
> As the stack trace goes, {{resetCurStripeBuffer}} will get a buffer from the 
> buffer pool. We could save the cost of doing so if it's not for a read (e.g. 
> close, unbuffer, etc.).






[jira] [Commented] (HDFS-13540) DFSStripedInputStream should not allocate new buffers during close / unbuffer

2018-05-17 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478871#comment-16478871
 ] 

SammiChen commented on HDFS-13540:
--

[~xiaochen], thanks for the explanation. It makes sense to change the Jira 
title as you propose. I double-checked the code; *curStripeBuf* is only used 
in two EC read functions.

For the new test case, I would suggest:
 # changing the name from testCloseDoesNotGetBuffer to 
testCloseDoesNotAllocateNewBuffer, which is clearer;
 # noting that the test case always passes even when I use "true" in 
closeCurrentBlockReaders, because *curStripeBuf* is set to *null* after 
*stream.close* is called, so *assertNull(stream.getCurStripeBuf());* always 
holds.

An alternative way to check whether a buffer was allocated is to check the 
number of buffers held by *ElasticByteBufferPool*, as in the sketch below. 
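
A sketch of that pool-based check (the getBufferPool() accessor is 
hypothetical; the real test would need some test-only way to reach the 
stream's shared pool):

{code:java}
// Watch the shared ElasticByteBufferPool instead of the nulled-out field.
ElasticByteBufferPool pool = DFSStripedInputStream.getBufferPool(); // hypothetical accessor
int pooledDirect = pool.size(true); // direct buffers currently in the pool
stream.close();
// close() should only return buffers to the pool, never take a new one,
// so the pool must not have shrunk:
assertTrue(pool.size(true) >= pooledDirect);
{code}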

> DFSStripedInputStream should not allocate new buffers during close / unbuffer
> -
>
> Key: HDFS-13540
> URL: https://issues.apache.org/jira/browse/HDFS-13540
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-13540.01.patch, HDFS-13540.02.patch, 
> HDFS-13540.03.patch
>
>
> This was found in the same scenario where HDFS-13539 was caught.
> There are 2 OOMs that look interesting:
> {noformat}
> FSDataInputStream#close error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:672)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.close(DFSStripedInputStream.java:181)
> at java.io.FilterInputStream.close(FilterInputStream.java:181)
> {noformat}
> and 
> {noformat}
> org/apache/hadoop/fs/FSDataInputStream#unbuffer failed: error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.unbuffer(DFSInputStream.java:1782)
> at 
> org.apache.hadoop.fs.StreamCapabilitiesPolicy.unbuffer(StreamCapabilitiesPolicy.java:48)
> at 
> org.apache.hadoop.fs.FSDataInputStream.unbuffer(FSDataInputStream.java:230)
> {noformat}
> As the stack trace goes, {{resetCurStripeBuffer}} will get a buffer from the 
> buffer pool. We could save the cost of doing so if it's not for a read (e.g. 
> close, unbuffer, etc.).






[jira] [Commented] (HDFS-13540) DFSStripedInputStream should not allocate new buffers during close / unbuffer

2018-05-15 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476804#comment-16476804
 ] 

SammiChen commented on HDFS-13540:
--

Hi [~xiaochen], thanks for working on this!

{{closeCurrentBlockReaders}} is called by {{close}}, {{unbuffer}}, and 
{{DFSStripedInputStream.blockSeekTo}}. I feel like when we use 
{{resetCurStripeBuffer(false)}} in {{closeCurrentBlockReaders}}, 
{{DFSStripedInputStream.readWithStrategy}}, which calls {{blockSeekTo}}, will 
have an issue. Can you double-check that?
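
For reference, the guarded reset under discussion has roughly this shape 
(field and helper names follow the stack traces in this issue; a sketch, not 
necessarily the final patch):

{code:java}
// Only allocate the stripe buffer when a read will actually follow.
private void resetCurStripeBuffer(boolean shouldAllocateBuf) {
  if (shouldAllocateBuf && curStripeBuf == null) {
    curStripeBuf = BUFFER_POOL.getBuffer(useDirectBuffer(),
        cellSize * dataBlkNum);
  }
  if (curStripeBuf != null) {
    curStripeBuf.clear();
  }
}

// Call sites:
//   close() / unbuffer() -> closeCurrentBlockReaders() -> resetCurStripeBuffer(false)
//   readWithStrategy() -> blockSeekTo() -> closeCurrentBlockReaders(), after
//   which the read path still expects curStripeBuf; hence the concern above.
{code}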

> DFSStripedInputStream should not allocate new buffers during close / unbuffer
> -
>
> Key: HDFS-13540
> URL: https://issues.apache.org/jira/browse/HDFS-13540
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Attachments: HDFS-13540.01.patch, HDFS-13540.02.patch, 
> HDFS-13540.03.patch
>
>
> This was found in the same scenario where HDFS-13539 was caught.
> There are 2 OOMs that look interesting:
> {noformat}
> FSDataInputStream#close error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:672)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.close(DFSStripedInputStream.java:181)
> at java.io.FilterInputStream.close(FilterInputStream.java:181)
> {noformat}
> and 
> {noformat}
> org/apache/hadoop/fs/FSDataInputStream#unbuffer failed: error:
> OutOfMemoryError: Direct buffer memoryjava.lang.OutOfMemoryError: Direct 
> buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:694)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at 
> org.apache.hadoop.io.ElasticByteBufferPool.getBuffer(ElasticByteBufferPool.java:95)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.resetCurStripeBuffer(DFSStripedInputStream.java:118)
> at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.closeCurrentBlockReaders(DFSStripedInputStream.java:205)
> at 
> org.apache.hadoop.hdfs.DFSInputStream.unbuffer(DFSInputStream.java:1782)
> at 
> org.apache.hadoop.fs.StreamCapabilitiesPolicy.unbuffer(StreamCapabilitiesPolicy.java:48)
> at 
> org.apache.hadoop.fs.FSDataInputStream.unbuffer(FSDataInputStream.java:230)
> {noformat}
> As the stack trace goes, {{resetCurStripeBuffer}} will get a buffer from the 
> buffer pool. We could save the cost of doing so if it's not for a read (e.g. 
> close, unbuffer, etc.).






[jira] [Comment Edited] (HDFS-13388) RequestHedgingProxyProvider calls multiple configured NNs all the time

2018-04-10 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431908#comment-16431908
 ] 

SammiChen edited comment on HDFS-13388 at 4/10/18 8:22 AM:
---

Hi [~LiJinglun] and [~elgoiri], branch-2.9 suffers a build failure with this 
commit. Would you please double-check it, and also check branch-2?


was (Author: sammi):
Hi [~LiJinglun] and [~elgoiri], branch-2.9 suffers a build failure with this 
commit. Would you please double-check it? 

> RequestHedgingProxyProvider calls multiple configured NNs all the time
> --
>
> Key: HDFS-13388
> URL: https://issues.apache.org/jira/browse/HDFS-13388
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.4
>
> Attachments: HADOOP-13388.0001.patch, HADOOP-13388.0002.patch, 
> HADOOP-13388.0003.patch, HADOOP-13388.0004.patch, HADOOP-13388.0005.patch, 
> HADOOP-13388.0006.patch
>
>
> In HDFS-7858 RequestHedgingProxyProvider was designed to "first 
> simultaneously call multiple configured NNs to decide which is the active 
> Namenode and then for subsequent calls it will invoke the previously 
> successful NN ." But the current code call multiple configured NNs every time 
> even when we already got the successful NN. 
>  That's because in RetryInvocationHandler.java, ProxyDescriptor's member 
> proxyInfo is assigned only when it is constructed or when failover occurs. 
> RequestHedgingProxyProvider.currentUsedProxy is null in both cases, so the 
> only proxy we can get is always a dynamic proxy handled by 
> RequestHedgingInvocationHandler.class. RequestHedgingInvocationHandler.class 
> handles invoked methods by calling multiple configured NNs.
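
For context, a client opts into this provider via configuration; a minimal 
sketch (the nameservice ID "mycluster" is hypothetical):

{code:java}
Configuration conf = new Configuration();
conf.set("dfs.client.failover.proxy.provider.mycluster",
    "org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider");
// With the bug described above, every RPC fans out to all configured NNs;
// the fix makes only the first call hedge, and later calls reuse the
// previously successful NN.
{code}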






[jira] [Commented] (HDFS-13388) RequestHedgingProxyProvider calls multiple configured NNs all the time

2018-04-10 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431908#comment-16431908
 ] 

SammiChen commented on HDFS-13388:
--

Hi [~LiJinglun] and [~elgoiri], branch-2.9 suffers a build failure with this 
commit. Would you please double-check it? 

> RequestHedgingProxyProvider calls multiple configured NNs all the time
> --
>
> Key: HDFS-13388
> URL: https://issues.apache.org/jira/browse/HDFS-13388
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.4
>
> Attachments: HADOOP-13388.0001.patch, HADOOP-13388.0002.patch, 
> HADOOP-13388.0003.patch, HADOOP-13388.0004.patch, HADOOP-13388.0005.patch, 
> HADOOP-13388.0006.patch
>
>
> In HDFS-7858 RequestHedgingProxyProvider was designed to "first 
> simultaneously call multiple configured NNs to decide which is the active 
> Namenode and then for subsequent calls it will invoke the previously 
> successful NN ." But the current code call multiple configured NNs every time 
> even when we already got the successful NN. 
>  That's because in RetryInvocationHandler.java, ProxyDescriptor's member 
> proxyInfo is assigned only when it is constructed or when failover occurs. 
> RequestHedgingProxyProvider.currentUsedProxy is null in both cases, so the 
> only proxy we can get is always a dynamic proxy handled by 
> RequestHedgingInvocationHandler.class. RequestHedgingInvocationHandler.class 
> handles invoked methods by calling multiple configured NNs.






[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-04-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Component/s: erasure-coding

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: SammiChen
>Priority: Minor
> Fix For: 3.1.0, 3.0.3
>
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, 
> HDFS-11600.006.patch, HDFS-11600.007.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.
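
For readers unfamiliar with the two JUnit 4 idioms the description refers to, 
here is a small illustrative sketch (names and parameter values are made up, 
not the actual refactored test):

{code:java}
import java.util.Arrays;
import java.util.Collection;
import org.junit.Assume;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class TestStripedOutputSketch {
  @Parameterized.Parameters(name = "length={0}")
  public static Collection<Object[]> lengths() {
    return Arrays.asList(new Object[][] {{0}, {1024}, {65536}});
  }

  @Parameterized.Parameter
  public int length;

  @Test
  public void testWriteWithFailure() {
    // Skip visibly (reported as skipped) instead of silently returning:
    Assume.assumeTrue("zero-length case disabled here", length > 0);
    // ... test body using length ...
  }
}
{code}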






[jira] [Updated] (HDFS-10183) Prevent race condition during class initialization

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-10183:
-
Fix Version/s: (was: 2.9.1)

> Prevent race condition during class initialization
> --
>
> Key: HDFS-10183
> URL: https://issues.apache.org/jira/browse/HDFS-10183
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.9.0
>Reporter: Pavel Avgustinov
>Assignee: Pavel Avgustinov
>Priority: Minor
> Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch
>
>
> In HADOOP-11969, [~busbey] tracked down a non-deterministic 
> {{NullPointerException}} to an oddity in the Java memory model: When multiple 
> threads trigger the loading of a class at the same time, one of them wins and 
> creates the {{java.lang.Class}} instance; the others block during this 
> initialization, but once it is complete they may obtain a reference to the 
> {{Class}} which has non-{{final}} fields still containing their default (i.e. 
> {{null}}) values. This leads to runtime failures that are hard to debug or 
> diagnose.
> HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are 
> very likely to be accessed from multiple threads, and thus the problem is 
> particularly severe there. Consequently, the patch removed all occurrences of 
> the issue in the code base.
> Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a 
> refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151],
>  and introduced a [new instance of the 
> problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43].
> The attached patch addresses the issue by adding the missing {{final}} 
> modifier in these two cases.
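
A minimal illustration of the pattern the patch restores (an illustrative 
class, not the actual HDFS code):

{code:java}
// Under the race described above, a thread may observe the Class with a
// non-final static field still null; declaring the field final (and
// initializing it in place) closes that window.
class ThreadLocalHolder {
  // Race-prone form (missing final), as reintroduced by the refactoring:
  //   static ThreadLocal<StringBuilder> BUF =
  //       ThreadLocal.withInitial(StringBuilder::new);

  // Fixed form, with the final modifier the patch adds:
  static final ThreadLocal<StringBuilder> BUF =
      ThreadLocal.withInitial(StringBuilder::new);
}
{code}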






[jira] [Updated] (HDFS-13337) Backport HDFS-4275 to branch-2.9

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-13337:
-
Target Version/s: 2.10.0, 2.9.2  (was: 2.10.0, 2.9.1)

> Backport HDFS-4275 to branch-2.9
> 
>
> Key: HDFS-13337
> URL: https://issues.apache.org/jira/browse/HDFS-13337
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Íñigo Goiri
>Assignee: Xiao Liang
>Priority: Minor
> Attachments: HDFS-13337-branch-2.000.patch
>
>
> Multiple HDFS test suites fail on Windows during initialization of 
> MiniDFSCluster due to "Could not fully delete" the name testing data 
> directory.






[jira] [Updated] (HDFS-11885) createEncryptionZone should not block on initializing EDEK cache

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11885:
-
Target Version/s: 2.8.3, 3.2.0, 2.9.2  (was: 2.8.3, 2.9.1, 3.2.0)

> createEncryptionZone should not block on initializing EDEK cache
> 
>
> Key: HDFS-11885
> URL: https://issues.apache.org/jira/browse/HDFS-11885
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.6.5
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
> Attachments: HDFS-11885.001.patch, HDFS-11885.002.patch, 
> HDFS-11885.003.patch, HDFS-11885.004.patch
>
>
> When creating an encryption zone, we call {{ensureKeyIsInitialized}}, which 
> calls {{provider.warmUpEncryptedKeys(keyName)}}. This is a blocking call, 
> which attempts to fill the key cache up to the low watermark.
> If the KMS is down or slow, this can take a very long time, and cause the 
> createZone RPC to fail with a timeout.






[jira] [Updated] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12257:
-
Target Version/s: 2.8.3, 3.2.0, 2.9.2  (was: 2.8.3, 2.9.1, 3.2.0)

> Expose getSnapshottableDirListing as a public API in HdfsAdmin
> --
>
> Key: HDFS-12257
> URL: https://issues.apache.org/jira/browse/HDFS-12257
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 2.6.5
>Reporter: Andrew Wang
>Assignee: Huafeng Wang
>Priority: Major
> Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, 
> HDFS-12257.003.patch
>
>
> Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no 
> programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we 
> should expose listing there as well.






[jira] [Updated] (HDFS-13051) dead lock occurs when rolleditlog rpc call happen and editPendingQ is full

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-13051:
-
Target Version/s: 2.10.0, 2.8.4, 2.7.6, 3.0.2, 2.9.2  (was: 2.10.0, 2.9.1, 
2.8.4, 2.7.6, 3.0.2)

> dead lock occurs when rolleditlog rpc call happen and editPendingQ is full
> --
>
> Key: HDFS-13051
> URL: https://issues.apache.org/jira/browse/HDFS-13051
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.5
>Reporter: zhangwei
>Assignee: Daryn Sharp
>Priority: Major
>  Labels: AsyncEditlog, deadlock
> Attachments: HDFS-13112.patch, deadlock.patch
>
>
> When doing rollEditLog, the NameNode acquires the FS write lock, then 
> acquires the FSEditLogAsync lock object, and writes 3 edits (the second one 
> overrides the logEdit method and returns true).
> In the extreme case, when FSEditLogAsync's logSync is very slow and 
> editPendingQ (default size 4096) is full, the IPC thread cannot offer the 
> edit object into editPendingQ while doing rollEditLog; it blocks on the 
> editPendingQ.put method. However, it doesn't release the FSEditLogAsync 
> object lock, and the edit.logEdit method in the FSEditLogAsync.run thread can 
> never acquire the FSEditLogAsync object lock, which causes a deadlock.
> The stack trace is like below:
> "Thread[Thread-44528,5,main]" #130093 daemon prio=5 os_prio=0 
> tid=0x02377000 nid=0x13fda waiting on condition [0x7fb3297de000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7fbd3cb96f58> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>  at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.logCancelDelegationToken(FSEditLog.java:1008)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logExpireDelegationToken(FSNamesystem.java:7635)
>  at 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:395)
>  - locked <0x7fbd3cbae500> (a java.lang.Object)
>  at 
> org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.logExpireToken(DelegationTokenSecretManager.java:62)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:604)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:54)
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:656)
>  at java.lang.Thread.run(Thread.java:745)
> "FSEditLogAsync" #130072 daemon prio=5 os_prio=0 tid=0x0715b800 
> nid=0x13fbf waiting for monitor entry [0x7fb32c51a000]
>  java.lang.Thread.State: BLOCKED (on object monitor)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.doEditTransaction(FSEditLog.java:443)
>  - waiting to lock <*0x7fbcbc131000*> (a 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$Edit.logEdit(FSEditLogAsync.java:233)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:177)
>  at java.lang.Thread.run(Thread.java:745)
> "IPC Server handler 47 on 53310" #337 daemon prio=5 os_prio=0 
> tid=0x7fe659d46000 nid=0x4c62 waiting on condition [0x7fb32fe52000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  - parking to wait for <0x7fbd3cb96f58> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
>  at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.enqueueEdit(FSEditLogAsync.java:156)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.logEdit(FSEditLogAsync.java:118)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1251)
>  - locked <*0x7fbcbc131000*> (a 
> 
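
The cycle described above can be reproduced in isolation with a toy model 
(plain Java, not HDFS code): a producer blocks on a full bounded queue while 
holding the monitor the consumer needs in order to drain it.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;

public class EditQueueDeadlock {
  // Stand-ins for the FSEditLogAsync monitor and editPendingQ.
  static final Object editLogLock = new Object();
  static final ArrayBlockingQueue<Integer> pendingQ = new ArrayBlockingQueue<>(1);

  public static void main(String[] args) throws Exception {
    Thread consumer = new Thread(() -> {
      while (true) {
        synchronized (editLogLock) {  // like edit.logEdit() in FSEditLogAsync.run
          Integer edit = pendingQ.poll();
          if (edit != null) {
            System.out.println("synced edit " + edit);
          }
        }
      }
    });
    synchronized (editLogLock) {      // like rollEditLog holding the lock
      pendingQ.put(1);                // the capacity-1 queue is now full
      consumer.start();               // consumer immediately blocks on editLogLock
      pendingQ.put(2);                // blocks forever: the consumer can never drain
    }
  }
}
{code}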

[jira] [Updated] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-04-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-13174:
-
Target Version/s: 3.0.1, 2.8.4, 2.7.6, 2.9.2  (was: 2.9.1, 3.0.1, 2.8.4, 
2.7.6)

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>
> In HDFS-11015 an iteration timeout was introduced in the Dispatcher.Source 
> class, which is checked while dispatching the moves that the Balancer and the 
> Mover do. This timeout is hardwired to 20 minutes.
> The Balancer has iterations, and even if an iteration times out, the Balancer 
> runs further and does another iteration before it fails, if no moves happened 
> in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path 
> runs for more than 20 minutes, the Mover will stop with the following 
> exception reported to the console (lines might differ, as this exception came 
> from a CDH 5.12.1 installation):
> java.io.IOException: Block move timed out
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)






[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Fix Version/s: (was: 3.2.0)

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: SammiChen
>Priority: Minor
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, 
> HDFS-11600.006.patch, HDFS-11600.007.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.






[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Target Version/s: 3.1.0, 3.0.2  (was: 3.1.0)

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: SammiChen
>Priority: Minor
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, 
> HDFS-11600.006.patch, HDFS-11600.007.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.






[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
   Resolution: Fixed
Fix Version/s: 3.2.0
   3.0.2
   3.1.0
   Status: Resolved  (was: Patch Available)

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: SammiChen
>Priority: Minor
> Fix For: 3.1.0, 3.0.2, 3.2.0
>
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, 
> HDFS-11600.006.patch, HDFS-11600.007.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.






[jira] [Assigned] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen reassigned HDFS-11600:


Assignee: SammiChen

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Assignee: SammiChen
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, 
> HDFS-11600.006.patch, HDFS-11600.007.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.






[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-13 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396772#comment-16396772
 ] 

SammiChen commented on HDFS-11600:
--

Thanks [~xiaochen] for the review. I uploaded the 007 patch to address the 
line-length checkstyle issue. I will commit it after the pre-commit build 
comes out. 

Also thanks to [~andrew.wang] for the initial patches and [~rakeshr] for the 
review. 

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, 
> HDFS-11600.006.patch, HDFS-11600.007.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-13 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Attachment: HDFS-11600.007.patch

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, 
> HDFS-11600.006.patch, HDFS-11600.007.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-12 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396493#comment-16396493
 ] 

SammiChen commented on HDFS-11600:
--

Thanks [~xiaochen] for the comments.

bq.  Do you know why this range was chosen?
I don't know the initial reason. Going through the code, my guess is that at the moment TestDFSStripedOutputStreamWithFailure was introduced, the only supported EC policy was RS-6-3-64K. The intent is to test files whose lengths vary across [0, 1, 2] block groups, with each block group's cell count varying from 0 to (6 (data block number) * 4 (cells per block)) - 1, plus a [-1, 0, 1] delta length. So there are approximately 3 * ((6 * 4) - 1) * 3 = 207 length variants in total. Now that we support more EC policies, especially RS-10-4, the previous 210 variants no longer stand. The variants should actually vary with the EC policy used in TestDFSStripedOutputStreamWithFailureWithRandomECPolicy. 
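To make the enumeration concrete, here is a minimal sketch of how those lengths could be generated for RS-6-3-64K; this is my own illustration with hypothetical names, not code from the test.
{code}
import java.util.ArrayList;
import java.util.List;

public class LengthVariantsSketch {
  // [0, 1, 2] block groups x cell index in [0, 6*4-1) x [-1, 0, 1] delta
  // gives roughly 3 * 23 * 3 = 207 length variants for RS-6-3-64K.
  static List<Integer> enumerateLengths() {
    final int cellSize = 64 * 1024;
    final int dataBlocks = 6;
    final int cellsPerBlock = 4;
    final int blockGroupSize = cellSize * dataBlocks * cellsPerBlock;
    List<Integer> lengths = new ArrayList<>();
    for (int group = 0; group <= 2; group++) {
      for (int cell = 0; cell < dataBlocks * cellsPerBlock - 1; cell++) {
        for (int delta = -1; delta <= 1; delta++) {
          int length = group * blockGroupSize + cell * cellSize + delta;
          if (length >= 0) {
            lengths.add(length);
          }
        }
      }
    }
    return lengths;
  }
}
{code}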

bq. This is from existing code, but now may be a good chance to change - could 
you do tearDown with a @After annotation? This way, each test doesn't have to 
try-finally.
Agree. However, there is a loop in testBlockTokenExpired which requires setting up and tearing down the cluster on every iteration, so it seems better to keep it as is. 
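For reference, a minimal sketch of the two styles under discussion (hypothetical method names, not the patch code):
{code}
import org.junit.After;
import org.junit.Test;

public class TearDownStyleSketch {
  // Existing style: every test wraps the body in try-finally.
  @Test
  public void tryFinallyStyle() throws Exception {
    setup();
    try {
      runTest();
    } finally {
      tearDown();
    }
  }

  // Suggested style: a single @After hook cleans up after each test; this
  // does not fit testBlockTokenExpired, which restarts the cluster inside
  // its own loop.
  @After
  public void cleanup() {
    tearDown();
  }

  @Test
  public void afterStyle() throws Exception {
    setup();
    runTest();
  }

  private void setup() {}
  private void runTest() {}
  private void tearDown() {}
}
{code}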




> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, 
> HDFS-11600.006.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-12 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Attachment: HDFS-11600.006.patch

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch, 
> HDFS-11600.006.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-12 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Attachment: HDFS-11600.005.patch

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch, HDFS-11600.005.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-12 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394916#comment-16394916
 ] 

SammiChen edited comment on HDFS-11600 at 3/12/18 8:37 AM:
---

Hi [~rakeshr], thanks for the comments.  

bq. 2. I hope you have named the class with "P" to represent parameterized 
class. Can we give a meaningful name instead of appending with letter "P" - 
TestDFSStripedOutputStreamWithFailureP, 
TestDFSStripedOutputStreamWithFailurePWithRandomECPolicy.

Regarding the class name with "P", you are right, it stands for "parameterized".  I originally used the full word, then found that the class name became very, very long, especially "TestDFSStripedOutputStreamWithFailureWithRandomECPolicy". 

bq. 
4.TestDFSStripedOutputStreamWithFailureBase#testCloseWithExceptionsInStreamer 
function is not used anywhere. Whats the purpose of this?
testCloseWithExceptionsInStreamer is in both TestDFSStripedOutputStreamWithFailureBase and TestDFSStripedOutputStreamWithFailure. I will remove it from TestDFSStripedOutputStreamWithFailureBase. 

Will soon upload a new patch. 


was (Author: sammi):
Hi [~rakeshr], thanks for the comments.  I will upload a new patch to address all the issues. Regarding the class name with "P", you are right, it stands for "parameterized".  I originally used the full word, then found that the class name became very, very long, especially "TestDFSStripedOutputStreamWithFailureWithRandomECPolicy". 

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-12 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394916#comment-16394916
 ] 

SammiChen commented on HDFS-11600:
--

Hi [~rakeshr], thanks for the comments.  I will upload a new patch to address all the issues. Regarding the class name with "P", you are right, it stands for "parameterized".  I originally used the full word, then found that the class name became very, very long, especially "TestDFSStripedOutputStreamWithFailureWithRandomECPolicy". 

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390803#comment-16390803
 ] 

SammiChen commented on HDFS-11600:
--

Hi [~rakeshr], do you have time to help review the patch? 

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389328#comment-16389328
 ] 

SammiChen commented on HDFS-11600:
--

Handled the javac issues and improved {{testMultipleDatanodeFailure56}} to make sure it will not run out of heap space. 

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-07 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Attachment: HDFS-11600.004.patch

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch, HDFS-11600.004.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12654) APPEND API call is different in HTTPFS and NameNode REST

2018-03-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387350#comment-16387350
 ] 

SammiChen commented on HDFS-12654:
--

Hi [~Nuke] and [~iwasakims], it seems this is not an issue after further investigation. Can it be closed? 

> APPEND API call is different in HTTPFS and NameNode REST
> 
>
> Key: HDFS-12654
> URL: https://issues.apache.org/jira/browse/HDFS-12654
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, httpfs, namenode
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 3.0.0-beta1
>Reporter: Andras Czesznak
>Priority: Major
>
> The APPEND REST API call behaves differently in the NameNode REST and the 
> HTTPFS codes. The NameNode version creates the target file the new data being 
> appended to if it does not exist at the time of the call issued. The HTTPFS 
> version assumes the target file exists when APPEND is called and can append 
> only the new data but does not create the target file it doesn't exist.
> The two implementations should be standardized, preferably the HTTPFS version 
> should be modified to execute an implicit CREATE if the target file does not 
> exist.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-03-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387171#comment-16387171
 ] 

SammiChen commented on HDFS-11600:
--

The output links from the last build have expired. Triggering the build again. 

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11885) createEncryptionZone should not block on initializing EDEK cache

2018-03-01 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383049#comment-16383049
 ] 

SammiChen commented on HDFS-11885:
--

Is it still on target for 2.9.1?  If not, can we push this out from 2.9.1 to the next release? 

> createEncryptionZone should not block on initializing EDEK cache
> 
>
> Key: HDFS-11885
> URL: https://issues.apache.org/jira/browse/HDFS-11885
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Affects Versions: 2.6.5
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
> Attachments: HDFS-11885.001.patch, HDFS-11885.002.patch, 
> HDFS-11885.003.patch, HDFS-11885.004.patch
>
>
> When creating an encryption zone, we call {{ensureKeyIsInitialized}}, which 
> calls {{provider.warmUpEncryptedKeys(keyName)}}. This is a blocking call, 
> which attempts to fill the key cache up to the low watermark.
> If the KMS is down or slow, this can take a very long time, and cause the 
> createZone RPC to fail with a timeout.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin

2018-03-01 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383028#comment-16383028
 ] 

SammiChen edited comment on HDFS-12257 at 3/2/18 2:29 AM:
--

Hi [~HuafengWang], is this targeted for 2.9.1?  If not, can we push it out to the next release, 2.9.2? 


was (Author: sammi):
Hi [~HuafengWang], is this targeted for 2.9.1?  If not, can we push it out to the next 2.9 release? 

> Expose getSnapshottableDirListing as a public API in HdfsAdmin
> --
>
> Key: HDFS-12257
> URL: https://issues.apache.org/jira/browse/HDFS-12257
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 2.6.5
>Reporter: Andrew Wang
>Assignee: Huafeng Wang
>Priority: Major
> Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, 
> HDFS-12257.003.patch
>
>
> Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no 
> programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we 
> should expose listing there as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin

2018-03-01 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383028#comment-16383028
 ] 

SammiChen commented on HDFS-12257:
--

Hi [~HuafengWang], is this targeted for 2.9.1?  If not, can we push it out to the next 2.9 release? 

> Expose getSnapshottableDirListing as a public API in HdfsAdmin
> --
>
> Key: HDFS-12257
> URL: https://issues.apache.org/jira/browse/HDFS-12257
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 2.6.5
>Reporter: Andrew Wang
>Assignee: Huafeng Wang
>Priority: Major
> Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, 
> HDFS-12257.003.patch
>
>
> Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no 
> programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we 
> should expose listing there as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-02-11 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359850#comment-16359850
 ] 

SammiChen commented on HDFS-11600:
--

I uploaded a new patch based on Andrew's 002 patch. The idea is to separate {{TestDFSStripedOutputStreamWithFailure}} into {{TestDFSStripedOutputStreamWithFailureBase}}, which carries the common routines and variable definitions, {{TestDFSStripedOutputStreamWithFailure}}, which carries the fixed-parameter test cases, and {{TestDFSStripedOutputStreamWithFailureP}}, which carries the parameterized test cases. In {{TestDFSStripedOutputStreamWithFailureP}}, I refined the current test cases. Each run randomly chooses 10 file lengths to test with. Given that the largest built-in EC policy currently supported is RS-10-4-1MB, 10 rounds of the same test case with one random datanode failure seems enough.  [~andrew.wang], would you take a look at the new patch at a convenient time?
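A minimal sketch of the random selection idea (my own illustration; the names are hypothetical, not the actual patch code):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomLengthSelectionSketch {
  // Pick 10 file lengths at random from all length variants, so each run
  // covers a different sample instead of every one of the ~200+ cases.
  static List<Integer> chooseLengths(List<Integer> allLengths, int count) {
    Random random = new Random();
    List<Integer> chosen = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      chosen.add(allLengths.get(random.nextInt(allLengths.size())));
    }
    return chosen;
  }
}
{code}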

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-02-11 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Attachment: HDFS-11600.003.patch

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-02-11 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-11600:
-
Status: Patch Available  (was: Open)

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch, 
> HDFS-11600.003.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11600) Refactor TestDFSStripedOutputStreamWithFailure test classes

2018-02-09 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358077#comment-16358077
 ] 

SammiChen commented on HDFS-11600:
--

Hi [~andrew.wang], the idea of using JUnit parameterization is really good; it helps clean up the messy test cases. I took a further look into TestDFSStripedOutputStreamWithFailure. Many test cases are constant, not related to any parameter. So I think we can further split TestDFSStripedOutputStreamWithFailure into two files, one with the constant test cases and another that is parameterized. What do you think? By the way, if you don't have much time lately, I can take it over. 
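For illustration, the parameterized file could follow the standard JUnit 4 pattern; this is a sketch with hypothetical names, not the actual refactoring:
{code}
import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class ParameterizedFailureTestSketch {
  private final int fileLength;

  public ParameterizedFailureTestSketch(int fileLength) {
    this.fileLength = fileLength;
  }

  @Parameters
  public static Collection<Object[]> data() {
    // Each entry becomes one run of every @Test method in this class.
    return Arrays.asList(new Object[][] {{0}, {65536}, {131072}});
  }

  @Test
  public void testWriteWithDatanodeFailure() {
    // Exercise the striped output stream with this.fileLength here.
  }
}
{code}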

 

> Refactor TestDFSStripedOutputStreamWithFailure test classes
> ---
>
> Key: HDFS-11600
> URL: https://issues.apache.org/jira/browse/HDFS-11600
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0-alpha2
>Reporter: Andrew Wang
>Priority: Minor
> Attachments: HDFS-11600-1.patch, HDFS-11600.002.patch
>
>
> TestDFSStripedOutputStreamWithFailure has a great number of subclasses. The 
> tests are parameterized based on the name of these subclasses.
> Seems like we could parameterize these tests with JUnit and then not need all 
> these separate test classes.
> Another note, the tests will randomly return instead of running the test. 
> Using {{Assume}} instead would make it more clear in the test output that 
> these tests were skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12462) Erasure coding policy extra options should be sorted by key value

2018-01-07 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16315629#comment-16315629
 ] 

SammiChen commented on HDFS-12462:
--

Sure. Thanks [~liaoyuxiangqin] for being interested in this task.  Currently, each erasure coding policy schema has extra options held in a {{Map}} which doesn't order its elements by key. When the in-memory EC policies are saved into the fsImage twice, the fsImage parts on disk are not identical if you compare the first fsImage with the second byte by byte, which can potentially cause problems in some cases. This task is to make sure the serialized part of the EC policies in the fsimage and editlog stays the same no matter how many times they are saved.  
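A minimal sketch of the fix idea, assuming the extra options are exposed as a {{Map}} of strings (hypothetical names, not the actual patch):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SortedOptionsSketch {
  // Copy the unordered options into a TreeMap before serialization so the
  // on-disk byte order is deterministic across repeated saves.
  static Map<String, String> sortedOptions(Map<String, String> extraOptions) {
    return new TreeMap<>(extraOptions);
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("numDataUnits", "6");
    options.put("codec", "rs");
    // Iteration order is now by key, the same on every save.
    System.out.println(sortedOptions(options));
  }
}
{code}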

> Erasure coding policy extra options should be sorted by key value
> -
>
> Key: HDFS-12462
> URL: https://issues.apache.org/jira/browse/HDFS-12462
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: SammiChen
>  Labels: hdfs-ec-3.0-nice-to-have
>
> To make sure the serialized fsimage and editlog binary equal, Erasure coding 
> policy extra options should be sorted by key value.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12860) StripedBlockUtil#getRangesInternalBlocks throws exception for the block group size larger than 2GB

2018-01-04 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16311064#comment-16311064
 ] 

SammiChen commented on HDFS-12860:
--

Just came back from a long vacation. Sorry for the late response. 

Thanks [~eddyxu] for refining the test case. It's clearer and more readable now. For end-to-end tests, I worried about whether this is the only 2GB-related bug in the EC code. Anyway, within the scope of the current title, I'm good with the current test coverage.  

My +1 for the patch. 

> StripedBlockUtil#getRangesInternalBlocks throws exception for the block group 
> size larger than 2GB
> --
>
> Key: HDFS-12860
> URL: https://issues.apache.org/jira/browse/HDFS-12860
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-12860.00.patch, HDFS-12860.01.patch
>
>
> Running terasort on a cluster with 8 datanodes, 256GB data, using 
> RS-3-2-1024k.
> The test data was generated by {{teragen}} with 32 mappers.
> The terasort benchmark fails with the following stack trace:
> {code}
> 17/11/27 14:44:31 INFO mapreduce.Job:  map 45% reduce 0%
> 17/11/27 14:44:33 INFO mapreduce.Job: Task Id : 
> attempt_1510080297865_0160_m_08_0, Status : FAILED
> Error: java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil$VerticalRange.(StripedBlockUtil.java:701)
>   at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getRangesForInternalBlocks(StripedBlockUtil.java:442)
>   at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.divideOneStripe(StripedBlockUtil.java:311)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:308)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:391)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:813)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at 
> org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:257)
>   at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
>   at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12860) StripedBlockUtil#getRangesInternalBlocks throws exception for the block group size larger than 2GB

2017-12-21 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300987#comment-16300987
 ] 

SammiChen commented on HDFS-12860:
--

Hi [~eddyxu], thanks for reporting the issue and working on it. 

1. It's great to add an error message that provides more information when the Preconditions check fails.  There are "%d" placeholders used with String.format and "%s" used with Preconditions. Is it because Preconditions doesn't support "%d"? 
2. ")" is missing in {{AlignedStripe.toString}} and {{StripingCell.toString}}.
3. Can you add some javadoc or comments in {{testDivideOneStripeLargeBlockSize}}?   If we want to test a block group larger than 2GB, using RS-6-3-1024k as an example, the {{stripSize}} is 9 * 1MB, {{stripesPerBlock}} must be > (2 * 1024) / 9, and {{blockSize}} is {{cellSize * stripesPerBlock}}. Also I would suggest adding an end-to-end test case in {{TestErasureCodingPolicies}}.
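To make the suggested numbers concrete, a small sketch under the assumptions above (my own arithmetic, not test code from the patch):
{code}
public class LargeBlockGroupSketch {
  public static void main(String[] args) {
    // RS-6-3-1024k: one full stripe spans 9 cells of 1 MB across the group.
    long cellSize = 1024 * 1024;
    long stripeSize = 9 * cellSize;                  // 9 MB per stripe
    long target = 2L * 1024 * 1024 * 1024;           // 2 GB block group
    long stripesPerBlock = target / stripeSize + 1;  // > (2 * 1024) / 9, so >= 228
    long blockSize = cellSize * stripesPerBlock;     // size of each internal block
    System.out.println("stripesPerBlock=" + stripesPerBlock
        + ", blockSize=" + blockSize
        + ", blockGroupSize=" + stripesPerBlock * stripeSize);
  }
}
{code}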

> StripedBlockUtil#getRangesInternalBlocks throws exception for the block group 
> size larger than 2GB
> --
>
> Key: HDFS-12860
> URL: https://issues.apache.org/jira/browse/HDFS-12860
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-12860.00.patch
>
>
> Running terasort on a cluster with 8 datanodes, 256GB data, using 
> RS-3-2-1024k.
> The test data was generated by {{teragen}} with 32 mappers.
> The terasort benchmark fails with the following stack trace:
> {code}
> 17/11/27 14:44:31 INFO mapreduce.Job:  map 45% reduce 0%
> 17/11/27 14:44:33 INFO mapreduce.Job: Task Id : 
> attempt_1510080297865_0160_m_08_0, Status : FAILED
> Error: java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil$VerticalRange.(StripedBlockUtil.java:701)
>   at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getRangesForInternalBlocks(StripedBlockUtil.java:442)
>   at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.divideOneStripe(StripedBlockUtil.java:311)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:308)
>   at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:391)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:813)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at 
> org.apache.hadoop.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.java:257)
>   at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:562)
>   at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12915) Fix findbugs warning in INodeFile$HeaderFormat.getBlockLayoutRedundancy

2017-12-21 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300908#comment-16300908
 ] 

SammiChen commented on HDFS-12915:
--

bq. IIRC the unit test failed because %02x was printing as a literal, not as a 
2-digit hex string using the passed parameter. Replacing an if/throw statement 
with a call to a third-party library seems unnecessary. If it's not cleaner in 
this case then its appeal, even aesthetically, is limited...

[~chris.douglas], thanks for the further explanation; it's clear to me now.  Also thanks for the patch!  I hadn't realized it before. 

bq. On a second thought, I think using ecPolicyID alone is sufficient, so that 
we can eliminate blockType as parameter. 
[~eddyxu], functionally I agree. However, I would suggest keeping the {{blockType}} for code readability.  A "0" {{ecPolicyID}} meaning a contiguous file may confuse someone who doesn't know the background. 
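To illustrate the readability point, compare the call sites; the first two match the {{getBlockLayoutRedundancy(BlockType, Short, Byte)}} signature from the findbugs report, while the third is a hypothetical signature without {{blockType}}:
{noformat}
// Intent is explicit with blockType:
getBlockLayoutRedundancy(BlockType.CONTIGUOUS, replication, null);
getBlockLayoutRedundancy(BlockType.STRIPED, null, ecPolicyID);

// Hypothetical ecPolicyID-only form; the reader must know that
// REPLICATION_POLICY_ID implicitly means a contiguous 3-replica file:
getBlockLayoutRedundancy(replication, REPLICATION_POLICY_ID);
{noformat}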

> Fix findbugs warning in INodeFile$HeaderFormat.getBlockLayoutRedundancy
> ---
>
> Key: HDFS-12915
> URL: https://issues.apache.org/jira/browse/HDFS-12915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
> Attachments: HDFS-12915.00.patch, HDFS-12915.01.patch
>
>
> It seems HDFS-12840 creates a new findbugs warning.
> Possible null pointer dereference of replication in 
> org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(BlockType,
>  Short, Byte)
> Bug type NP_NULL_ON_SOME_PATH (click for details) 
> In class org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat
> In method 
> org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(BlockType,
>  Short, Byte)
> Value loaded from replication
> Dereferenced at INodeFile.java:[line 210]
> Known null at INodeFile.java:[line 207]
> From a quick look at the patch, it seems bogus though. [~eddyxu][~Sammi] 
> would you please double check?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12915) Fix findbugs warning in INodeFile$HeaderFormat.getBlockLayoutRedundancy

2017-12-17 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16294575#comment-16294575
 ] 

SammiChen commented on HDFS-12915:
--

Thanks [~jojochuang] for reporting and working on it.  Back in the review period of HDFS-12840, I did double-check the findbugs report and found nothing suspicious. It's a false alert. I think Eddy did the same check. 

I dived a little deeper this time. This piece of code triggers the findbugs warning:
{noformat}
   Preconditions.checkArgument(replication != null && replication >= 0 && 
replication <= MAX_REDUNDANCY,
"Invalid replication value " + replication);
{noformat}

This piece of code does not trigger the warning:
{noformat}
 Preconditions.checkArgument(replication != null && replication >= 0 && replication <= MAX_REDUNDANCY,
     "Invalid replication value " + replication);
{noformat}

The only difference is that the condition clause is on one line in the second snippet. When the condition clause is split across two lines, findbugs cannot handle the case correctly and raises the false alert. 

And [~chris.douglas], Preconditions does support formatted strings. So basically I think Preconditions is very useful and neat for checking parameters. But if the check condition is complex, a traditional {{if()}} plus {{throw}} statement seems a better fit, to avoid triggering the annoying findbugs warning.
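For example, the check above could be written as follows (a sketch of that style in the same context; not from any patch):
{noformat}
// Equivalent if/throw form; findbugs follows this control flow correctly
// even when the condition spans multiple lines.
if (replication == null || replication < 0 || replication > MAX_REDUNDANCY) {
  throw new IllegalArgumentException("Invalid replication value " + replication);
}
{noformat}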

By the way, the last update of the findbugs web site was in Mar. 2015; it seems to lack maintenance these days.

For the patch, the following statement is not appropriate.  If {{erasureCodingPolicyID}} is null and {{blockType}} is striped, it should throw an exception.  {{REPLICATION_POLICY_ID}} is a special EC policy that represents the "3 replica" scheme. A file with this policy is effectively a 3-replica file, not an EC file. 

{noformat}
  if (null == erasureCodingPolicyID) {
erasureCodingPolicyID = REPLICATION_POLICY_ID;
  }
{noformat}


 





> Fix findbugs warning in INodeFile$HeaderFormat.getBlockLayoutRedundancy
> ---
>
> Key: HDFS-12915
> URL: https://issues.apache.org/jira/browse/HDFS-12915
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
> Attachments: HDFS-12915.00.patch, HDFS-12915.01.patch
>
>
> It seems HDFS-12840 creates a new findbugs warning.
> Possible null pointer dereference of replication in 
> org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(BlockType,
>  Short, Byte)
> Bug type NP_NULL_ON_SOME_PATH (click for details) 
> In class org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat
> In method 
> org.apache.hadoop.hdfs.server.namenode.INodeFile$HeaderFormat.getBlockLayoutRedundancy(BlockType,
>  Short, Byte)
> Value loaded from replication
> Dereferenced at INodeFile.java:[line 210]
> Known null at INodeFile.java:[line 207]
> From a quick look at the patch, it seems bogus though. [~eddyxu][~Sammi] 
> would you please double check?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12840) Creating a file with non-default EC policy in a EC zone is not correctly serialized in the editlog

2017-12-06 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281379#comment-16281379
 ] 

SammiChen commented on HDFS-12840:
--

Thanks [~eddyxu] for the contribution!  The latest patch LGTM, +1.  Please double-check the style issues before check-in. 

> Creating a file with non-default EC policy in a EC zone is not correctly 
> serialized in the editlog
> --
>
> Key: HDFS-12840
> URL: https://issues.apache.org/jira/browse/HDFS-12840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, 
> HDFS-12840.02.patch, HDFS-12840.03.patch, HDFS-12840.04.patch, 
> HDFS-12840.05.patch, HDFS-12840.reprod.patch, editsStored, editsStored, 
> editsStored.03, editsStored.05
>
>
> When create a replicated file in an existing EC zone, the edit logs does not 
> differentiate it from an EC file. When {{FSEditLogLoader}} to replay edits, 
> this file is treated as EC file, as a results, it crashes the NN because the 
> blocks of this file are replicated, which does not match with {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered 
> exception on operation AddBlockOp [path=/system/balancer.id, 
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, 
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12896) when set replicate EC policy for a directory or file,it's EC policy cannot be querying by getPolicy command.

2017-12-06 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281229#comment-16281229
 ] 

SammiChen commented on HDFS-12896:
--

Hi [~candychencan], HDFS-12308 is about providing a new API, other than {{getErasureCodingPolicy}}, to return the effective EC policy. 

> when set replicate EC policy for a directory or file,it's EC policy cannot be 
> querying by getPolicy command.
> 
>
> Key: HDFS-12896
> URL: https://issues.apache.org/jira/browse/HDFS-12896
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: chencan
>
> When i set replicate EC policy for ecDir,then query it by getPolicy,it return 
> ‘The erasure coding policy of /ecDir is unspecified', as follow.
> [root@master bin]# hdfs dfs -mkdir /ecDir
> [root@master bin]# hdfs ec -setPolicy -path /ecDir -replicate
> Set erasure coding policy replication on /ecDir
> [root@master bin]# hdfs ec -getPolicy -path /ecDir
> The erasure coding policy of /ecDir is unspecified



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12896) when set replicate EC policy for a directory or file,it's EC policy cannot be querying by getPolicy command.

2017-12-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279723#comment-16279723
 ] 

SammiChen commented on HDFS-12896:
--

Hi [~candychencan], sure, you are welcome to contribute to the community!  Also, can you resolve this JIRA? 

> when set replicate EC policy for a directory or file,it's EC policy cannot be 
> querying by getPolicy command.
> 
>
> Key: HDFS-12896
> URL: https://issues.apache.org/jira/browse/HDFS-12896
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: chencan
>
> When i set replicate EC policy for ecDir,then query it by getPolicy,it return 
> ‘The erasure coding policy of /ecDir is unspecified', as follow.
> [root@master bin]# hdfs dfs -mkdir /ecDir
> [root@master bin]# hdfs ec -setPolicy -path /ecDir -replicate
> Set erasure coding policy replication on /ecDir
> [root@master bin]# hdfs ec -getPolicy -path /ecDir
> The erasure coding policy of /ecDir is unspecified



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12840) Creating a file with non-default EC policy in a EC zone is not correctly serialized in the editlog

2017-12-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279711#comment-16279711
 ] 

SammiChen commented on HDFS-12840:
--

Hi [~eddyxu], TestOfflineEditsViewer with editsStored.03 passes after I run each test function manually. 

bq. So that it can handle the editslog which has no ERASURE_CODING_POLICY_ID 
field.

I don't quite understand the proposal. For policy IDs, currently 1~63 are used for system built-in policies, 64~127 are allocated for user-defined policies, and 0 is not used now. 
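As a sketch of those ranges (my own helpers, not HDFS code):
{noformat}
// Policy ID ranges as described above.
static boolean isSystemPolicy(byte id) {
  return id >= 1 && id <= 63;   // system built-in policies
}
static boolean isUserDefinedPolicy(byte id) {
  return id >= 64 && id <= 127; // user-defined policies
}
{noformat}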
Let's focus on the fix itself and get it in quickly.  We can discuss the desired replication policy ID later. 




> Creating a file with non-default EC policy in a EC zone is not correctly 
> serialized in the editlog
> --
>
> Key: HDFS-12840
> URL: https://issues.apache.org/jira/browse/HDFS-12840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, 
> HDFS-12840.02.patch, HDFS-12840.03.patch, HDFS-12840.04.patch, 
> HDFS-12840.reprod.patch, editsStored, editsStored, editsStored.03
>
>
> When create a replicated file in an existing EC zone, the edit logs does not 
> differentiate it from an EC file. When {{FSEditLogLoader}} to replay edits, 
> this file is treated as EC file, as a results, it crashes the NN because the 
> blocks of this file are replicated, which does not match with {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered 
> exception on operation AddBlockOp [path=/system/balancer.id, 
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, 
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12896) when set replicate EC policy for a directory or file,it's EC policy cannot be querying by getPolicy command.

2017-12-05 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279618#comment-16279618
 ] 

SammiChen commented on HDFS-12896:
--

Hi [~candychencan], thanks for reporting this!  Currently it's the designed 
behavior to not return the special replicate EC policy when query.  HDFS-12308 
is tracked to implement the function to return effective EC policy. 

> when set replicate EC policy for a directory or file,it's EC policy cannot be 
> querying by getPolicy command.
> 
>
> Key: HDFS-12896
> URL: https://issues.apache.org/jira/browse/HDFS-12896
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: chencan
>
> When i set replicate EC policy for ecDir,then query it by getPolicy,it return 
> ‘The erasure coding policy of /ecDir is unspecified', as follow.
> [root@master bin]# hdfs dfs -mkdir /ecDir
> [root@master bin]# hdfs ec -setPolicy -path /ecDir -replicate
> Set erasure coding policy replication on /ecDir
> [root@master bin]# hdfs ec -getPolicy -path /ecDir
> The erasure coding policy of /ecDir is unspecified



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12840) Creating a file with non-default EC policy in a EC zone is not correctly serialized in the editlog

2017-12-04 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278150#comment-16278150
 ] 

SammiChen commented on HDFS-12840:
--

Thanks [~eddyxu]!  The latest patch looks overall good.  

1. {{addFileForEditLog}} in {{FsDirWriteFileOp}}
   bq.  ErasureCodingPolicy ecPolicy = null;
  The variable declaration can be moved into the scope of the {{isStriped}} 
branch (see the sketch below).

2. TestOfflineEditsViewer fails locally with editsStored.03.

The current solution appends an "ERASURE_CODING_POLICY_ID" with value "63" 
to each "OP_ADD" operation. Do you think a "0" value for the "replication 
policy ID" would be more appropriate in this case? 




> Creating a file with non-default EC policy in a EC zone is not correctly 
> serialized in the editlog
> --
>
> Key: HDFS-12840
> URL: https://issues.apache.org/jira/browse/HDFS-12840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, 
> HDFS-12840.02.patch, HDFS-12840.03.patch, HDFS-12840.04.patch, 
> HDFS-12840.reprod.patch, editsStored, editsStored, editsStored.03
>
>
> When creating a replicated file in an existing EC zone, the edit log does not 
> differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits, 
> this file is treated as an EC file; as a result, it crashes the NN because the 
> blocks of this file are replicated, which does not match the {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered 
> exception on operation AddBlockOp [path=/system/balancer.id, 
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, 
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12840) Creating a replicated file in a EC zone does not correctly serialized in EditLogs

2017-11-29 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16272270#comment-16272270
 ] 

SammiChen commented on HDFS-12840:
--

Hi Eddy, thanks for working on it. 

Some comments: 
1. {{REPLICATION_POLICY_ID}} is already defined in {{ErasureCodeConstants}} 
with value 63.  I suggest reusing it (see the sketch below). 
2. {{TestRetryCacheWithHA}} should use 40 instead of 41:
 bq. assertEquals("Retry cache size is wrong", 41, cacheSet.size());
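A tiny sketch of suggestion 1; the constant and its value come from this 
thread, while the package path and the holder class are assumptions for 
illustration:

{code}
import org.apache.hadoop.io.erasurecode.ErasureCodeConstants;

// Illustrative holder: reuse the shared constant instead of redefining
// the magic number 63 in the edit log serialization code.
class EditLogEcConstants {
  static final byte REPLICATION_POLICY_ID =
      ErasureCodeConstants.REPLICATION_POLICY_ID;
}
{code}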



> Creating a replicated file in a EC zone does not correctly serialized in 
> EditLogs
> -
>
> Key: HDFS-12840
> URL: https://issues.apache.org/jira/browse/HDFS-12840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, 
> HDFS-12840.02.patch, HDFS-12840.reprod.patch, editsStored, editsStored
>
>
> When creating a replicated file in an existing EC zone, the edit log does not 
> differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits, 
> this file is treated as an EC file; as a result, it crashes the NN because the 
> blocks of this file are replicated, which does not match the {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered 
> exception on operation AddBlockOp [path=/system/balancer.id, 
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, 
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12840) Creating a replicated file in a EC zone does not correctly serialized in EditLogs

2017-11-23 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264935#comment-16264935
 ] 

SammiChen commented on HDFS-12840:
--

Hi [~eddyxu], thanks for reporting and fixing this.  I'm not able to apply 
01.patch locally against the latest trunk code, while 00.patch applies fine.  Can 
you double-check that the patch format is correct? 

> Creating a replicated file in a EC zone does not correctly serialized in 
> EditLogs
> -
>
> Key: HDFS-12840
> URL: https://issues.apache.org/jira/browse/HDFS-12840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12840.00.patch, HDFS-12840.01.patch, 
> HDFS-12840.reprod.patch, editsStored
>
>
> When creating a replicated file in an existing EC zone, the edit log does not 
> differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits, 
> this file is treated as an EC file; as a result, it crashes the NN because the 
> blocks of this file are replicated, which does not match the {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered 
> exception on operation AddBlockOp [path=/system/balancer.id, 
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, 
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED

2017-10-31 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226598#comment-16226598
 ] 

SammiChen commented on HDFS-12682:
--

Hi [~xiaochen], thanks for the update. The patch looks overall good.  

Minor nits:

 {{toStringWithState}} in {{ErasureCodingPolicy}} duplicates 
{{toString}} and is not used. 



I'm thinking: besides {{getErasureCodingPolicies}}, do we also need to return 
{{ErasureCodingPolicyInfo}} in the response of {{addErasureCodingPolicies}}?  
From the API's semantics, returning {{ErasureCodingPolicyInfo}} seems a better 
fit.  But I wonder whether that would provide more benefit to end users for this 
API; maybe the current {{ErasureCodingPolicy}} is already enough. 



> ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as 
> DISABLED
> 
>
> Key: HDFS-12682
> URL: https://issues.apache.org/jira/browse/HDFS-12682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch, 
> HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch
>
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing 
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
>  the static instance of [SystemErasureCodingPolicies 
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
>  is first checked, and always returns the cached policy objects, which are 
> created by default with state=DISABLED.
> All the existing unit tests pass, because the static instance that the 
> client (e.g. ECAdmin) reads in unit tests is updated by the NN. :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart

2017-10-24 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16217973#comment-16217973
 ] 

SammiChen commented on HDFS-12686:
--

Hi [~jojochuang], this JIRA is closely related to HDFS-12682. I was planning to 
work on it after HDFS-12682 was committed.  Thanks [~xiaochen] for taking care 
of it together in HDFS-12682. 

> Erasure coding system policy state is not correctly saved and loaded during 
> real cluster restart
> 
>
> Key: HDFS-12686
> URL: https://issues.apache.org/jira/browse/HDFS-12686
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
>
> Inspired by HDFS-12682, I found the system erasure coding policy state will 
> not be correctly saved and loaded in a real cluster, though there are such 
> unit tests and they all pass with MiniCluster. That's because the 
> MiniCluster keeps the same static system erasure coding policy object across 
> the NN restart operation. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12046) Hadoop CRC implementation using Intel ISA-L library

2017-10-24 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen reassigned HDFS-12046:


Assignee: SammiChen  (was: luhuichun)

> Hadoop CRC implementation using Intel ISA-L library
> ---
>
> Key: HDFS-12046
> URL: https://issues.apache.org/jira/browse/HDFS-12046
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: luhuichun
>Assignee: SammiChen
> Attachments: HDFS-12046-001.patch, ISA-L CRC Performance Report using 
> intel ISA-L.pdf
>
>
> The Intel ISA-L open source library provides a set of highly optimized functions 
> for RAID, erasure code, CRC, cryptographic hash, encryption, and compression. 
> Ref. https://github.com/01org/isa-l. HDFS-EC has already integrated ISA-L and 
> added the necessary build option support for Hadoop. For Hadoop CRC, we 
> recently explored more, developing a Hadoop CRC using Intel ISA-L and performing 
> a test on Broadwell and Skylake servers, comparing the performance against 
> Hadoop native CRC. On Broadwell/Skylake, ISA-L CRC has about an 8% performance 
> gain over Hadoop native CRC. We suggest adding a new Hadoop native CRC using 
> the ISA-L library; the extra advantage is it’s already optimized when we 
> upgrade to new servers, and Hadoop developers don’t have to maintain their own 
> set of ASM code.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show policy state as DISABLED

2017-10-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210790#comment-16210790
 ] 

SammiChen commented on HDFS-12682:
--

Hi [~xiaochen], thanks for reporting this issue. Inspired by your discovery, I 
found the same issue exists when system EC policies are persisted into and 
loaded from the fsImage (HDFS-12686).  The current convertErasureCodingPolicy 
function is fine in most cases. For special cases, like getting all erasure 
coding policies and persisting policies into the fsImage, I think we need a new 
edition that does a full conversion. 

{quote}
The problem I see from HDFS-12258's implementation though, is the mutable ECPS 
is saved on the immutable ECP, breaking assumptions such as shared single 
instance policy. At the same time the policy is still not persisted 
independently. I think ECPS is highly dependent on the missing piece from 
HDFS-7337: policies are not persisted to NN metadata. The state of whether a 
policy is enabled could be persisted together with the policy, without 
impacting HDFSFileStatus.
{quote}
Persisting EC policies is implemented in HDFS-7337. 

{quote}
I think this bug (HDFS-12682) and HDFS-12258 would make more sense if we could 
first persist policies to NN metadata. Would also be helpful to separate out 
something like ErasureCodingPolicyAndState for the policy-specific APIs, so the 
state isn't deserialized onto HDFSFileStatus.
{quote}
For HDFS-12258, [~zhouwei], [~drankye] and I discussed two different approaches 
when we first thought about how to implement it (sketched below). One is the 
currently implemented approach, which adds one extra "state" field to the 
existing ECP definition. The other is to define a new class, something like 
{{ErasureCodingPolicyWithState}}, to hold the ECP and the new policy state 
field.  They are almost equally good.  The only concern is that introducing the 
new {{ErasureCodingPolicyWithState}} may add complexity to the API interfaces, 
and to end users. There are multiple EC related APIs.  If we return 
{{ErasureCodingPolicyWithState}} from {{getAllErasureCodingPolicies}}, should 
we return {{ErasureCodingPolicyWithState}} or {{ErasureCodingPolicy}} from 
{{getErasureCodingPolicy}}? And so on. Also, is it worth introducing a new 
class definition in Hadoop that only carries one extra field?  After all these 
considerations, the current approach was chosen to leverage the existing 
ECP. 
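For clarity, a rough sketch of the two shapes; only the two class names come 
from this discussion, while the fields and the state enum values are assumed 
for illustration:

{code}
enum ErasureCodingPolicyState { DISABLED, ENABLED, REMOVED }  // assumed states

// Approach 1 (chosen): the state field lives on the existing policy class.
class ErasureCodingPolicy {
  // ... existing name, schema, cell size, id fields ...
  private ErasureCodingPolicyState state;   // extra field added by HDFS-12258
}

// Approach 2 (considered): a thin wrapper pairing the immutable policy with
// its state, leaving ErasureCodingPolicy untouched.
class ErasureCodingPolicyWithState {
  private final ErasureCodingPolicy policy;
  private final ErasureCodingPolicyState state;

  ErasureCodingPolicyWithState(ErasureCodingPolicy policy,
      ErasureCodingPolicyState state) {
    this.policy = policy;
    this.state = state;
  }
}
{code}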

Please let me know if you have other concerns.  Thanks!

> ECAdmin -listPolicies will always show policy state as DISABLED
> ---
>
> Key: HDFS-12682
> URL: https://issues.apache.org/jira/browse/HDFS-12682
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>  Labels: hdfs-ec-3.0-must-do
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as 
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, 
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], 
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, 
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing 
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
>  the static instance of [SystemErasureCodingPolicies 
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
>  is first checked, and always returns the cached policy objects, which are 
> created by default with state=DISABLED.
> All the existing unit tests pass, because the static instance that the 
> client (e.g. ECAdmin) reads in unit tests is updated by the NN. :)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Created] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart

2017-10-19 Thread SammiChen (JIRA)
SammiChen created HDFS-12686:


 Summary: Erasure coding system policy state is not correctly saved 
and loaded during real cluster restart
 Key: HDFS-12686
 URL: https://issues.apache.org/jira/browse/HDFS-12686
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0-beta1
Reporter: SammiChen
Assignee: SammiChen


Inspired by HDFS-12682, I found the system erasure coding policy state will 
not be correctly saved and loaded in a real cluster, though there are such 
unit tests and they all pass with MiniCluster. That's because the MiniCluster 
keeps the same static system erasure coding policy object across the 
NN restart operation. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7337) Configurable and pluggable erasure codec and policy

2017-10-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210640#comment-16210640
 ] 

SammiChen commented on HDFS-7337:
-

Hi [~xiaochen], HDFS-7859 is for persisting EC policies in the protobuf fsImage, 
and HDFS-12395 is for supporting the EC policy APIs in the edit log.

Thanks [~rakeshr]!

> Configurable and pluggable erasure codec and policy
> ---
>
> Key: HDFS-7337
> URL: https://issues.apache.org/jira/browse/HDFS-7337
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Reporter: Zhe Zhang
>Assignee: SammiChen
>Priority: Critical
>  Labels: hdfs-ec-3.0-nice-to-have
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec 
> v4.pdf, PluggableErasureCodec-v2.pdf, PluggableErasureCodec-v3.pdf, 
> PluggableErasureCodec.pdf
>
>
> According to HDFS-7285 and the design, this considers to support multiple 
> Erasure Codecs via pluggable approach. It allows to define and configure 
> multiple codec schemas with different coding algorithms and parameters. The 
> resultant codec schemas can be utilized and specified via command tool for 
> different file folders. While design and implement such pluggable framework, 
> it’s also to implement a concrete codec by default (Reed Solomon) to prove 
> the framework is useful and workable. Separate JIRA could be opened for the 
> RS codec implementation.
> Note HDFS-7353 will focus on the very low level codec API and implementation 
> to make concrete vendor libraries transparent to the upper layer. This JIRA 
> focuses on high level stuffs that interact with configuration, schema and etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12613) Native EC coder should implement release() as idempotent function.

2017-10-16 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206917#comment-16206917
 ] 

SammiChen commented on HDFS-12613:
--

HDFS-12672 is filed for tracking. I will prepare a Windows environment later to 
verify it. 

> Native EC coder should implement release() as idempotent function.
> --
>
> Key: HDFS-12613
> URL: https://issues.apache.org/jira/browse/HDFS-12613
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-12613.00.patch, HDFS-12613.01.patch, 
> HDFS-12613.02.patch, HDFS-12613.03.patch, HDFS-12613.04.patch
>
>
> Recently, we found native EC coder crashes JVM because 
> {{NativeRSDecoder#release()}} being called multiple times (HDFS-12612 and 
> HDFS-12606). 
> We should strength the implement the native code to make {{release()}} 
> idempotent  as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12672) Verify erasure coding native code on Windows platform

2017-10-16 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen reassigned HDFS-12672:


Assignee: SammiChen

> Verify erasure coding native code on Windows platform
> -
>
> Key: HDFS-12672
> URL: https://issues.apache.org/jira/browse/HDFS-12672
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-nice-to-have
>
> Recently there have been some changes in the erasure coding native code to fix 
> known issues.  It's better to verify the code changes on the Windows platform also. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12672) Verify erasure coding native code on Windows platform

2017-10-16 Thread SammiChen (JIRA)
SammiChen created HDFS-12672:


 Summary: Verify erasure coding native code on Windows platform
 Key: HDFS-12672
 URL: https://issues.apache.org/jira/browse/HDFS-12672
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: SammiChen


Recently there have been some changes in the erasure coding native code to fix 
known issues.  It's better to verify the code changes on the Windows platform also. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML

2017-10-15 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205451#comment-16205451
 ] 

SammiChen commented on HDFS-11467:
--

Thanks [~HuafengWang] for working on it! 

1. I suggest moving the new private {{convertErasureCodingPolicy}} function to 
{{PBHelperClient}} and renaming it to distinguish it from the current 
{{convertErasureCodingPolicy}} function. 
2. A schema is a must-have for a policy, so we should handle the case when the 
schema is null; Preconditions can be used for the check (see the sketch after 
this list).  We should also handle the case when no policy is found in the 
section. 
3. If "extraOptions" is not NULL, persist it. 
4. Extra unit tests are preferred. 
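A minimal sketch of the null handling in item 2; the method shape and the 
policy constructor are assumed for illustration, using Guava 
{{Preconditions}} as elsewhere in the codebase:

{code}
// Hypothetical OIV-side conversion: fail fast when the parsed section
// carries a policy without a schema.
ErasureCodingPolicy fromParsedSection(ECSchema schema, int cellSize, byte id) {
  Preconditions.checkNotNull(schema, "An EC policy must carry a schema");
  return new ErasureCodingPolicy(schema, cellSize, id);  // assumed constructor
}
{code}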

> Support ErasureCoding section in OIV XML/ReverseXML
> ---
>
> Key: HDFS-11467
> URL: https://issues.apache.org/jira/browse/HDFS-11467
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: tools
>Affects Versions: 3.0.0-alpha4
>Reporter: Wei-Chiu Chuang
>Assignee: Huafeng Wang
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-11467.001.patch
>
>
> As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, 
> we would like to also support exporting this section into an XML back and 
> forth using the OIV tool.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12575) Improve test coverage for EC related edit logs ops

2017-10-12 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201668#comment-16201668
 ] 

SammiChen commented on HDFS-12575:
--

Hi [~eddyxu], sure, I will work on it. I'm not clear about the detailed steps to 
carry out "Replay edits after checkpoint" and "Apply edits on SNN". Can you 
give some hints?  Also, by SNN you mean both the secondary namenode and the 
standby namenode, right? 

> Improve test coverage for EC related edit logs ops
> --
>
> Key: HDFS-12575
> URL: https://issues.apache.org/jira/browse/HDFS-12575
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: Lei (Eddy) Xu
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-nice-to-have
>
> HDFS-12569 found that we have little test coverage for edit logs ops of 
> erasure coding.
> And we've seen the following bug bring down SNN in our test environments:
> {code}
> 6:42:18.177 AMERROR   FSEditLogLoader 
> Encountered exception on operation AddBlockOp [path=/tmp/foo/bar, 
> penultimateBlock=NULL, lastBlock=blk_1073743386_69322, RpcClientId=, 
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>   at 
> 
> 6:42:18.190 AMFATAL   EditLogTailer   
> Unknown error encountered while tailing edits. Shutting down standby NN.
> java.io.IOException: java.lang.IllegalArgumentException: reportedBlock is not 
> striped
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:251)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:150)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>   at 
> {code}
> We should add coverage for these important edit logs, i.e., set/unset policy, 
> enable/remove policies and etc are correctly persisted in edit logs, and test 
> the scenarios like:
> * Restart NN
> * Replay edits after checkpoint
> * Apply edits on SNN.
> * and etc. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12613) Native EC coder should implement release() as idempotent function.

2017-10-11 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201395#comment-16201395
 ] 

SammiChen commented on HDFS-12613:
--

Hi [~eddyxu], agreed. Checking the NULL pointer in the native code is a 
must-have. Checking it at the Java level is a nice-to-have to avoid one JNI call. 
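For illustration, a rough sketch of the two-level guard; {{nativeCoder}} comes 
from this thread, while the field type and the native entry point name are 
assumptions:

{code}
// Hypothetical sketch: the Java-level check skips the JNI round trip when
// the coder is already gone; the native side also checks for NULL, so
// release() stays idempotent at both levels.
public synchronized void release() {
  if (nativeCoder == null) {
    return;               // already released: no JNI call needed
  }
  destroyImpl();          // assumed name of the native release entry point
  nativeCoder = null;
}
{code}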

> Native EC coder should implement release() as idempotent function.
> --
>
> Key: HDFS-12613
> URL: https://issues.apache.org/jira/browse/HDFS-12613
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-12613.00.patch, HDFS-12613.01.patch, 
> HDFS-12613.02.patch
>
>
> Recently, we found native EC coder crashes JVM because 
> {{NativeRSDecoder#release()}} being called multiple times (HDFS-12612 and 
> HDFS-12606). 
> We should strength the implement the native code to make {{release()}} 
> idempotent  as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12613) Native EC coder should implement release() as idempotent function.

2017-10-11 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199928#comment-16199928
 ] 

SammiChen commented on HDFS-12613:
--

Hi [~eddyxu], thanks for reporting and working on it.  

1. Apart from adding {{synchronized}} on {{release}}, {{performEncodeImpl}} and 
{{performDecodeImpl}} could also carry the {{synchronized}} keyword (see the 
sketch below).
2. I see you added the NULL check of {{nativeCoder}} in the native code.  We can 
also check its status in the Java code: if it's already null, we don't need to 
call into the native code through JNI.
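A sketch of suggestion 1; besides {{performEncodeImpl}} and {{nativeCoder}}, 
the signature and the downstream JNI call name are assumed (with 
{{java.nio.ByteBuffer}} and Guava {{Preconditions}} as implied imports):

{code}
// Hypothetical sketch: encoding shares release()'s monitor, so a coder
// cannot be released while an encode call is in flight.
protected synchronized void performEncodeImpl(ByteBuffer[] inputs,
    int[] inputOffsets, int dataLen, ByteBuffer[] outputs,
    int[] outputOffsets) {
  Preconditions.checkState(nativeCoder != null, "coder has been released");
  encodeImpl(inputs, inputOffsets, dataLen, outputs, outputOffsets); // JNI
}
{code}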


> Native EC coder should implement release() as idempotent function.
> --
>
> Key: HDFS-12613
> URL: https://issues.apache.org/jira/browse/HDFS-12613
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
> Attachments: HDFS-12613.00.patch, HDFS-12613.01.patch
>
>
> Recently, we found native EC coder crashes JVM because 
> {{NativeRSDecoder#release()}} being called multiple times (HDFS-12612 and 
> HDFS-12606). 
> We should strength the implement the native code to make {{release()}} 
> idempotent  as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12569) Unset EC policy logs empty payload in edit log

2017-09-29 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186826#comment-16186826
 ] 

SammiChen commented on HDFS-12569:
--

Hi Andrew,


Next week is PRC National Day.  We will all take at least one week of vacation 
(1st Oct. ~ 8th Oct.).  Expect delayed email responses during this period. 

If any task fits us or we can help, just ping us or assign it to us; we will 
take it over after the vacation. 


Bests,
Sammi



> Unset EC policy logs empty payload in edit log
> --
>
> Key: HDFS-12569
> URL: https://issues.apache.org/jira/browse/HDFS-12569
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Blocker
> Attachments: HDFS-12569.00.patch
>
>
> Running {{hdfs ec -unsetPolicy}} generates an {{OP_REMOVE_XATTR}} entry in the 
> edit log, but the payload such as the xattr namespace / name / value is 
> missing:
> {code}
>   <RECORD>
>     <OPCODE>OP_REMOVE_XATTR</OPCODE>
>     <DATA>
>       <TXID>420481</TXID>
>       <SRC>/</SRC>
>       <RPC_CLIENTID>b098e758-9d7f-48b7-aa91-80ca52133b09</RPC_CLIENTID>
>       <RPC_CALLID>0</RPC_CALLID>
>     </DATA>
>   </RECORD>
> {code}
> As a result, when the Active NN restarts, or the Standby NN replays edits, this 
> op has no effect.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7337) Configurable and pluggable erasure codec and policy

2017-09-28 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185280#comment-16185280
 ] 

SammiChen commented on HDFS-7337:
-

Hi [~andrew.wang],  the release note  is ready. Is there anything I need to do 
besides that? 

> Configurable and pluggable erasure codec and policy
> ---
>
> Key: HDFS-7337
> URL: https://issues.apache.org/jira/browse/HDFS-7337
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Reporter: Zhe Zhang
>Assignee: SammiChen
>Priority: Critical
>  Labels: hdfs-ec-3.0-nice-to-have
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf, PluggableErasureCodec-v2.pdf, 
> PluggableErasureCodec-v3.pdf, PluggableErasureCodec v4.pdf
>
>
> According to HDFS-7285 and the design, this considers to support multiple 
> Erasure Codecs via pluggable approach. It allows to define and configure 
> multiple codec schemas with different coding algorithms and parameters. The 
> resultant codec schemas can be utilized and specified via command tool for 
> different file folders. While design and implement such pluggable framework, 
> it’s also to implement a concrete codec by default (Reed Solomon) to prove 
> the framework is useful and workable. Separate JIRA could be opened for the 
> RS codec implementation.
> Note HDFS-7353 will focus on the very low level codec API and implementation 
> to make concrete vendor libraries transparent to the upper layer. This JIRA 
> focuses on high level stuffs that interact with configuration, schema and etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests

2017-09-27 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183560#comment-16183560
 ] 

SammiChen commented on HDFS-12497:
--

Thanks Huafeng for taking over the task! 

> Re-enable TestDFSStripedOutputStreamWithFailure tests
> -
>
> Key: HDFS-12497
> URL: https://issues.apache.org/jira/browse/HDFS-12497
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Andrew Wang
>Assignee: Huafeng Wang
>  Labels: flaky-test, hdfs-ec-3.0-must-do
> Attachments: HDFS-12497.001.patch
>
>
> We disabled this suite of tests in HDFS-12417 since they were very flaky. We 
> should fix these tests and re-enable them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12399) Improve erasure coding codec framework adding more unit tests

2017-09-26 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16182027#comment-16182027
 ] 

SammiChen commented on HDFS-12399:
--

I double-checked the test case failures. They are not relevant. 

Ping [~drankye] and [~eddyxu], can you help review the patch? 

> Improve erasure coding codec framework adding more unit tests 
> --
>
> Key: HDFS-12399
> URL: https://issues.apache.org/jira/browse/HDFS-12399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha3
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-12399.000.patch
>
>
> Improve the erasure coding codec framework by adding more unit tests 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests

2017-09-21 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174379#comment-16174379
 ] 

SammiChen commented on HDFS-12497:
--

I see the {{testMultipleDatanodeFailure56}} failure too.  By reducing 
{{stripesPerBlock}} from 4 to 2, all test*() and testMultipleDatanodeFailure56 
run very well locally. But {{testBlockTokenExpired}} always fails in this 
case no matter how long the token lifetime is.  If I set "tokenExpire" to 
false, then everything is OK. So the failure is token related but not token 
lifetime related. I need more time to find the root cause. 

> Re-enable TestDFSStripedOutputStreamWithFailure tests
> -
>
> Key: HDFS-12497
> URL: https://issues.apache.org/jira/browse/HDFS-12497
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Andrew Wang
>Assignee: SammiChen
>  Labels: flaky-test, hdfs-ec-3.0-must-do
>
> We disabled this suite of tests in HDFS-12417 since they were very flaky. We 
> should fix these tests and re-enable them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s

2017-09-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172611#comment-16172611
 ] 

SammiChen commented on HDFS-12449:
--

Thanks [~eddyxu] for reviewing and committing the patch!

> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> --
>
> Key: HDFS-12449
> URL: https://issues.apache.org/jira/browse/HDFS-12449
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: flaky-test
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-12449.001.patch
>
>
> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> reduce the file size and loop count



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests

2017-09-19 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen reassigned HDFS-12497:


Assignee: SammiChen

> Re-enable TestDFSStripedOutputStreamWithFailure tests
> -
>
> Key: HDFS-12497
> URL: https://issues.apache.org/jira/browse/HDFS-12497
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Andrew Wang
>Assignee: SammiChen
>  Labels: flaky-test, hdfs-ec-3.0-must-do
>
> We disabled this suite of tests in HDFS-12417 since they were very flaky. We 
> should fix these tests and re-enable them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12447) Refactor addErasureCodingPolicy

2017-09-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171525#comment-16171525
 ] 

SammiChen commented on HDFS-12447:
--

The build failed at the steps "Apache Hadoop Client Packaging Invariants" and 
"Apache Hadoop Client Packaging Invariants for Test".  Not sure why these two 
modules failed. 
{quote}
[INFO] Apache Hadoop Scheduler Load Simulator . SUCCESS [  6.028 s]
[INFO] Apache Hadoop Azure Data Lake support .. SUCCESS [  4.058 s]
[INFO] Apache Hadoop Tools Dist ... SUCCESS [  1.229 s]
[INFO] Apache Hadoop Tools  SUCCESS [  0.029 s]
[INFO] Apache Hadoop Client API ... SUCCESS [01:59 min]
[INFO] Apache Hadoop Client Runtime ... SUCCESS [01:50 min]
[INFO] Apache Hadoop Client Packaging Invariants .. FAILURE [  1.081 s]
[INFO] Apache Hadoop Client Test Minicluster .. SUCCESS [02:24 min]
[INFO] Apache Hadoop Client Packaging Invariants for Test . FAILURE [  0.120 s]
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [  1.231 s]
[INFO] Apache Hadoop Distribution . SKIPPED
[INFO] Apache Hadoop Client Modules ... SUCCESS [  0.075 s]
[INFO] Apache Hadoop Cloud Storage  SUCCESS [  1.091 s]
[INFO] Apache Hadoop Cloud Storage Project  SUCCESS [  0.044 s]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 15:37 min
[INFO] Finished at: 2017-09-19T07:49:44+00:00
[INFO] Final Memory: 121M/497M
[INFO] 
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.3.1:exec 
(check-jar-contents) on project hadoop-client-check-invariants: Command 
execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce 
(enforce-banned-dependencies) on project hadoop-client-check-test-invariants: 
Some Enforcer rules have failed. Look above for specific messages explaining 
why the rule failed. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-client-check-invariants
{quote}

> Refactor addErasureCodingPolicy
> ---
>
> Key: HDFS-12447
> URL: https://issues.apache.org/jira/browse/HDFS-12447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12447.001.patch, HDFS-12447.002.patch, 
> HDFS-12447.003.patch, HDFS-12447.004.patch
>
>
> As a follow-on to handle some issues discussed in HDFS-12395, this is to 
> majorly refactor the addErasureCodingPolicy API, changing AddECPolicyResponse => 
> AddErasureCodingPolicyResponse



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12399) Improve erasure coding codec framework adding more unit tests

2017-09-19 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12399:
-
Attachment: HDFS-12399.000.patch

Initial patch.

> Improve erasure coding codec framework adding more unit tests 
> --
>
> Key: HDFS-12399
> URL: https://issues.apache.org/jira/browse/HDFS-12399
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha3
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-12399.000.patch
>
>
> Improve the erasure coding codec framework by adding more unit tests 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s

2017-09-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171236#comment-16171236
 ] 

SammiChen commented on HDFS-12449:
--

Hi [~eddyxu], do you have time to take a look at the patch?  I double-checked 
the failed unit tests; they are not relevant. 

> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> --
>
> Key: HDFS-12449
> URL: https://issues.apache.org/jira/browse/HDFS-12449
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: flaky-test
> Attachments: HDFS-12449.001.patch
>
>
> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> reduce the file size and loop count



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12447) Refactor addErasureCodingPolicy

2017-09-19 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12447:
-
Attachment: HDFS-12447.004.patch

Rebase the patch against trunk. 


> Refactor addErasureCodingPolicy
> ---
>
> Key: HDFS-12447
> URL: https://issues.apache.org/jira/browse/HDFS-12447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12447.001.patch, HDFS-12447.002.patch, 
> HDFS-12447.003.patch, HDFS-12447.004.patch
>
>
> As a follow-on to handle some issues discussed in HDFS-12395, this is to 
> majorly refactor the addErasureCodingPolicy API, changing AddECPolicyResponse => 
> AddErasureCodingPolicyResponse



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12395) Support erasure coding policy operations in namenode edit log

2017-09-19 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171232#comment-16171232
 ] 

SammiChen commented on HDFS-12395:
--

Thanks [~kihwal] for the reminder about taking care of the NN layout version.  I 
have the same opinion as [~andrew.wang].  

And [~brahmareddy], thanks for your advice.  I will file a separate JIRA next 
time in such cases. 

> Support erasure coding policy operations in namenode edit log
> -
>
> Key: HDFS-12395
> URL: https://issues.apache.org/jira/browse/HDFS-12395
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-must-do
> Fix For: 3.0.0-beta1
>
> Attachments: editsStored, HDFS-12395.001.patch, HDFS-12395.002.patch, 
> HDFS-12395.003.patch, HDFS-12395.004.patch
>
>
> Support add, remove, disable, enable erasure coding policy operation in edit 
> log. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12395) Support erasure coding policy operations in namenode edit log

2017-09-18 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169666#comment-16169666
 ] 

SammiChen commented on HDFS-12395:
--

Hi, [~brahmareddy], thanks for the reminder. I will address these two failed 
cases in HDFS-12460.

> Support erasure coding policy operations in namenode edit log
> -
>
> Key: HDFS-12395
> URL: https://issues.apache.org/jira/browse/HDFS-12395
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-must-do
> Fix For: 3.0.0-beta1
>
> Attachments: editsStored, HDFS-12395.001.patch, HDFS-12395.002.patch, 
> HDFS-12395.003.patch, HDFS-12395.004.patch
>
>
> Support add, remove, disable, enable erasure coding policy operation in edit 
> log. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12447) Refactor addErasureCodingPolicy

2017-09-15 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12447:
-
Attachment: HDFS-12447.002.patch

Improve the patch after offline discussion with Kai. 

> Refactor addErasureCodingPolicy
> ---
>
> Key: HDFS-12447
> URL: https://issues.apache.org/jira/browse/HDFS-12447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: SammiChen
>Assignee: SammiChen
> Attachments: HDFS-12447.001.patch, HDFS-12447.002.patch
>
>
> As a follow-on to handle some issues discussed in HDFS-12395, this is to 
> majorly refactor the addErasureCodingPolicy API, changing AddECPolicyResponse => 
> AddErasureCodingPolicyResponse



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation

2017-09-15 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12460:
-
Attachment: HDFS-12460.001.patch

Initial patch.

> make addErasureCodingPolicy an idempotent operation
> ---
>
> Key: HDFS-12460
> URL: https://issues.apache.org/jira/browse/HDFS-12460
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Reporter: SammiChen
>Assignee: SammiChen
> Attachments: HDFS-12460.001.patch
>
>
> Make addErasureCodingPolicy an idempotent operation to guarantee  after HA 
> switch, addErasureCodingPolicy edit log  can be applied smoothly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation

2017-09-15 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12460:
-
Description: Make addErasureCodingPolicy an idempotent operation to 
guarantee  after HA switch, addErasureCodingPolicy edit log  can be applied 
smoothly.  (was: Make addErasureCodingPolicy an idempotent operation to 
guarantee  after HA switch, all edit log  can be applied smoothly )

> make addErasureCodingPolicy an idempotent operation
> ---
>
> Key: HDFS-12460
> URL: https://issues.apache.org/jira/browse/HDFS-12460
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Reporter: SammiChen
>Assignee: SammiChen
>
> Make addErasureCodingPolicy an idempotent operation to guarantee  after HA 
> switch, addErasureCodingPolicy edit log  can be applied smoothly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation

2017-09-15 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12460:
-
Description: Make addErasureCodingPolicy an idempotent operation to 
guarantee that, after an HA switch, all edit log records can be applied 
smoothly.  (was: TestNamenodeRetryCache.testRetryCacheRebuild unit test case 
failure due to edit log opcode number increase.)

> make addErasureCodingPolicy an idempotent operation
> ---
>
> Key: HDFS-12460
> URL: https://issues.apache.org/jira/browse/HDFS-12460
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Reporter: SammiChen
>Assignee: SammiChen
>
> Make addErasureCodingPolicy an idempotent operation to guarantee that, after 
> an HA switch, all edit log records can be applied smoothly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation

2017-09-15 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12460:
-
Issue Type: Improvement  (was: Bug)

> make addErasureCodingPolicy an idempotent operation
> ---
>
> Key: HDFS-12460
> URL: https://issues.apache.org/jira/browse/HDFS-12460
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Reporter: SammiChen
>Assignee: SammiChen
>
> The TestNamenodeRetryCache.testRetryCacheRebuild unit test case fails due to 
> the increase in the number of edit log opcodes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12460) make addErasureCodingPolicy an idempotent operation

2017-09-15 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12460:
-
Summary: make addErasureCodingPolicy an idempotent operation  (was: 
addErasureCodingPolicy should)

> make addErasureCodingPolicy an idempotent operation
> ---
>
> Key: HDFS-12460
> URL: https://issues.apache.org/jira/browse/HDFS-12460
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Reporter: SammiChen
>Assignee: SammiChen
>
> The TestNamenodeRetryCache.testRetryCacheRebuild unit test case fails due to 
> the increase in the number of edit log opcodes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12460) addErasureCodingPolicy should

2017-09-15 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12460:
-
Summary: addErasureCodingPolicy should  (was: 
TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure)

> addErasureCodingPolicy should
> -
>
> Key: HDFS-12460
> URL: https://issues.apache.org/jira/browse/HDFS-12460
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Reporter: SammiChen
>Assignee: SammiChen
>
> The TestNamenodeRetryCache.testRetryCacheRebuild unit test case fails due to 
> the increase in the number of edit log opcodes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12462) Erasure coding policy extra options should be sorted by key value

2017-09-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12462:
-
Labels: hdfs-ec-3.0-nice-to-have  (was: )

> Erasure coding policy extra options should be sorted by key value
> -
>
> Key: HDFS-12462
> URL: https://issues.apache.org/jira/browse/HDFS-12462
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: SammiChen
>  Labels: hdfs-ec-3.0-nice-to-have
>
> To make sure the serialized fsimage and editlog are binary-equal, erasure 
> coding policy extra options should be sorted by key value.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12462) Erasure coding policy extra options should be sorted by key value

2017-09-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12462:
-
Component/s: erasure-coding

> Erasure coding policy extra options should be sorted by key value
> -
>
> Key: HDFS-12462
> URL: https://issues.apache.org/jira/browse/HDFS-12462
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: SammiChen
>
> To make sure the serialized fsimage and editlog are binary-equal, erasure 
> coding policy extra options should be sorted by key value.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12462) Erasure coding policy extra options should be sorted by key value

2017-09-14 Thread SammiChen (JIRA)
SammiChen created HDFS-12462:


 Summary: Erasure coding policy extra options should be sorted by 
key value
 Key: HDFS-12462
 URL: https://issues.apache.org/jira/browse/HDFS-12462
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: SammiChen


To make sure the serialized fsimage and editlog are binary-equal, erasure 
coding policy extra options should be sorted by key value.
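
A sketch of the underlying idea, assuming plain string maps in place of the 
protobuf records the real fsimage and editlog use:

{code:java}
import java.util.Map;
import java.util.TreeMap;

// Sketch: deterministic serialization of a policy's extra options.
// A HashMap iterates in an unspecified order, so two serializations of the
// same option set may differ byte for byte; copying into a TreeMap first
// yields key-sorted iteration and therefore stable, binary-equal output.
class SortedExtraOptions {
  static String serialize(Map<String, String> extraOptions) {
    Map<String, String> sorted = new TreeMap<>(extraOptions);
    StringBuilder out = new StringBuilder();
    for (Map.Entry<String, String> e : sorted.entrySet()) {
      out.append(e.getKey()).append('=').append(e.getValue()).append(';');
    }
    return out.toString();
  }
}
{code}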



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12447) Refactor addErasureCodingPolicy

2017-09-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12447:
-
Attachment: HDFS-12447.001.patch

> Refactor addErasureCodingPolicy
> ---
>
> Key: HDFS-12447
> URL: https://issues.apache.org/jira/browse/HDFS-12447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: SammiChen
>Assignee: SammiChen
> Attachments: HDFS-12447.001.patch
>
>
> As a follow-on to handle some issues discussed in HDFS-12395, this is to 
> substantially refactor the addErasureCodingPolicy API, renaming 
> AddECPolicyResponse to AddErasureCodingPolicyResponse.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12447) Refactor addErasureCodingPolicy

2017-09-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12447:
-
Status: Patch Available  (was: Open)

> Refactor addErasureCodingPolicy
> ---
>
> Key: HDFS-12447
> URL: https://issues.apache.org/jira/browse/HDFS-12447
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: SammiChen
>Assignee: SammiChen
> Attachments: HDFS-12447.001.patch
>
>
> As a follow-on to handle some issues discussed in HDFS-12395, this is to 
> substantially refactor the addErasureCodingPolicy API, renaming 
> AddECPolicyResponse to AddErasureCodingPolicyResponse.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12460) TestNamenodeRetryCache.testRetryCacheRebuild unit test case failure

2017-09-14 Thread SammiChen (JIRA)
SammiChen created HDFS-12460:


 Summary: TestNamenodeRetryCache.testRetryCacheRebuild unit test 
case failure
 Key: HDFS-12460
 URL: https://issues.apache.org/jira/browse/HDFS-12460
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching
Reporter: SammiChen
Assignee: SammiChen


The TestNamenodeRetryCache.testRetryCacheRebuild unit test case fails due to 
the increase in the number of edit log opcodes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode

2017-09-14 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167239#comment-16167239
 ] 

SammiChen commented on HDFS-7859:
-

Sure. The release note is ready. Thanks [~drankye], [~eddyxu], [~andrew.wang], 
[~xinwei], [~szetszwo], [~zhz], and [~jingzhao] for all your contributions and 
effort!

> Erasure Coding: Persist erasure coding policies in NameNode
> ---
>
> Key: HDFS-7859
> URL: https://issues.apache.org/jira/browse/HDFS-7859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-must-do
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-7859.001.patch, HDFS-7859.002.patch, 
> HDFS-7859.004.patch, HDFS-7859.005.patch, HDFS-7859.006.patch, 
> HDFS-7859.007.patch, HDFS-7859.008.patch, HDFS-7859.009.patch, 
> HDFS-7859.010.patch, HDFS-7859.011.patch, HDFS-7859.012.patch, 
> HDFS-7859.013.patch, HDFS-7859.014.patch, HDFS-7859.015.patch, 
> HDFS-7859.016.patch, HDFS-7859.017.patch, HDFS-7859.018.patch, 
> HDFS-7859.019.patch, HDFS-7859-HDFS-7285.002.patch, 
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch
>
>
> In a meetup discussion with [~zhz] and [~jingzhao], it was suggested that we 
> persist EC schemas in the NameNode centrally and reliably, so that EC zones 
> can reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode

2017-09-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-7859:

Release Note: Persist all built-in and user-defined erasure coding policies 
into the NameNode fsImage and editlog reliably, so that all erasure coding 
policies remain consistent after a NameNode restart.
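
A simplified sketch of the round trip this describes, with plain strings 
standing in for the protobuf records the actual fsImage section uses:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: policies are written out with the namespace image and read back
// on restart, so a lookup by policy name resolves to the same policy after
// the NameNode comes back up.
class PolicyPersistenceSketch {
  // Serialize every registered policy as a "name=schema" record.
  static List<String> saveImage(Map<String, String> policiesByName) {
    List<String> image = new ArrayList<>();
    policiesByName.forEach((name, schema) -> image.add(name + "=" + schema));
    return image;
  }

  // Rebuild the in-memory registry from the persisted records.
  static Map<String, String> loadImage(List<String> image) {
    Map<String, String> policies = new HashMap<>();
    for (String record : image) {
      int sep = record.indexOf('=');
      policies.put(record.substring(0, sep), record.substring(sep + 1));
    }
    return policies;
  }
}
{code}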

> Erasure Coding: Persist erasure coding policies in NameNode
> ---
>
> Key: HDFS-7859
> URL: https://issues.apache.org/jira/browse/HDFS-7859
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: SammiChen
>  Labels: hdfs-ec-3.0-must-do
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-7859.001.patch, HDFS-7859.002.patch, 
> HDFS-7859.004.patch, HDFS-7859.005.patch, HDFS-7859.006.patch, 
> HDFS-7859.007.patch, HDFS-7859.008.patch, HDFS-7859.009.patch, 
> HDFS-7859.010.patch, HDFS-7859.011.patch, HDFS-7859.012.patch, 
> HDFS-7859.013.patch, HDFS-7859.014.patch, HDFS-7859.015.patch, 
> HDFS-7859.016.patch, HDFS-7859.017.patch, HDFS-7859.018.patch, 
> HDFS-7859.019.patch, HDFS-7859-HDFS-7285.002.patch, 
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch
>
>
> In a meetup discussion with [~zhz] and [~jingzhao], it was suggested that we 
> persist EC schemas in the NameNode centrally and reliably, so that EC zones 
> can reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s

2017-09-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12449:
-
Description: 
TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish 
in 60s

reduce the file size and loop count

  was:
TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish 
in 60s

reduce the file size and loop account


> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> --
>
> Key: HDFS-12449
> URL: https://issues.apache.org/jira/browse/HDFS-12449
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha4
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: flaky-test
> Attachments: HDFS-12449.001.patch
>
>
> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> reduce the file size and loop count
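
A minimal JUnit 4 sketch of the fix's intent; FILE_LEN, LOOP_COUNT, and 
writeAndReconstruct() are illustrative stand-ins, not the values or helpers 
in the actual patch:

{code:java}
import org.junit.Test;

// Sketch: with less data written and fewer iterations, the reconstruction
// test fits comfortably inside its 60-second budget.
public class ReconstructionTimeoutSketch {
  private static final int FILE_LEN = 1 << 20; // a small 1 MiB test file
  private static final int LOOP_COUNT = 3;     // fewer repetitions

  @Test(timeout = 60_000)
  public void testNNSendsErasureCodingTasks() throws Exception {
    for (int i = 0; i < LOOP_COUNT; i++) {
      writeAndReconstruct(FILE_LEN);
    }
  }

  private void writeAndReconstruct(int len) {
    // Placeholder: write a striped file of the given length and trigger
    // block reconstruction in the test cluster.
  }
}
{code}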



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s

2017-09-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12449:
-
Status: Patch Available  (was: Open)

> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> --
>
> Key: HDFS-12449
> URL: https://issues.apache.org/jira/browse/HDFS-12449
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: flaky-test
> Attachments: HDFS-12449.001.patch
>
>
> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> reduce the file size and loop count



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12449) TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot finish in 60s

2017-09-14 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-12449:
-
Attachment: HDFS-12449.001.patch

> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> --
>
> Key: HDFS-12449
> URL: https://issues.apache.org/jira/browse/HDFS-12449
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: SammiChen
>Assignee: SammiChen
>  Labels: flaky-test
> Attachments: HDFS-12449.001.patch
>
>
> TestReconstructStripedFile.testNNSendsErasureCodingTasks randomly cannot 
> finish in 60s
> reduce the file size and loop count



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


