[jira] [Updated] (MAPREDUCE-6774) Add support for HDFS erasure code policy to TestDFSIO

2016-09-06 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated MAPREDUCE-6774:
-
Assignee: SammiChen

> Add support for HDFS erasure code policy to TestDFSIO
> -
>
> Key: MAPREDUCE-6774
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6774
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: SammiChen
>Assignee: SammiChen
> Attachments: MAPREDUCE-6774-v1.patch
>
>
> HDFS erasure coding allows users to store directories and files under 
> predefined erasure code policies. The TestDFSIO implementation currently 
> supports only 3x replication. This change adds a new option to enable 
> tests of files with an erasure code policy enabled. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream

2016-09-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468199#comment-15468199
 ] 

Chris Douglas commented on MAPREDUCE-6628:
--

[~masokan] thank you for your patience with this.

The unit test looks useful for debugging, but it doesn't actually verify the 
fix. As written, it's also expensive to run (starts a cluster) and relies on a 
platform-dependent scan of {{/proc/self/status}}, rather than using 
{{java.lang.management}} APIs. That said, unit testing this corner of MapReduce 
is not straightforward, and your posted results demonstrate both the issue and 
the fix. We can commit this without an MR test.
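
As a rough illustration of the {{java.lang.management}} route mentioned above, the sketch below reads the JVM's direct buffer pool through {{BufferPoolMXBean}} instead of scanning {{/proc/self/status}}. The class and method names ({{DirectBufferUsage}}, {{directMemoryUsed}}) are made up for this example and are not part of the patch; the pool name "direct" is the one standard JVMs report, which is an assumption worth checking on the target platform.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper, not part of the patch under review.
class DirectBufferUsage {
    /** Returns bytes used by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
    static long directMemoryUsed() {
        List<BufferPoolMXBean> pools =
            ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // 128 KB matches the default buffer size quoted in the issue description.
        ByteBuffer buf = ByteBuffer.allocateDirect(128 * 1024);
        long after = directMemoryUsed();
        System.out.println("direct pool: before=" + before + " after=" + after
            + " capacity=" + buf.capacity());
    }
}
```

A test built on this would sample the pool before and after the workload and compare, which works on any platform the JVM supports.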

Would it be possible to write a short unit test for {{CryptoOutputStream}} 
verifying the new {{closeOutputStream}} semantics? This should be very 
straightforward in Mockito, just checking that {{close}} behaves as expected 
when the flag is passed.
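
To make the requested check concrete, here is a minimal, dependency-free sketch of the behavior such a test would verify. It does not use Hadoop or Mockito: {{OwningStream}} is a hypothetical stand-in for {{CryptoOutputStream}}, and {{CloseRecorder}} is a hand-written spy in place of a Mockito mock. The flag semantics follow "Possible Fix - 2" in the description below (free the buffers always, close the underlying stream only when the flag is true); the actual patch may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class CloseSemanticsDemo {
    /** Writes one byte, closes the wrapper, and reports whether close() reached the sink. */
    static boolean underlyingClosedAfterClose(boolean closeOutputStream) {
        CloseRecorder sink = new CloseRecorder();
        try {
            OwningStream s = new OwningStream(sink, closeOutputStream);
            s.write(42);
            s.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.closed;
    }

    public static void main(String[] args) {
        System.out.println("flag=true  closes underlying: " + underlyingClosedAfterClose(true));
        System.out.println("flag=false closes underlying: " + underlyingClosedAfterClose(false));
    }
}

/** Hypothetical stand-in for a wrapping stream; NOT Hadoop's CryptoOutputStream. */
class OwningStream extends FilterOutputStream {
    private final boolean closeOutputStream;
    private boolean buffersFreed = false;

    OwningStream(OutputStream out, boolean closeOutputStream) {
        super(out);
        this.closeOutputStream = closeOutputStream;
    }

    boolean buffersFreed() {
        return buffersFreed;
    }

    @Override
    public void close() throws IOException {
        buffersFreed = true;      // stands in for freeing inBuffer and outBuffer
        if (closeOutputStream) {
            super.close();        // flushes, then closes the underlying stream
        } else {
            flush();              // leaves the underlying stream open
        }
    }
}

/** Hand-written spy that records whether close() was called on it. */
class CloseRecorder extends ByteArrayOutputStream {
    boolean closed = false;

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}
```

With Mockito, the spy would become a mocked {{OutputStream}} and the two assertions would become {{verify(out).close()}} and {{verify(out, never()).close()}}.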

It's unfortunate that we're switching behavior based on object reference 
equality, to check whether the stream was wrapped. As designed, I don't see a 
cleaner way to improve this without refactoring the crypto implementation.

> Potential memory leak in CryptoOutputStream
> ---
>
> Key: MAPREDUCE-6628
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.4
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
> Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch, 
> MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch, MAPREDUCE-6628.005.patch, 
> MAPREDUCE-6628.006.patch, MAPREDUCE-6628.007.patch
>
>
> There is a potential memory leak in {{CryptoOutputStream.java}}.  It 
> allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that are 
> freed when the {{close()}} method is called.  Most of the time, {{close()}} 
> is called.  However, when writing to the intermediate Map output file or 
> the spill files in {{MapTask}}, {{close()}} is never called, since doing so 
> would close the underlying stream, which is not desirable.  There is a single 
> underlying physical stream that contains multiple logical streams, one per 
> partition of Map output.  
> By default, the amount of memory allocated per byte buffer is 128 KB, so 
> the total memory allocated per stream is 256 KB.  This may not sound like 
> much.  However, if the number of partitions (or number of reducers) is large 
> (in the hundreds) and/or there are spill files created in {{MapTask}}, this 
> can grow into a few hundred MB. 
> I can think of two ways to address this issue:
> h2. Possible Fix - 1
> According to JDK documentation:
> {quote}
> The contents of direct buffers may reside outside of the normal 
> garbage-collected heap, and so their impact upon the memory footprint of an 
> application might not be obvious.  It is therefore recommended that direct 
> buffers be allocated primarily for large, long-lived buffers that are subject 
> to the underlying system's native I/O operations.  In general it is best to 
> allocate direct buffers only when they yield a measureable gain in program 
> performance.
> {quote}
> It is not clear to me whether there is any benefit in allocating direct byte 
> buffers in {{CryptoOutputStream.java}}.  In fact, there is a slight CPU 
> overhead in moving data from {{outBuffer}} to a temporary byte array, as in 
> the following code in {{CryptoOutputStream.java}}.
> {code}
> /*
>  * If underlying stream supports {@link ByteBuffer} write in future, needs
>  * refine here. 
>  */
> final byte[] tmp = getTmpBuf();
> outBuffer.get(tmp, 0, len);
> out.write(tmp, 0, len);
> {code}
> Even if the underlying stream supports direct byte buffer IO (or direct IO in 
> OS parlance), it is not clear whether it will yield any measurable 
> performance gain.
> The fix would be to allocate a {{ByteBuffer}} on the heap for {{inBuffer}} 
> and wrap a byte array in a {{ByteBuffer}} for {{outBuffer}}.  By the way, 
> {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}} instances, as 
> demanded by the {{encrypt()}} method in {{Encryptor}}.
> h2. Possible Fix - 2
> Assuming that we want to keep the buffers as direct byte buffers, we can 
> add a new constructor to {{CryptoOutputStream}} that takes a boolean flag 
> {{ownOutputStream}} to indicate whether the underlying stream is owned 
> by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method 
> will close the underlying stream.  Otherwise, when {{close()}} is called, only 
> the direct byte buffers will be freed and the underlying stream will not be 
> closed.
> The scope of changes for this fix will be somewhat wider.  We need to modify 
> {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}} 
> 
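
For "Possible Fix - 1" in the description above, a minimal sketch of heap-backed buffers might look like the following. The class name {{HeapBufferSketch}} and the {{drain}} helper are hypothetical and only illustrate the idea: a heap {{ByteBuffer}} exposes its backing array, so the write to the underlying stream needs no temporary copy. Whether every {{Encryptor}} implementation accepts heap buffers is not settled by this sketch and would need to be checked.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration only; not code from CryptoOutputStream.
class HeapBufferSketch {
    static final int BUFFER_SIZE = 128 * 1024; // default size quoted in the description

    /** Writes the readable bytes of a heap-backed buffer straight from its backing array. */
    static void drain(ByteBuffer outBuffer, ByteArrayOutputStream out) {
        int len = outBuffer.remaining();
        out.write(outBuffer.array(), outBuffer.arrayOffset() + outBuffer.position(), len);
        outBuffer.position(outBuffer.position() + len); // mark the bytes as consumed
    }

    public static void main(String[] args) {
        ByteBuffer inBuffer = ByteBuffer.allocate(BUFFER_SIZE);        // heap-allocated
        ByteBuffer outBuffer = ByteBuffer.wrap(new byte[BUFFER_SIZE]); // wraps a byte array
        outBuffer.put(new byte[] {1, 2, 3});
        outBuffer.flip();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        drain(outBuffer, out);
        System.out.println("inBuffer direct: " + inBuffer.isDirect()
            + ", outBuffer has array: " + outBuffer.hasArray()
            + ", bytes written: " + out.size());
    }
}
```

Both buffers here are ordinary garbage-collected objects, so an unclosed stream would no longer pin native memory.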

[jira] [Commented] (MAPREDUCE-6774) Add support for HDFS erasure code policy to TestDFSIO

2016-09-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15467778#comment-15467778
 ] 

Hadoop QA commented on MAPREDUCE-6774:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s {color} | {color:red} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: The patch generated 5 new + 50 unchanged - 0 fixed = 55 total (was 50) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 143m 25s {color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 158m 46s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapred.TestMROpportunisticMaps |
|   | hadoop.mapred.TestReduceFetch |
|   | hadoop.mapred.TestMerge |
|   | hadoop.mapreduce.TestMapReduceLazyOutput |
|   | hadoop.mapred.TestMRIntermediateDataEncryption |
|   | hadoop.mapred.TestLazyOutput |
|   | hadoop.mapreduce.TestLargeSort |
|   | hadoop.mapred.TestReduceFetchFromPartialMem |
|   | hadoop.mapreduce.v2.TestMRJobsWithProfiler |
|   | hadoop.mapreduce.lib.output.TestJobOutputCommitter |
|   | hadoop.mapreduce.security.ssl.TestEncryptedShuffle |
|   | hadoop.mapreduce.v2.TestMROldApiJobs |
|   | hadoop.mapred.TestJobCleanup |
|   | hadoop.mapreduce.v2.TestSpeculativeExecution |
|   | hadoop.mapred.TestClusterMRNotification |
|   | hadoop.mapreduce.security.TestUmbilicalProtocolWithJobToken |
|   | hadoop.mapreduce.v2.TestMRAMWithNonNormalizedCapabilities |
|   | hadoop.mapreduce.v2.TestMRJobs |
|   | hadoop.mapred.TestJobName |
|   | hadoop.mapreduce.TestMRJobClient |
|   | hadoop.mapred.TestClusterMapReduceTestCase |
|   | hadoop.mapred.TestAuditLogger |
|   | hadoop.mapreduce.security.TestMRCredentials |
|   | hadoop.mapred.TestMRTimelineEventHandling |
|   | hadoop.mapreduce.v2.TestMiniMRProxyUser |
|   | hadoop.mapreduce.v2.TestMRJobsWithHistoryService |
|   | hadoop.mapred.TestMiniMRClientCluster |
|   | hadoop.mapred.TestMiniMRChildTask |
|   | hadoop.mapreduce.TestChild |
|   | hadoop.mapreduce.security.TestBinaryTokenFile |
|   | hadoop.mapred.TestJobCounters |

[jira] [Updated] (MAPREDUCE-6774) Add support for HDFS erasure code policy to TestDFSIO

2016-09-06 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated MAPREDUCE-6774:
-
Status: Patch Available  (was: Open)

> Add support for HDFS erasure code policy to TestDFSIO
> -
>
> Key: MAPREDUCE-6774
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6774
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: SammiChen
> Attachments: MAPREDUCE-6774-v1.patch
>
>
> HDFS erasure coding allows users to store directories and files under 
> predefined erasure code policies. The TestDFSIO implementation currently 
> supports only 3x replication. This change adds a new option to enable 
> tests of files with an erasure code policy enabled. 






[jira] [Updated] (MAPREDUCE-6774) Add support for HDFS erasure code policy to TestDFSIO

2016-09-06 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated MAPREDUCE-6774:
-
Attachment: MAPREDUCE-6774-v1.patch

> Add support for HDFS erasure code policy to TestDFSIO
> -
>
> Key: MAPREDUCE-6774
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6774
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: SammiChen
> Attachments: MAPREDUCE-6774-v1.patch
>
>
> HDFS erasure coding allows users to store directories and files under 
> predefined erasure code policies. The TestDFSIO implementation currently 
> supports only 3x replication. This change adds a new option to enable 
> tests of files with an erasure code policy enabled. 






[jira] [Created] (MAPREDUCE-6774) Add support for HDFS erasure code policy to TestDFSIO

2016-09-06 Thread SammiChen (JIRA)
SammiChen created MAPREDUCE-6774:


 Summary: Add support for HDFS erasure code policy to TestDFSIO
 Key: MAPREDUCE-6774
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6774
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: SammiChen


HDFS erasure coding allows users to store directories and files under predefined 
erasure code policies. The TestDFSIO implementation currently supports only 3x 
replication. This change adds a new option to enable tests of files with an 
erasure code policy enabled. 


