[jira] [Commented] (HADOOP-15515) adl.AdlFilesystem.close() doesn't release locks on open files

2018-06-25 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522926#comment-16522926
 ] 

Chris Douglas commented on HADOOP-15515:


bq. Could you please suggest if the openFileStreams modification is disallowed 
and create/append/createNonRecursive should fail with IOException? Looked at 
DFSClient code, and operations after FileSystem.close() are allowed.
Throwing {{IOException}} makes sense to me, as this looks like a resource leak. 
I'm not sure if {{DFSClient}} also has a resource leak, or if it's significant 
that this is handled one level up for ADLS (i.e., as part of the 
{{FileSystem}}, not the client it wraps).
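
For illustration, a minimal sketch of that fail-fast behavior, assuming a 
filesystem that tracks its open streams in a set (names are hypothetical, not 
taken from the patch):

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Sketch only: once close() has run, registering a new stream (as
// create/append/createNonRecursive would) throws IOException instead of
// silently leaking it. All names here are illustrative.
class ClosedCheckSketch {
  private final Set<Closeable> openFileStreams = new HashSet<>();
  private boolean closed = false;

  void registerStream(Closeable stream) throws IOException {
    synchronized (openFileStreams) {
      if (closed) {
        throw new IOException("FileSystem is closed");
      }
      openFileStreams.add(stream);
    }
  }

  void markClosed() {
    synchronized (openFileStreams) {
      closed = true;
    }
  }
}
{code}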

Aside: please don't delete attachments. Doing so orphans the discussion about 
them.

> adl.AdlFilesystem.close() doesn't release locks on open files
> -
>
> Key: HADOOP-15515
> URL: https://issues.apache.org/jira/browse/HADOOP-15515
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Affects Versions: 2.7.3
> Environment: HDInsight on MS Azure:
>  
> Hadoop 2.7.3.2.6.2.25-1
> Subversion g...@github.com:hortonworks/hadoop.git -r 
> 1ceeb58bb3bb5904df0cbb7983389bcaf2ffd0b6
> Compiled by jenkins on 2017-11-29T15:28Z
> Compiled with protoc 2.5.0
> From source with checksum 90b73c4c185645c1f47b61f942230
> This command was run using 
> /usr/hdp/2.6.2.25-1/hadoop/hadoop-common-2.7.3.2.6.2.25-1.jar
>Reporter: Jay Hankinson
>Assignee: Vishwajeet Dusane
>Priority: Major
> Attachments: HADOOP-15515-001.patch
>
>
> If you write to a file on an Azure ADL filesystem and close the file system, 
> but not the file, before the process exits, the next time you try to open the 
> file for append it fails with:
> Exception in thread "main" java.io.IOException: APPEND failed with error 
> 0x83090a16 (Failed to perform the requested operation because the file is 
> currently open in write mode by another user or process.). 
> [a67c6b32-e78b-4852-9fac-142a3e2ba963][2018-03-22T20:54:08.3520940-07:00]
>  The following moves a local file to HDFS if it doesn't exist, or appends its 
> contents if it does:
>  
> {code:java}
> public void addFile(String source, String dest, Configuration conf)
>     throws IOException {
>   FileSystem fileSystem = FileSystem.get(conf);
>   // Get the filename out of the file path
>   String filename = source.substring(source.lastIndexOf('/') + 1, source.length());
>   // Create the destination path, including the filename.
>   if (dest.charAt(dest.length() - 1) != '/') {
>     dest = dest + "/" + filename;
>   } else {
>     dest = dest + filename;
>   }
>   // Check if the file already exists
>   Path path = new Path(dest);
>   FSDataOutputStream out;
>   if (fileSystem.exists(path)) {
>     System.out.println("File " + dest + " already exists; appending");
>     out = fileSystem.append(path);
>   } else {
>     out = fileSystem.create(path);
>   }
>   // Copy the local file's contents to the destination stream.
>   InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));
>   byte[] b = new byte[1024];
>   int numBytes = 0;
>   while ((numBytes = in.read(b)) > 0) {
>     out.write(b, 0, numBytes);
>   }
>   // Close the file system, but not the file
>   in.close();
>   // out.close();
>   fileSystem.close();
> }
> {code}
>  If "dest" is an adl:// location, invoking the function a second time (after 
> the process has exited) it raises the error. If it's a regular hdfs:// file 
> system, it doesn't as all the locks are released. The same exception is also 
> raised if a subsequent append is done using: hdfs dfs  -appendToFile.
> As I can't see a way to force lease recovery in this situation, this seems 
> like a bug. org.apache.hadoop.fs.adl.AdlFileSystem inherits close() from 
> org.apache.hadoop.fs.FileSystem
> [https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/adl/AdlFileSystem.html]
> Which states:
> Close this FileSystem instance. Will release any held locks. This does not 
> seem to be the case






[jira] [Updated] (HADOOP-15533) Make WASB listStatus messages consistent

2018-06-18 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15533:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.4
   3.1.1
   3.2.0
   2.10.0
   Status: Resolved  (was: Patch Available)

+1 I committed this. Thanks, [~esmanii]

> Make WASB listStatus messages consistent
> 
>
> Key: HADOOP-15533
> URL: https://issues.apache.org/jira/browse/HADOOP-15533
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Trivial
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HADOOP-15533-001.patch, HADOOP-15533-branch-2-001.patch
>
>
> - This change makes WASB listStatus error messages consistent with the rest 
> of the listStatus error messages.
> - Inconsistent error messages cause a few WASB tests to fail, but only in 
> branch-2. The test bug was introduced in 
> https://issues.apache.org/jira/browse/HADOOP-15506.






[jira] [Updated] (HADOOP-15533) Make WASB listStatus messages consistent

2018-06-18 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15533:
---
Summary: Make WASB listStatus messages consistent  (was: Making WASB 
listStatus messages consistent)

> Make WASB listStatus messages consistent
> 
>
> Key: HADOOP-15533
> URL: https://issues.apache.org/jira/browse/HADOOP-15533
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Trivial
> Attachments: HADOOP-15533-001.patch, HADOOP-15533-branch-2-001.patch
>
>
> - This change makes WASB listStatus error messages consistent with the rest 
> of the listStatus error messages.
> - Inconsistent error messages cause a few WASB tests to fail, but only in 
> branch-2. The test bug was introduced in 
> https://issues.apache.org/jira/browse/HADOOP-15506.






[jira] [Commented] (HADOOP-15506) Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code blocks

2018-06-13 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511554#comment-16511554
 ] 

Chris Douglas commented on HADOOP-15506:


bq. It only affects 3 wasb tests in trunk-2. But sent out a JIRA and made it 
consistent with trunk
To be clear (please correct if I've misunderstood), this JIRA updated a live 
test to match the exception text from the updated client library, but it missed 
a case. HADOOP-15533 addresses this (are three of these tests failing, now?). 
Would you mind updating HADOOP-15533 to include more context? For example, the 
description says that error messages should be consistent, but consistent with 
what?

> Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code 
> blocks
> ---
>
> Key: HADOOP-15506
> URL: https://issues.apache.org/jira/browse/HADOOP-15506
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HADOOP-15506-001.patch
>
>
> - Upgraded Azure Storage Sdk to 7.0.0
> - Fixed code issues and a couple of tests






[jira] [Comment Edited] (HADOOP-15506) Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code blocks

2018-06-11 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508904#comment-16508904
 ] 

Chris Douglas edited comment on HADOOP-15506 at 6/11/18 10:49 PM:
--

Backported through 2.10.0. [~esmanii], would you mind verifying the backported 
bits in branch-2?


was (Author: chris.douglas):
Backported through 2.10.0

> Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code 
> blocks
> ---
>
> Key: HADOOP-15506
> URL: https://issues.apache.org/jira/browse/HADOOP-15506
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HADOOP-15506-001.patch
>
>
> - Upgraded Azure Storage Sdk to 7.0.0
> - Fixed code issues and a couple of tests






[jira] [Commented] (HADOOP-15506) Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code blocks

2018-06-11 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508904#comment-16508904
 ] 

Chris Douglas commented on HADOOP-15506:


Backported through 2.10.0

> Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code 
> blocks
> ---
>
> Key: HADOOP-15506
> URL: https://issues.apache.org/jira/browse/HADOOP-15506
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HADOOP-15506-001.patch
>
>
> - Upgraded Azure Storage Sdk to 7.0.0
> - Fixed code issues and a couple of tests






[jira] [Updated] (HADOOP-15506) Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code blocks

2018-06-11 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15506:
---
Fix Version/s: 3.0.4
   3.1.1
   2.10.0

> Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code 
> blocks
> ---
>
> Key: HADOOP-15506
> URL: https://issues.apache.org/jira/browse/HADOOP-15506
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.4
>
> Attachments: HADOOP-15506-001.patch
>
>
> - Upgraded Azure Storage Sdk to 7.0.0
> - Fixed code issues and a couple of tests






[jira] [Updated] (HADOOP-15521) Upgrading Azure Storage Sdk version to 7.0.0 and updating corresponding code blocks

2018-06-11 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15521:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Closing as a dup of HADOOP-15506

> Upgrading Azure Storage Sdk version to 7.0.0 and updating corresponding code 
> blocks
> ---
>
> Key: HADOOP-15521
> URL: https://issues.apache.org/jira/browse/HADOOP-15521
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/azure
>Affects Versions: 2.10.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Minor
> Attachments: HADOOP-15521-001.patch, HADOOP-15521-branch-2-001.patch
>
>
> Upgraded Azure Storage Sdk to 7.0.0
> Fixed code issues and a couple of tests






[jira] [Commented] (HADOOP-15521) Upgrading Azure Storage Sdk version to 7.0.0 and updating corresponding code blocks

2018-06-11 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508833#comment-16508833
 ] 

Chris Douglas commented on HADOOP-15521:


No worries, just wanted to be sure [^HADOOP-15521-branch-2-001.patch] was the 
correct patch. I'll close this as a duplicate and backport HADOOP-15506.

> Upgrading Azure Storage Sdk version to 7.0.0 and updating corresponding code 
> blocks
> ---
>
> Key: HADOOP-15521
> URL: https://issues.apache.org/jira/browse/HADOOP-15521
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/azure
>Affects Versions: 2.10.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Minor
> Attachments: HADOOP-15521-001.patch, HADOOP-15521-branch-2-001.patch
>
>
> Upgraded Azure Storage Sdk to 7.0.0
> Fixed code issues and a couple of tests






[jira] [Commented] (HADOOP-15521) Upgrading Azure Storage Sdk version to 7.0.0 and updating corresponding code blocks

2018-06-11 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508795#comment-16508795
 ] 

Chris Douglas commented on HADOOP-15521:


Both versions of the patch look identical, to each other and to HADOOP-15506 
(+/- minor whitespace). Am I missing something, or is the patch missing some 
changes?

> Upgrading Azure Storage Sdk version to 7.0.0 and updating corresponding code 
> blocks
> ---
>
> Key: HADOOP-15521
> URL: https://issues.apache.org/jira/browse/HADOOP-15521
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/azure
>Affects Versions: 2.10.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Minor
> Attachments: HADOOP-15521-001.patch, HADOOP-15521-branch-2-001.patch
>
>
> Upgraded Azure Storage Sdk to 7.0.0
> Fixed code issues and a couple of tests






[jira] [Commented] (HADOOP-15506) Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code blocks

2018-06-11 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508590#comment-16508590
 ] 

Chris Douglas commented on HADOOP-15506:


Marking as resolved, since this was committed to trunk.

> Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code 
> blocks
> ---
>
> Key: HADOOP-15506
> URL: https://issues.apache.org/jira/browse/HADOOP-15506
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-15506-001.patch
>
>
> - Upgraded Azure Storage Sdk to 7.0.0
> - Fixed code issues and a couple of tests






[jira] [Updated] (HADOOP-15506) Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code blocks

2018-06-11 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15506:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

> Upgrade Azure Storage Sdk version to 7.0.0 and update corresponding code 
> blocks
> ---
>
> Key: HADOOP-15506
> URL: https://issues.apache.org/jira/browse/HADOOP-15506
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Esfandiar Manii
>Assignee: Esfandiar Manii
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-15506-001.patch
>
>
> - Upgraded Azure Storage Sdk to 7.0.0
> - Fixed code issues and a couple of tests






[jira] [Commented] (HADOOP-15515) adl.AdlFilesystem.close() doesn't release locks on open files

2018-06-06 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503930#comment-16503930
 ] 

Chris Douglas commented on HADOOP-15515:


I looked more closely at the synchronization, and there may be some edge cases 
that aren't covered. Specifically:

* If {{close}} holds the lock on {{openFileStreams}} while cleaning up, another 
thread waiting on its monitor may add elements to the 
collection after the filesystem is closed.
* In {{close}}, exceptions while closing streams may prevent the intended 
behavior. While {{close}} should be 
[idempotent|https://docs.oracle.com/javase/8/docs/api/java/io/Closeable.html#close--] 
(so a closed stream won't put this in an unrecoverable state), if any stream 
throws an exception, then subsequent entries will not be closed. The intended 
behavior is probably closer to {{IOUtils::cleanup}}, which swallows 
{{IOException}}. (See the sketch after this list.)
* The lock order (i.e., hold {{openFileStreams}} while closing a stream, or 
hold one at a time) should be documented.
* {{openFileStreams}} can be a final field.
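
For illustration, a minimal sketch of a {{close}} along those lines, swallowing 
per-stream exceptions in the spirit of {{IOUtils::cleanup}} (names are 
illustrative, not taken from the attached patch):

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Sketch only: hold the lock on the (final) collection while closing each
// tracked stream, and swallow IOException so one failing stream doesn't
// leave the remaining entries open.
class StreamTrackingSketch implements Closeable {
  private final Set<Closeable> openFileStreams = new HashSet<>();

  @Override
  public void close() {
    // Lock order: openFileStreams is held while each stream is closed.
    synchronized (openFileStreams) {
      for (Closeable stream : openFileStreams) {
        try {
          stream.close(); // Closeable.close() should be idempotent
        } catch (IOException e) {
          // Swallowed so subsequent entries are still closed.
        }
      }
      openFileStreams.clear();
    }
  }
}
{code}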

> adl.AdlFilesystem.close() doesn't release locks on open files
> -
>
> Key: HADOOP-15515
> URL: https://issues.apache.org/jira/browse/HADOOP-15515
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Affects Versions: 2.7.3
> Environment: HDInsight on MS Azure:
>  
> Hadoop 2.7.3.2.6.2.25-1
> Subversion g...@github.com:hortonworks/hadoop.git -r 
> 1ceeb58bb3bb5904df0cbb7983389bcaf2ffd0b6
> Compiled by jenkins on 2017-11-29T15:28Z
> Compiled with protoc 2.5.0
> From source with checksum 90b73c4c185645c1f47b61f942230
> This command was run using 
> /usr/hdp/2.6.2.25-1/hadoop/hadoop-common-2.7.3.2.6.2.25-1.jar
>Reporter: Jay Hankinson
>Assignee: Vishwajeet Dusane
>Priority: Major
> Attachments: HDFS-13344-001.patch, HDFS-13344-002.patch
>
>
> If you write to a file on an Azure ADL filesystem and close the file system, 
> but not the file, before the process exits, the next time you try to open the 
> file for append it fails with:
> Exception in thread "main" java.io.IOException: APPEND failed with error 
> 0x83090a16 (Failed to perform the requested operation because the file is 
> currently open in write mode by another user or process.). 
> [a67c6b32-e78b-4852-9fac-142a3e2ba963][2018-03-22T20:54:08.3520940-07:00]
>  The following moves a local file to HDFS if it doesn't exist, or appends its 
> contents if it does:
>  
> {code:java}
> public void addFile(String source, String dest, Configuration conf)
>     throws IOException {
>   FileSystem fileSystem = FileSystem.get(conf);
>   // Get the filename out of the file path
>   String filename = source.substring(source.lastIndexOf('/') + 1, source.length());
>   // Create the destination path, including the filename.
>   if (dest.charAt(dest.length() - 1) != '/') {
>     dest = dest + "/" + filename;
>   } else {
>     dest = dest + filename;
>   }
>   // Check if the file already exists
>   Path path = new Path(dest);
>   FSDataOutputStream out;
>   if (fileSystem.exists(path)) {
>     System.out.println("File " + dest + " already exists; appending");
>     out = fileSystem.append(path);
>   } else {
>     out = fileSystem.create(path);
>   }
>   // Copy the local file's contents to the destination stream.
>   InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));
>   byte[] b = new byte[1024];
>   int numBytes = 0;
>   while ((numBytes = in.read(b)) > 0) {
>     out.write(b, 0, numBytes);
>   }
>   // Close the file system, but not the file
>   in.close();
>   // out.close();
>   fileSystem.close();
> }
> {code}
>  If "dest" is an adl:// location, invoking the function a second time (after 
> the process has exited) it raises the error. If it's a regular hdfs:// file 
> system, it doesn't as all the locks are released. The same exception is also 
> raised if a subsequent append is done using: hdfs dfs  -appendToFile.
> As I can't see a way to force lease recovery in this situation, this seems 
> like a bug. org.apache.hadoop.fs.adl.AdlFileSystem inherits close() from 
> org.apache.hadoop.fs.FileSystem
> [https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/adl/AdlFileSystem.html]
> Which states:
> Close this FileSystem instance. Will release any held locks. This does not 
> seem to be the case






[jira] [Commented] (HADOOP-15515) adl.AdlFilesystem.close() doesn't release locks on open files

2018-06-05 Thread Chris Douglas (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502310#comment-16502310
 ] 

Chris Douglas commented on HADOOP-15515:


Moving to common, as this doesn't affect HDFS. v002 seems to implement a 
pattern similar to 
[DFSClient|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L626].

bq. As I can't see a way to force lease recovery in this situation
As long as this is only an optimization, and a client failing to release the 
lock won't wedge the system, this looks OK. I checked the [client 
docs|https://azure.github.io/azure-data-lake-store-java/javadoc/], and it looks 
like the ADL client doesn't have a {{close}} method, so presumably it's not 
tracking open streams and this is correct.

If there's no other feedback I'll commit this soon.

> adl.AdlFilesystem.close() doesn't release locks on open files
> -
>
> Key: HADOOP-15515
> URL: https://issues.apache.org/jira/browse/HADOOP-15515
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Affects Versions: 2.7.3
> Environment: HDInsight on MS Azure:
>  
> Hadoop 2.7.3.2.6.2.25-1
> Subversion g...@github.com:hortonworks/hadoop.git -r 
> 1ceeb58bb3bb5904df0cbb7983389bcaf2ffd0b6
> Compiled by jenkins on 2017-11-29T15:28Z
> Compiled with protoc 2.5.0
> From source with checksum 90b73c4c185645c1f47b61f942230
> This command was run using 
> /usr/hdp/2.6.2.25-1/hadoop/hadoop-common-2.7.3.2.6.2.25-1.jar
>Reporter: Jay Hankinson
>Assignee: Vishwajeet Dusane
>Priority: Major
> Attachments: HDFS-13344-001.patch, HDFS-13344-002.patch
>
>
> If you write to a file on an Azure ADL filesystem and close the file system, 
> but not the file, before the process exits, the next time you try to open the 
> file for append it fails with:
> Exception in thread "main" java.io.IOException: APPEND failed with error 
> 0x83090a16 (Failed to perform the requested operation because the file is 
> currently open in write mode by another user or process.). 
> [a67c6b32-e78b-4852-9fac-142a3e2ba963][2018-03-22T20:54:08.3520940-07:00]
>  The following moves a local file to HDFS if it doesn't exist, or appends its 
> contents if it does:
>  
> {code:java}
> public void addFile(String source, String dest, Configuration conf)
>     throws IOException {
>   FileSystem fileSystem = FileSystem.get(conf);
>   // Get the filename out of the file path
>   String filename = source.substring(source.lastIndexOf('/') + 1, source.length());
>   // Create the destination path, including the filename.
>   if (dest.charAt(dest.length() - 1) != '/') {
>     dest = dest + "/" + filename;
>   } else {
>     dest = dest + filename;
>   }
>   // Check if the file already exists
>   Path path = new Path(dest);
>   FSDataOutputStream out;
>   if (fileSystem.exists(path)) {
>     System.out.println("File " + dest + " already exists; appending");
>     out = fileSystem.append(path);
>   } else {
>     out = fileSystem.create(path);
>   }
>   // Copy the local file's contents to the destination stream.
>   InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));
>   byte[] b = new byte[1024];
>   int numBytes = 0;
>   while ((numBytes = in.read(b)) > 0) {
>     out.write(b, 0, numBytes);
>   }
>   // Close the file system, but not the file
>   in.close();
>   // out.close();
>   fileSystem.close();
> }
> {code}
>  If "dest" is an adl:// location, invoking the function a second time (after 
> the process has exited) it raises the error. If it's a regular hdfs:// file 
> system, it doesn't as all the locks are released. The same exception is also 
> raised if a subsequent append is done using: hdfs dfs  -appendToFile.
> As I can't see a way to force lease recovery in this situation, this seems 
> like a bug. org.apache.hadoop.fs.adl.AdlFileSystem inherits close() from 
> org.apache.hadoop.fs.FileSystem
> [https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/adl/AdlFileSystem.html]
> Which states:
> Close this FileSystem instance. Will release any held locks. This does not 
> seem to be the case






[jira] [Moved] (HADOOP-15515) adl.AdlFilesystem.close() doesn't release locks on open files

2018-06-05 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas moved HDFS-13344 to HADOOP-15515:
---

Affects Version/s: (was: 2.7.3)
   2.7.3
  Component/s: (was: fs/adl)
   fs/adl
  Key: HADOOP-15515  (was: HDFS-13344)
  Project: Hadoop Common  (was: Hadoop HDFS)

> adl.AdlFilesystem.close() doesn't release locks on open files
> -
>
> Key: HADOOP-15515
> URL: https://issues.apache.org/jira/browse/HADOOP-15515
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl
>Affects Versions: 2.7.3
> Environment: HDInsight on MS Azure:
>  
> Hadoop 2.7.3.2.6.2.25-1
> Subversion g...@github.com:hortonworks/hadoop.git -r 
> 1ceeb58bb3bb5904df0cbb7983389bcaf2ffd0b6
> Compiled by jenkins on 2017-11-29T15:28Z
> Compiled with protoc 2.5.0
> From source with checksum 90b73c4c185645c1f47b61f942230
> This command was run using 
> /usr/hdp/2.6.2.25-1/hadoop/hadoop-common-2.7.3.2.6.2.25-1.jar
>Reporter: Jay Hankinson
>Assignee: Vishwajeet Dusane
>Priority: Major
> Attachments: HDFS-13344-001.patch, HDFS-13344-002.patch
>
>
> If you write to a file on an Azure ADL filesystem and close the file system, 
> but not the file, before the process exits, the next time you try to open the 
> file for append it fails with:
> Exception in thread "main" java.io.IOException: APPEND failed with error 
> 0x83090a16 (Failed to perform the requested operation because the file is 
> currently open in write mode by another user or process.). 
> [a67c6b32-e78b-4852-9fac-142a3e2ba963][2018-03-22T20:54:08.3520940-07:00]
>  The following moves a local file to HDFS if it doesn't exist, or appends its 
> contents if it does:
>  
> {code:java}
> public void addFile(String source, String dest, Configuration conf)
>     throws IOException {
>   FileSystem fileSystem = FileSystem.get(conf);
>   // Get the filename out of the file path
>   String filename = source.substring(source.lastIndexOf('/') + 1, source.length());
>   // Create the destination path, including the filename.
>   if (dest.charAt(dest.length() - 1) != '/') {
>     dest = dest + "/" + filename;
>   } else {
>     dest = dest + filename;
>   }
>   // Check if the file already exists
>   Path path = new Path(dest);
>   FSDataOutputStream out;
>   if (fileSystem.exists(path)) {
>     System.out.println("File " + dest + " already exists; appending");
>     out = fileSystem.append(path);
>   } else {
>     out = fileSystem.create(path);
>   }
>   // Copy the local file's contents to the destination stream.
>   InputStream in = new BufferedInputStream(new FileInputStream(new File(source)));
>   byte[] b = new byte[1024];
>   int numBytes = 0;
>   while ((numBytes = in.read(b)) > 0) {
>     out.write(b, 0, numBytes);
>   }
>   // Close the file system, but not the file
>   in.close();
>   // out.close();
>   fileSystem.close();
> }
> {code}
>  If "dest" is an adl:// location, invoking the function a second time (after 
> the process has exited) it raises the error. If it's a regular hdfs:// file 
> system, it doesn't as all the locks are released. The same exception is also 
> raised if a subsequent append is done using: hdfs dfs  -appendToFile.
> As I can't see a way to force lease recovery in this situation, this seems 
> like a bug. org.apache.hadoop.fs.adl.AdlFileSystem inherits close() from 
> org.apache.hadoop.fs.FileSystem
> [https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/adl/AdlFileSystem.html]
> Which states:
> Close this FileSystem instance. Will release any held locks. This does not 
> seem to be the case






[jira] [Commented] (HADOOP-15429) unsynchronized index causes DataInputByteBuffer$Buffer.read hangs

2018-04-30 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458852#comment-16458852
 ] 

Chris Douglas commented on HADOOP-15429:


{{DataInputByteBuffer}} is not threadsafe. What is the use case?
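
For context, a sketch of the external synchronization a caller sharing one 
instance would need, assuming the standard 
{{org.apache.hadoop.io.DataInputByteBuffer}} API:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.DataInputByteBuffer;

// Sketch only: DataInputByteBuffer is not threadsafe, so a shared instance
// needs one lock guarding both reset() and the reads that follow it.
public class SynchronizedReadSketch {
  private final DataInputByteBuffer in = new DataInputByteBuffer();

  public byte[] readAll(ByteBuffer src, int len) throws IOException {
    byte[] out = new byte[len];
    synchronized (in) { // reset() and readFully() under the same lock
      in.reset(src);
      in.readFully(out);
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    ByteBuffer buf = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
    byte[] b = new SynchronizedReadSketch().readAll(buf, 5);
    System.out.println(new String(b, StandardCharsets.UTF_8));
  }
}
{code}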

> unsynchronized index causes DataInputByteBuffer$Buffer.read hangs
> -
>
> Key: HADOOP-15429
> URL: https://issues.apache.org/jira/browse/HADOOP-15429
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io
>Affects Versions: 0.23.0
>Reporter: John Doe
>Priority: Minor
>
> In the DataInputByteBuffer$Buffer class, the fields bidx, buffers, etc. are 
> unsynchronized when used in the read() and reset() functions. In certain 
> circumstances, e.g., when reset() is invoked in a loop, the unsynchronized 
> bidx and buffers can trigger a concurrency bug.
> Here is the code snippet.
> {code:java}
> ByteBuffer[] buffers = new ByteBuffer[0];
> int bidx, pos, length;
> @Override
> public int read(byte[] b, int off, int len) {
>   if (bidx >= buffers.length) {
> return -1;
>   }
>   int cur = 0;
>   do {
> int rem = Math.min(len, buffers[bidx].remaining());
> buffers[bidx].get(b, off, rem);
> cur += rem;
> off += rem;
> len -= rem;
>   } while (len > 0 && ++bidx < buffers.length); //bidx is unsynchronized
>   pos += cur;
>   return cur;
> }
> public void reset(ByteBuffer[] buffers) {//if one thread keeps calling 
> reset() in a loop
>   bidx = pos = length = 0;
>   this.buffers = buffers;
>   for (ByteBuffer b : buffers) {
> length += b.remaining();
>   }
> }
> {code}






[jira] [Commented] (HADOOP-11640) add user defined delimiter support to Configuration

2018-04-19 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1641#comment-1641
 ] 

Chris Douglas commented on HADOOP-11640:


Understood. I don't think MAPREDUCE-7069 covers this.

> add user defined delimiter support to Configuration
> ---
>
> Key: HADOOP-11640
> URL: https://issues.apache.org/jira/browse/HADOOP-11640
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Xiaoshuang LU
>Assignee: Xiaoshuang LU
>Priority: Major
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-11640.patch
>
>
> As mentioned by org.apache.hadoop.conf.Configuration.getStrings ("Get the 
> comma delimited values of the name property as an array of Strings"), only 
> comma-separated strings can be used. It would be much better if user-defined 
> separators were supported.






[jira] [Commented] (HADOOP-11640) add user defined delimiter support to Configuration

2018-04-17 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441628#comment-16441628
 ] 

Chris Douglas commented on HADOOP-11640:


[~Jim_Brennan] {{Configuration::getStrings}} is a convenience method, but the 
delimiter is neither configurable nor escapable. This JIRA proposes to enhance 
the convenience method. Does MAPREDUCE-7069 cover this? I skimmed the patch, 
but might have missed the relevant code.
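
To make the limitation concrete, a small sketch of the built-in comma splitting 
versus a caller-side split on a custom delimiter (the key and the '|' delimiter 
are illustrative):

{code:java}
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;

// Sketch only: getStrings always splits on commas, so values containing
// commas must currently be split by the caller on a delimiter of their own.
public class DelimiterSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("example.paths", "/a,/b|/c,/d");

    // Built-in: comma-delimited, yields three items for this value.
    String[] byComma = conf.getStrings("example.paths");

    // Workaround: fetch the raw value and split on a user-chosen delimiter.
    String[] byPipe = conf.get("example.paths").split(Pattern.quote("|"));

    System.out.println(byComma.length + " items vs. " + byPipe.length);
  }
}
{code}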

> add user defined delimiter support to Configuration
> ---
>
> Key: HADOOP-11640
> URL: https://issues.apache.org/jira/browse/HADOOP-11640
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Xiaoshuang LU
>Assignee: Xiaoshuang LU
>Priority: Major
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-11640.patch
>
>
> As mentioned by org.apache.hadoop.conf.Configuration.getStrings ("Get the 
> comma delimited values of the name property as an array of Strings"), only 
> comma-separated strings can be used. It would be much better if user-defined 
> separators were supported.






[jira] [Commented] (HADOOP-13617) Swift client retrying original request is using expired token after re-authentication

2018-04-05 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427416#comment-16427416
 ] 

Chris Douglas commented on HADOOP-13617:


[~ste...@apache.org] do you have cycles for this?

> Swift client retrying original request is using expired token after 
> re-authentication 
> --
>
> Key: HADOOP-13617
> URL: https://issues.apache.org/jira/browse/HADOOP-13617
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/swift
>Affects Versions: 2.6.0
> Environment: Linux EL6
>Reporter: Steve Yang
>Assignee: Yulei Li
>Priority: Blocker
>  Labels: patch
> Attachments: 2016_09_13.stderrout.log, HADOOP-13617.patch
>
>
> library used: org.apache.hadoop:hadoop-openstack:2.6.0
> For a long-running Swift read operation (e.g., reading a large container), the 
> issued auth token has at most a 30-minute life span from Oracle Storage 
> Service. If the token expires in the middle of the read operation, the 
> SwiftRestClient 
> (https://github.com/apache/hadoop/blob/release-2.6.0/hadoop-tools/hadoop-openstack/src/main/java/org/apache/hadoop/fs/swift/http/SwiftRestClient.java#L1701)
>  re-authenticates and acquires a new auth token. However, the retry request 
> still uses the old, expired token, causing the whole operation to fail.
> Because of this bug, any meaningful (i.e., long-running) Swift operation is 
> not possible.
> Here is a summary of what happened with DEBUG logging turned on:
> ==
> 1. initial token acquired which will expire on 19:56:44(PDT; UTC-4):
> ---
> 2016-09-13 19:52:37 DEBUG [pool-3-thread-1] SwiftRestClient:268 - setAuth:
> endpoint=https://em2.storage.oraclecloud.com/v1/Storage-paas132;
> objectURI=https://em2.storage.oraclecloud.com/object_endpoint/null;
> token=AccessToken{id='AUTH_tk2dd9d639bbb992089dca008123c3046f',
> tenant=org.apache.hadoop.fs.swift.auth.entities.Tenant@af28493,
> expires='2016-09-13T23:56:44Z'}
> 2. token expiration and re-authentication:
> --
> 2016-09-13 19:56:44 DEBUG [pool-3-thread-1] SwiftRestClient:1727 - GET
> https://em2.storage.oraclecloud.com/v1/Storage-paas132/allTaxi/?prefix=000182/=json=/
> X-Auth-Token: AUTH_tk2dd9d639bbb992089dca008123c3046f
> User-Agent: Apache Hadoop Swift Client 2.6.0-cdh5.7.1 from
> ae44a8970a3f0da58d82e0fc65275fff8deabffd by jenkins source checksum
> 298b68dc3b308983f04cb37e8416f13
> .
> 2016-09-13 19:56:44 WARN [pool-3-thread-1] HttpMethodDirector:697 - Unable
> to respond to any of these challenges: {token=Token}
> 2016-09-13 19:56:44 DEBUG [pool-3-thread-1] SwiftRestClient:1731 - Status
> code = 401
> 2016-09-13 19:56:44 DEBUG [pool-3-thread-1] SwiftRestClient:1698 -
> Reauthenticating
> 2016-09-13 19:56:44 DEBUG [pool-3-thread-1] SwiftRestClient:1079 - started
> authentication
> 2016-09-13 19:56:44 DEBUG [pool-3-thread-1] SwiftRestClient:1228 -
> Authenticating with Authenticate as tenant 'Storage-paas132' user
> 'radha.sriniva...@oracle.com' with password of length 9
> 2016-09-13 19:56:44 DEBUG [pool-3-thread-1] SwiftRestClient:1727 - POST
> https://em2.storage.oraclecloud.com/auth/v2.0/tokens
> User-Agent: Apache Hadoop Swift Client 2.6.0-cdh5.7.1 from
> ae44a8970a3f0da58d82e0fc65275fff8deabffd by jenkins source checksum
> 298b68dc3b308983f04cb37e8416f13
> .
> 2016-09-13 19:56:45 DEBUG [pool-3-thread-1] SwiftRestClient:1731 - Status
> code = 200
> 2016-09-13 19:56:45 DEBUG [pool-3-thread-1] SwiftRestClient:1149 - Catalog
> entry [swift: object-store];
> 2016-09-13 19:56:45 DEBUG [pool-3-thread-1] SwiftRestClient:1156 - Found
> swift catalog as swift => object-store
> 2016-09-13 19:56:45 DEBUG [pool-3-thread-1] SwiftRestClient:1169 - Endpoint
> [US => https://em2.storage.oraclecloud.com/v1/Storage-paas132 / null];
> 2016-09-13 19:56:45 DEBUG [pool-3-thread-1] SwiftRestClient:268 - setAuth:
> endpoint=https://em2.storage.oraclecloud.com/v1/Storage-paas132;
> objectURI=https://em2.storage.oraclecloud.com/object_endpoint/null;
> token=AccessToken{id='AUTH_tk56bbb4d6fef57b7eeba7acae598f837c',
> tenant=org.apache.hadoop.fs.swift.auth.entities.Tenant@4f03838d,
> expires='2016-09-14T00:26:45Z'}
> 2016-09-13 19:56:45 DEBUG [pool-3-thread-1] SwiftRestClient:1216 -
> authenticated against https://em2.storage.oraclecloud.com/v1/Storage-paas132.
> 2016-09-13 19:56:45 DEBUG [pool-3-thread-1] SwiftRestClient:1727 - HEAD
> https://em2.storage.oraclecloud.com/v1/Storage-paas132/allTaxi/
> X-Newest: true
> X-Auth-Token: AUTH_tk56bbb4d6fef57b7eeba7acae598f837c
> User-Agent: Apache Hadoop Swift Client 2.6.0-cdh5.7.1 from
> ae44a8970a3f0da58d82e0fc65275fff8deabffd by jenkins 

[jira] [Updated] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15320:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.1
   3.1.0
   Status: Resolved  (was: Patch Available)

I committed this. Thanks [~shanyu]

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Fix For: 3.1.0, 2.9.1
>
> Attachments: HADOOP-15320.01.patch, HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, where each block has one host named "localhost". 
> Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The 
> problem with this mock is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host "localhost". 
> Note that this doesn't mean we will create many fewer splits, because the 
> number of splits is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}






[jira] [Commented] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-27 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416516#comment-16416516
 ] 

Chris Douglas commented on HADOOP-15320:


Fixed checkstyle warnings. +1 from me. [~ste...@apache.org], lgty?

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.01.patch, HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, where each block has one host named "localhost". 
> Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The 
> problem with this mock is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host "localhost". 
> Note that this doesn't mean we will create many fewer splits, because the 
> number of splits is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}






[jira] [Updated] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15320:
---
Attachment: HADOOP-15320.01.patch

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.01.patch, HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, where each block has one host named "localhost". 
> Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The 
> problem with this mock is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host "localhost". 
> Note that this doesn't mean we will create many fewer splits, because the 
> number of splits is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}






[jira] [Commented] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-27 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416424#comment-16416424
 ] 

Chris Douglas commented on HADOOP-15320:


Thanks [~shanyu]. Running this through Jenkins.

We could add a unit test to signal that a change to the default behavior could 
affect these FS implementations, but that should be implied.

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, where each block has one host named "localhost". 
> Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The 
> problem with this mock is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host "localhost". 
> Note that this doesn't mean we will create many fewer splits, because the 
> number of splits is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}






[jira] [Updated] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-27 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15320:
---
Status: Patch Available  (was: Open)

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 3.0.0, 2.9.0, 2.7.3
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, where each block has one host named "localhost". 
> Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The 
> problem with this mock is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host "localhost". 
> Note that this doesn't mean we will create many fewer splits, because the 
> number of splits is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}






[jira] [Commented] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-23 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411827#comment-16411827
 ] 

Chris Douglas commented on HADOOP-15320:


bq. I know s3 "appears" to work, but I'm not actually confident that everything 
is getting the splits right there.
Me either, but 1.5h to generate synthetic splits is definitely wrong. If we 
develop a new best practice for object stores, then we can apply that across 
the stores we support. The {{BlockLocation[]}} return type is pretty 
restrictive, but we could probably do better.

bq. The one I want you to look at is: Spark, CSV, multiGB: SPARK-22240. That's 
what's been niggling at me for a while.
Maybe I'm missing the bug. Block locations are hints for locality, not format 
partitioning. In that JIRA: gzip is not splittable, so a single reader is 
correct, absent some other preparation (saving the dictionary at offsets, 
writing zero-length gzip files as split markers, etc.). In general, framework 
parallelism should not rely exclusively on block locations...
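
As a rough illustration of why split counts stay bounded either way, here is 
the {{FileInputFormat.computeSplitSize}} formula from the description below, 
with made-up numbers:

{code:java}
// Sketch only: the formula quoted in the description,
// splitSize = max(minSize, min(goalSize, blockSize)), with made-up values.
public class SplitSizeSketch {
  static long computeSplitSize(long goalSize, long minSize, long blockSize) {
    return Math.max(minSize, Math.min(goalSize, blockSize));
  }

  public static void main(String[] args) {
    long fileSize = 1L << 40;             // ~1 TB file
    long blockSize = 128L * 1024 * 1024;  // 128 MB block size
    long minSize = 1L;
    long goalSize = fileSize / 100;       // 100 requested splits

    long splitSize = computeSplitSize(goalSize, minSize, blockSize);
    // blockSize caps the split size, so ~8192 splits for this file,
    // whether block locations are real or a single "localhost" block.
    System.out.println(fileSize / splitSize);
  }
}
{code}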

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, where each block has one host named "localhost". 
> Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The 
> problem with this mock is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host "localhost". 
> Note that this doesn't mean we will create many fewer splits, because the 
> number of splits is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}






[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions

2018-03-23 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411719#comment-16411719
 ] 

Chris Douglas commented on HADOOP-14600:


bq. Without realizing you had already fixed this issue, I put a patch up in 
HADOOP-15337 that shows the alternative implementation approach. Sorry again 
for being so late to the conversation, not intending to step on toes.

No worries. If the approach in HADOOP-15337 is more portable, performs 
comparably, and is equivalent, then let's take the simpler approach. Where 
applicable, do the unit tests on this issue pass in HADOOP-15337, on 
Linux/MacOS/Windows?

> LocatedFileStatus constructor forces RawLocalFS to exec a process to get the 
> permissions
> 
>
> Key: HADOOP-14600
> URL: https://issues.apache.org/jira/browse/HADOOP-14600
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.7.3
> Environment: file:// in a dir with many files
>Reporter: Steve Loughran
>Assignee: Ping Liu
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, 
> HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, 
> HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, 
> HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java, 
> command_line_test_result__linux.txt, command_line_test_result__windows.txt
>
>
> Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls 
> against the local FS, because the {{FileStatus.getPermissions}} call forces 
> {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI 
> values.
> That is: what's a field lookup or even a no-op on every other FS is, on the 
> local FS, a process exec/spawn, with all the costs. This gets expensive if 
> you have many files.






[jira] [Commented] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-22 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410265#comment-16410265
 ] 

Chris Douglas commented on HADOOP-15320:


bq. Anything else we should run?
As [~ste...@apache.org] suggested, the hadoop-azure and hadoop-azure-datalake 
test suites and contract tests should pass.

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, where each block has one host named "localhost". 
> Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The 
> problem with this mock is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() implementation, 
> which returns 1 block for any file with 1 host "localhost". Note that 
> this doesn't mean we will create many fewer splits, because the number of 
> splits is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}
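
For reference, a sketch of the fallback behavior described above, close to (but not verbatim) the FileSystem base class: one synthetic block spanning the whole file, hosted on "localhost". Uses org.apache.hadoop.fs.FileStatus and org.apache.hadoop.fs.BlockLocation.
{code:java}
// Sketch of the default FileSystem.getFileBlockLocations() contract:
// a single artificial block covering the file, with host "localhost".
public BlockLocation[] getFileBlockLocations(FileStatus file,
    long start, long len) throws IOException {
  if (file == null) {
    return null;
  }
  if (start < 0 || len < 0) {
    throw new IllegalArgumentException("Invalid start or len parameter");
  }
  if (file.getLen() <= start) {
    return new BlockLocation[0];
  }
  String[] name = { "localhost:50010" };  // illustrative datanode address
  String[] host = { "localhost" };
  return new BlockLocation[] {
      new BlockLocation(name, host, 0, file.getLen()) };
}
{code}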



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15334) Upgrade Maven surefire plugin

2018-03-22 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409838#comment-16409838
 ] 

Chris Douglas commented on HADOOP-15334:


+1

> Upgrade Maven surefire plugin
> -
>
> Key: HADOOP-15334
> URL: https://issues.apache.org/jira/browse/HADOOP-15334
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
> Attachments: HADOOP-15334.01.patch
>
>
> Recent versions of the surefire plugin suppress summary test execution output 
> in quiet mode. This is now fixed in plugin version 2.21.0 (via SUREFIRE-1436).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15334) Upgrade Maven surefire plugin

2018-03-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408834#comment-16408834
 ] 

Chris Douglas commented on HADOOP-15334:


Can we bring this back to branch-2, also?

> Upgrade Maven surefire plugin
> -
>
> Key: HADOOP-15334
> URL: https://issues.apache.org/jira/browse/HADOOP-15334
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
> Attachments: HADOOP-15334.01.patch
>
>
> Recent versions of the surefire plugin suppress summary test execution output 
> in quiet mode. This is now fixed in plugin version 2.21.0 (via SUREFIRE-1436).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-19 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405583#comment-16405583
 ] 

Chris Douglas commented on HADOOP-15320:


What testing has been done with this, already?

bq. do think it will need to be bounced past the various tools, including: hive, 
spark, pig to see that it all goes OK. But given S3A is using that default with 
no adverse consequences, I think you'll be right.
Wouldn't one expect the same results, if the pattern worked for S3A? One would 
expect to find framework code that is unnecessarily serial after this change. 
What tests did S3A run that should be repeated?

bq. which endpoints did you run the entire hadoop-azure and 
hadoop-azure-datalake test suites?
Running these integration tests is a good idea. It's why they're there, after 
all.

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake have their own implementations of 
> getFileBlockLocations(), which fake a list of artificial blocks based on a 
> hard-coded block size, where each block has one host named "localhost". 
> Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The 
> problem with this mock is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() implementation, 
> which returns 1 block for any file with 1 host "localhost". Note that 
> this doesn't mean we will create many fewer splits, because the number of 
> splits is still limited by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14667) Flexible Visual Studio support

2018-03-19 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-14667:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Thanks, [~elgoiri].

I committed this. Thanks, Allen

> Flexible Visual Studio support
> --
>
> Key: HADOOP-14667
> URL: https://issues.apache.org/jira/browse/HADOOP-14667
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.0.0-beta1
> Environment: Windows
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-14667.00.patch, HADOOP-14667.01.patch, 
> HADOOP-14667.02.patch, HADOOP-14667.03.patch, HADOOP-14667.04.patch, 
> HADOOP-14667.05.patch
>
>
> Is it time to upgrade the Windows native project files to use something more 
> modern than Visual Studio 2010?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15311) HttpServer2 needs a way to configure the acceptor/selector count

2018-03-13 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15311:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.2
   Status: Resolved  (was: Patch Available)

Fixed the checkstyle warning.

+1 I committed this. Thanks, Erik

> HttpServer2 needs a way to configure the acceptor/selector count
> 
>
> Key: HADOOP-15311
> URL: https://issues.apache.org/jira/browse/HADOOP-15311
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.0.2
>
> Attachments: HADOOP-15311.000.patch, HADOOP-15311.001.patch, 
> HADOOP-15311.002.patch
>
>
> HttpServer2 starts up with some number of acceptors and selectors, but only 
> allows for the automatic configuration of these based off of the number of 
> available cores:
> {code:title=org.eclipse.jetty.server.ServerConnector}
> selectors > 0 ? selectors : Math.max(1, Math.min(4, 
> Runtime.getRuntime().availableProcessors() / 2)))
> {code}
> {code:title=org.eclipse.jetty.server.AbstractConnector}
> if (acceptors < 0) {
>   acceptors = Math.max(1, Math.min(4, cores / 8));
> }
> {code}
> A thread pool is started of size, at minimum, {{acceptors + selectors + 1}}, 
> so in addition to allowing for a higher tuning value under heavily loaded 
> environments, adding configurability for this enables tuning these values 
> down in resource constrained environments such as a MiniDFSCluster.
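
For context, Jetty 9 already accepts explicit counts through the ServerConnector constructor; a minimal sketch of the knob this issue wants to surface (the HttpServer2 configuration plumbing itself is what the patch adds, and is not shown here):
{code:java}
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class TunedConnector {
  public static void main(String[] args) throws Exception {
    Server server = new Server();
    // Explicit counts instead of the core-based defaults quoted above;
    // small values suit constrained setups like a MiniDFSCluster.
    int acceptors = 1;
    int selectors = 1;
    ServerConnector conn = new ServerConnector(server, acceptors, selectors);
    conn.setPort(8080);
    server.addConnector(conn);
    server.start();
  }
}
{code}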



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15311) HttpServer2 needs a way to configure the acceptor/selector count

2018-03-13 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15311:
---
Attachment: HADOOP-15311.002.patch

> HttpServer2 needs a way to configure the acceptor/selector count
> 
>
> Key: HADOOP-15311
> URL: https://issues.apache.org/jira/browse/HADOOP-15311
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HADOOP-15311.000.patch, HADOOP-15311.001.patch, 
> HADOOP-15311.002.patch
>
>
> HttpServer2 starts up with some number of acceptors and selectors, but only 
> allows for the automatic configuration of these based off of the number of 
> available cores:
> {code:title=org.eclipse.jetty.server.ServerConnector}
> selectors > 0 ? selectors : Math.max(1, Math.min(4, 
> Runtime.getRuntime().availableProcessors() / 2)))
> {code}
> {code:title=org.eclipse.jetty.server.AbstractConnector}
> if (acceptors < 0) {
>   acceptors = Math.max(1, Math.min(4, cores / 8));
> }
> {code}
> A thread pool is started of size, at minimum, {{acceptors + selectors + 1}}, 
> so in addition to allowing for a higher tuning value under heavily loaded 
> environments, adding configurability for this enables tuning these values 
> down in resource constrained environments such as a MiniDFSCluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15311) HttpServer2 needs a way to configure the acceptor/selector count

2018-03-13 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397465#comment-16397465
 ] 

Chris Douglas commented on HADOOP-15311:


Since this is overwriting the {{server}} field that's set using 
{{@BeforeClass}}, does that interfere with other tests?

> HttpServer2 needs a way to configure the acceptor/selector count
> 
>
> Key: HADOOP-15311
> URL: https://issues.apache.org/jira/browse/HADOOP-15311
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HADOOP-15311.000.patch
>
>
> HttpServer2 starts up with some number of acceptors and selectors, but only 
> allows for the automatic configuration of these based off of the number of 
> available cores:
> {code:title=org.eclipse.jetty.server.ServerConnector}
> selectors > 0 ? selectors : Math.max(1, Math.min(4, 
> Runtime.getRuntime().availableProcessors() / 2)))
> {code}
> {code:title=org.eclipse.jetty.server.AbstractConnector}
> if (acceptors < 0) {
>   acceptors = Math.max(1, Math.min(4, cores / 8));
> }
> {code}
> A thread pool is started of size, at minimum, {{acceptors + selectors + 1}}, 
> so in addition to allowing for a higher tuning value under heavily loaded 
> environments, adding configurability for this enables tuning these values 
> down in resource constrained environments such as a MiniDFSCluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15153) [branch-2.8] Increase heap memory to avoid the OOM in pre-commit

2018-03-12 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15153:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Committed as HADOOP-15279

> [branch-2.8] Increase heap memory to avoid the OOM in pre-commit
> 
>
> Key: HADOOP-15153
> URL: https://issues.apache.org/jira/browse/HADOOP-15153
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15153-branch-2.8.patch
>
>
> Reference:
> https://builds.apache.org/job/PreCommit-HDFS-Build/22528/consoleFull
> https://builds.apache.org/job/PreCommit-HDFS-Build/22528/artifact/out/branch-mvninstall-root.txt
> {noformat}
> [ERROR] unable to create new native thread -> [Help 1]
> [ERROR] 
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR] 
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/OutOfMemoryError
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14696) parallel tests don't work for Windows

2018-03-12 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-14696:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.1
   Status: Resolved  (was: Patch Available)

+1 I committed this. Thanks, Allen.

[~ste...@apache.org], please reopen if the v07 patch breaks on OSX.

> parallel tests don't work for Windows
> -
>
> Key: HADOOP-14696
> URL: https://issues.apache.org/jira/browse/HADOOP-14696
> Project: Hadoop Common
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0-beta1
> Environment: Windows
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Minor
> Fix For: 2.9.1
>
> Attachments: HADOOP-14696-002.patch, HADOOP-14696-003.patch, 
> HADOOP-14696.00.patch, HADOOP-14696.01.patch, HADOOP-14696.04.patch, 
> HADOOP-14696.05.patch, HADOOP-14696.06.patch, HADOOP-14696.07.patch
>
>
> If hadoop-common-project/hadoop-common is run with the -Pparallel-tests flag, 
> it fails in create-parallel-tests-dirs from the pom.xml
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-antrun-plugin:1.7:run 
> (create-parallel-tests-dirs) on project hadoop-common: An Ant BuildException 
> has occured: Directory 
> F:\jenkins\jenkins-slave\workspace\hadoop-trunk-win\s\hadoop-common-project\hadoop-common\jenkinsjenkins-slaveworkspacehadoop-trunk-winshadoop-common-projecthadoop-common
> arget\test\data\1 creation was not successful for an unknown reason
> [ERROR] around Ant part 

[jira] [Updated] (HADOOP-14742) Document multi-URI replication Inode for ViewFS

2018-03-12 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-14742:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.2
   Status: Resolved  (was: Patch Available)

+1 I committed this. Thanks, Gera

> Document multi-URI replication Inode for ViewFS
> ---
>
> Key: HADOOP-14742
> URL: https://issues.apache.org/jira/browse/HADOOP-14742
> Project: Hadoop Common
>  Issue Type: Task
>  Components: documentation, viewfs
>Affects Versions: 3.0.0-beta1
>Reporter: Chris Douglas
>Assignee: Gera Shegalov
>Priority: Major
> Fix For: 3.0.2
>
> Attachments: HADOOP-14742.001.patch, HADOOP-14742.002.patch
>
>
> HADOOP-12077 added client-side "replication" capabilities to ViewFS. Its 
> semantics and configuration should be documented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-07 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390440#comment-16390440
 ] 

Chris Douglas commented on HADOOP-15292:


+1 

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch, 
> HADOOP-15292.002.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388959#comment-16388959
 ] 

Chris Douglas commented on HADOOP-15292:


{noformat}
+  if (sourceOffset != 0) {
+    inStream.seek(sourceOffset);
{noformat}
Should this be {{sourceOffset != inStream.getPos()}} ?
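
That is, a guard along these lines (sketch only; {{inStream}} is the copy source stream from the patch excerpt above):
{code:java}
// Only seek when the stream is not already at the requested offset.
if (sourceOffset != inStream.getPos()) {
  inStream.seek(sourceOffset);
}
{code}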

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-06 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15292:
---
Target Version/s: 2.9.1

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-06 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15292:
---
Affects Version/s: (was: 3.0.0)
   2.5.0

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.5.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch, HADOOP-15292.001.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15292) Distcp's use of pread is slowing it down.

2018-03-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388859#comment-16388859
 ] 

Chris Douglas commented on HADOOP-15292:


Instead of passing a flag to {{readBytes}}, this can just call {{seek()}} 
outside the loop (and include the {{getPos() != position}} optimization).

[~ste...@apache.org] are you set up to test S3? {{pread}} happens to have an 
expensive implementation in HDFS (and other {{FileSystem}} impls), but creating 
a test for distcp to ensure the {{PositionedReadable}} APIs aren't used seems 
excessive.

bq. Not sure if it's worth extending that unit test to track how many times we 
open the stream.
From the description, it's inside the DN where {{pread}} creates multiple 
streams. IIRC the position of the stream isn't updated when using PR APIs. If 
the stream were shared that could be an issue, but that's not in the design. 
In HDFS, updating the set of locations for each read (without checking the 
distcp invariants) is also unused here.

Demonstrating the fix in HDFS would be sufficient for commit, IMO. 
It might be possible to add a test around the command itself to ensure the 
{{seek()}} is correct on retry, but wiring the flaw into a test would require a 
{{MiniDFSCluster}}.
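
A hedged sketch of the shape suggested above, not the actual RetriableFileCopyCommand code; {{inStream}}, {{outStream}} and {{buf}} are assumed to be set up by the caller:
{code:java}
// Seek once before the copy loop, then use plain sequential reads
// instead of a positioned read per chunk.
if (inStream.getPos() != sourceOffset) {
  inStream.seek(sourceOffset);
}
int bytesRead;
while ((bytesRead = inStream.read(buf)) >= 0) {
  outStream.write(buf, 0, bytesRead);
}
{code}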

> Distcp's use of pread is slowing it down.
> -
>
> Key: HADOOP-15292
> URL: https://issues.apache.org/jira/browse/HADOOP-15292
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Virajith Jalaparti
>Priority: Minor
> Attachments: HADOOP-15292.000.patch
>
>
> Distcp currently uses positioned-reads (in 
> RetriableFileCopyCommand#copyBytes) when the source offset is > 0. This 
> results in unnecessary overheads (new BlockReader being created on the 
> client-side, multiple readBlock() calls to the Datanodes, each of which 
> requires the creation of a BlockSender and an inputstream to the ReplicaInfo).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14898) Create official Docker images for development and testing features

2018-03-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388797#comment-16388797
 ] 

Chris Douglas commented on HADOOP-14898:


Thanks, [~anu]. Could you also update the release documentation, so we 
understand how to maintain the set of valid targets?

> Create official Docker images for development and testing features 
> ---
>
> Key: HADOOP-14898
> URL: https://issues.apache.org/jira/browse/HADOOP-14898
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-14898.001.tar.gz, HADOOP-14898.002.tar.gz, 
> HADOOP-14898.003.tgz, docker_design.pdf
>
>
> This is the original mail from the mailing list:
> {code}
> TL;DR: I propose to create official hadoop images and upload them to the 
> dockerhub.
> GOAL/SCOPE: I would like to improve the existing documentation with easy-to-use 
> docker based recipes to start hadoop clusters with various configurations.
> The images also could be used to test experimental features. For example 
> ozone could be tested easily with these compose file and configuration:
> https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6
> Or even the configuration could be included in the compose file:
> https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml
> I would like to create separate example compose files for federation, ha, 
> metrics usage, etc. to make it easier to try out and understand the features.
> CONTEXT: There is an existing Jira 
> https://issues.apache.org/jira/browse/HADOOP-13397
> But it’s about a tool to generate production quality docker images (multiple 
> types, in a flexible way). If there are no objections, I will create a 
> separate issue to create simplified docker images for rapid prototyping and 
> investigating new features, and register the branch to the dockerhub to 
> create the images automatically.
> MY BACKGROUND: I have been working with docker based hadoop/spark clusters 
> for quite a while and have run them successfully in different environments 
> (kubernetes, docker-swarm, nomad-based scheduling, etc.). My work is available here: 
> https://github.com/flokkr but they could handle more complex use cases (eg. 
> instrumenting java processes with btrace, or read/reload configuration from 
> consul).
>  And IMHO in the official hadoop documentation it’s better to suggest to use 
> official apache docker images and not external ones (which could be changed).
> {code}
> The following list enumerates the key decision points regarding docker 
> image creation:
> A. automated dockerhub build  / jenkins build
> Docker images could be built on the dockerhub (a branch pattern should be 
> defined for a github repository and the location of the Docker files) or 
> could be built on a CI server and pushed.
> The second one is more flexible (it's easier to create matrix builds, for 
> example).
> The first one has the advantage that we can get an additional flag on the 
> dockerhub that the build is automated (and built from the source by the 
> dockerhub).
> The decision is easy as ASF supports the first approach: (see 
> https://issues.apache.org/jira/browse/INFRA-12781?focusedCommentId=15824096&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15824096)
> B. source: binary distribution or source build
> The second question is about creating the docker image. One option is to 
> build the software on the fly during the creation of the docker image the 
> other one is to use the binary releases.
> I suggest using the second approach, as:
> 1. In that case the hadoop:2.7.3 image could contain exactly the same hadoop 
> distribution as the downloadable one
> 2. We don't need to add development tools to the image, so the image could be 
> smaller (which is important, as the goal for this image is getting 
> started as fast as possible)
> 3. The docker definition will be simpler (and easier to maintain)
> Usually this approach is used in other projects (I checked Apache Zeppelin 
> and Apache Nutch)
> C. branch usage
> Another question is the location of the Docker file. It could be on the 
> official source-code branches (branch-2, trunk, etc.) or we can create 
> separate branches for the dockerhub (eg. docker/2.7 docker/2.8 docker/3.0).
> For the first approach it's easier to find the docker images, but it's less 
> flexible. For example, if we had a Dockerfile on the source code it should 
> be used for every release (for example the Docker file from the tag 
> release-3.0.0 should be used for the 3.0 hadoop docker image). In that case 
> the release process is much harder: in case of a Dockerfile error (which 
> could be tested on dockerhub only after the tagging), a 

[jira] [Commented] (HADOOP-15289) FileStatus.readFields() assertion incorrect

2018-03-05 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386812#comment-16386812
 ] 

Chris Douglas commented on HADOOP-15289:


+1

> FileStatus.readFields() assertion incorrect
> ---
>
> Key: HADOOP-15289
> URL: https://issues.apache.org/jira/browse/HADOOP-15289
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15289-001.patch
>
>
> As covered in HBASE-20123, "Backup test fails against hadoop 3", I think 
> the assert at the end of {{FileStatus.readFields()}} is wrong; if you run the 
> code with assertions enabled against a directory, an IOE will get raised.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14898) Create official Docker images for development and testing features

2018-03-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382897#comment-16382897
 ] 

Chris Douglas commented on HADOOP-14898:


lgtm. Do we need to change our [release 
process|https://wiki.apache.org/hadoop/HowToRelease/] to update these branches? 
Or will that happen implicitly? The patch in HADOOP-15256 looks like the 
version is hard-coded in the Dockerfile, so I'm not sure I'm following how we 
maintain the set of valid release targets from the ASF repo.

I'd expect that downstream, one could specify 2 (implicitly the latest 2.x.y), 
2.8 (implicitly the latest 2.8.y) or 2.8.3 (exact). Is that the case?

> Create official Docker images for development and testing features 
> ---
>
> Key: HADOOP-14898
> URL: https://issues.apache.org/jira/browse/HADOOP-14898
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-14898.001.tar.gz, HADOOP-14898.002.tar.gz, 
> HADOOP-14898.003.tgz, docker_design.pdf
>
>
> This is the original mail from the mailing list:
> {code}
> TL;DR: I propose to create official hadoop images and upload them to the 
> dockerhub.
> GOAL/SCOPE: I would like to improve the existing documentation with easy-to-use 
> docker based recipes to start hadoop clusters with various configurations.
> The images also could be used to test experimental features. For example 
> ozone could be tested easily with these compose file and configuration:
> https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6
> Or even the configuration could be included in the compose file:
> https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml
> I would like to create separate example compose files for federation, ha, 
> metrics usage, etc. to make it easier to try out and understand the features.
> CONTEXT: There is an existing Jira 
> https://issues.apache.org/jira/browse/HADOOP-13397
> But it’s about a tool to generate production quality docker images (multiple 
> types, in a flexible way). If there are no objections, I will create a 
> separate issue to create simplified docker images for rapid prototyping and 
> investigating new features, and register the branch to the dockerhub to 
> create the images automatically.
> MY BACKGROUND: I have been working with docker based hadoop/spark clusters 
> for quite a while and have run them successfully in different environments 
> (kubernetes, docker-swarm, nomad-based scheduling, etc.). My work is available here: 
> https://github.com/flokkr but they could handle more complex use cases (eg. 
> instrumenting java processes with btrace, or read/reload configuration from 
> consul).
>  And IMHO in the official hadoop documentation it’s better to suggest to use 
> official apache docker images and not external ones (which could be changed).
> {code}
> The following list enumerates the key decision points regarding docker 
> image creation:
> A. automated dockerhub build  / jenkins build
> Docker images could be built on the dockerhub (a branch pattern should be 
> defined for a github repository and the location of the Docker files) or 
> could be built on a CI server and pushed.
> The second one is more flexible (it's easier to create matrix builds, for 
> example).
> The first one has the advantage that we can get an additional flag on the 
> dockerhub that the build is automated (and built from the source by the 
> dockerhub).
> The decision is easy as ASF supports the first approach: (see 
> https://issues.apache.org/jira/browse/INFRA-12781?focusedCommentId=15824096&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15824096)
> B. source: binary distribution or source build
> The second question is about creating the docker image. One option is to 
> build the software on the fly during the creation of the docker image the 
> other one is to use the binary releases.
> I suggest using the second approach, as:
> 1. In that case the hadoop:2.7.3 image could contain exactly the same hadoop 
> distribution as the downloadable one
> 2. We don't need to add development tools to the image, so the image could be 
> smaller (which is important, as the goal for this image is getting 
> started as fast as possible)
> 3. The docker definition will be simpler (and easier to maintain)
> Usually this approach is used in other projects (I checked Apache Zeppelin 
> and Apache Nutch)
> C. branch usage
> Another question is the location of the Docker file. It could be on the 
> official source-code branches (branch-2, trunk, etc.) or we can create 
> separate branches for the dockerhub (eg. docker/2.7 docker/2.8 docker/3.0).
> For the first approach it's easier to find the docker images, but it's less 
> 

[jira] [Commented] (HADOOP-15279) increase maven heap size recommendations

2018-03-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382804#comment-16382804
 ] 

Chris Douglas commented on HADOOP-15279:


+1

> increase maven heap size recommendations
> 
>
> Key: HADOOP-15279
> URL: https://issues.apache.org/jira/browse/HADOOP-15279
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, test
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Minor
> Attachments: HADOOP-15279.00.patch
>
>
> 1G is just a bit too low for JDK8+surefire 2.20+hdfs unit tests running in 
> parallel.  Bump it up a bit more.
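
The change is along these lines (the exact heap value here is illustrative, not necessarily the one the patch settles on):
{noformat}
export MAVEN_OPTS="-Xmx2048m"
{noformat}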



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13972) ADLS to support per-store configuration

2018-02-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-13972:
---
Fix Version/s: 2.8.4
   2.9.1

> ADLS to support per-store configuration
> ---
>
> Key: HADOOP-13972
> URL: https://issues.apache.org/jira/browse/HADOOP-13972
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Assignee: Sharad Sonker
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
>
> Useful when distcp needs to access 2 Data Lake stores with different SPIs.
> Of course, a workaround is to grant the same SPI access permission to both 
> stores, but sometimes it might not be feasible.
> One idea is to embed the store name in the configuration property names, 
> e.g., {{dfs.adls.oauth2..client.id}}. Per-store keys will be consulted 
> first, then fall back to the global keys.
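
A minimal sketch of that lookup order, assuming illustrative key names (the real ADLS keys may differ); uses org.apache.hadoop.conf.Configuration:
{code:java}
// Consult the per-store key first, then fall back to the global key.
static String getAdlProperty(Configuration conf, String store, String suffix) {
  String perStore = "dfs.adls.oauth2." + store + "." + suffix;  // assumed shape
  String global = "dfs.adls.oauth2." + suffix;
  String value = conf.get(perStore);
  return value != null ? value : conf.get(global);
}
{code}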



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13972) ADLS to support per-store configuration

2018-02-28 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381249#comment-16381249
 ] 

Chris Douglas commented on HADOOP-13972:


Backported to branch-2.8 and branch-2.9

> ADLS to support per-store configuration
> ---
>
> Key: HADOOP-13972
> URL: https://issues.apache.org/jira/browse/HADOOP-13972
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Assignee: Sharad Sonker
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
>
> Useful when distcp needs to access 2 Data Lake stores with different SPIs.
> Of course, a workaround is to grant the same SPI access permission to both 
> stores, but sometimes it might not be feasible.
> One idea is to embed the store name in the configuration property names, 
> e.g., {{dfs.adls.oauth2..client.id}}. Per-store keys will be consulted 
> first, then fall back to the global keys.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15256) Create docker images for latest stable hadoop3 build

2018-02-28 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381145#comment-16381145
 ] 

Chris Douglas edited comment on HADOOP-15256 at 2/28/18 10:40 PM:
--

* Would it be possible to fetch RAT artifacts from the ASF-hosted mirror 
script, rather than hard-coding the mirror?
{noformat}
+ARG 
HADOOP_URL=https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
...
+   wget 
http://xenia.sote.hu/ftp/mirrors/www.apache.org/creadur/apache-rat-0.12/apache-rat-0.12-bin.tar.gz
 -O $DIR/build/apache-rat.tar.gz
{noformat}
* Typo: {{+Please use the included docker-compoase.yaml to test it:}}
* Use the default port for the namenode (8020) instead of 9870?


was (Author: chris.douglas):
Would it be possible to fetch RAT artifacts from the ASF-hosted mirror script, 
rather than hard-coding the mirror?
{noformat}
+ARG 
HADOOP_URL=https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
...
+   wget 
http://xenia.sote.hu/ftp/mirrors/www.apache.org/creadur/apache-rat-0.12/apache-rat-0.12-bin.tar.gz
 -O $DIR/build/apache-rat.tar.gz
{noformat}

> Create docker images for latest stable hadoop3 build
> 
>
> Key: HADOOP-15256
> URL: https://issues.apache.org/jira/browse/HADOOP-15256
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15256-docker-hadoop-3.001.patch
>
>
> Similar to the hadoop2 image we can provide a developer hadoop image which 
> contains the latest hadoop from the binary release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15256) Create docker images for latest stable hadoop3 build

2018-02-28 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381145#comment-16381145
 ] 

Chris Douglas commented on HADOOP-15256:


Would it be possible to fetch RAT artifacts from the ASF-hosted mirror script, 
rather than hard-coding the mirror?
{noformat}
+ARG 
HADOOP_URL=https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
...
+   wget 
http://xenia.sote.hu/ftp/mirrors/www.apache.org/creadur/apache-rat-0.12/apache-rat-0.12-bin.tar.gz
 -O $DIR/build/apache-rat.tar.gz
{noformat}

> Create docker images for latest stable hadoop3 build
> 
>
> Key: HADOOP-15256
> URL: https://issues.apache.org/jira/browse/HADOOP-15256
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-15256-docker-hadoop-3.001.patch
>
>
> Similar to the hadoop2 image we can provide a developer hadoop image which 
> contains the latest hadoop from the binary release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14898) Create official Docker images for development and testing features

2018-02-28 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381133#comment-16381133
 ] 

Chris Douglas commented on HADOOP-14898:


bq. file a ticket with INFRA so that they can push this image as an Apache 
Image to DockerHub. This will allow INFRA to push these dockerHub Images if and 
when we make the changes. That is, we will have official Apache base images 
which can be trusted by end-users and updated with various releases if needed.

Sorry, still catching up on this. We should only release official images of 
releases, since that's the only artifact licensed for downstream use by a PMC 
vote. The infra process linked through the document (INFRA-12781) wasn't clear 
to me. We create branches- this proposes a base image, {{hadoop2}}, and 
{{hadoop3}}- and that will be pushed to the ASF repo on Docker Hub? Presumably 
these are just the head of versioned images, each corresponding to a release?

> Create official Docker images for development and testing features 
> ---
>
> Key: HADOOP-14898
> URL: https://issues.apache.org/jira/browse/HADOOP-14898
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HADOOP-14898.001.tar.gz, HADOOP-14898.002.tar.gz, 
> HADOOP-14898.003.tgz, docker_design.pdf
>
>
> This is the original mail from the mailing list:
> {code}
> TL;DR: I propose to create official hadoop images and upload them to the 
> dockerhub.
> GOAL/SCOPE: I would like to improve the existing documentation with easy-to-use 
> docker based recipes to start hadoop clusters with various configurations.
> The images also could be used to test experimental features. For example 
> ozone could be tested easily with these compose file and configuration:
> https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6
> Or even the configuration could be included in the compose file:
> https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml
> I would like to create separate example compose files for federation, ha, 
> metrics usage, etc. to make it easier to try out and understand the features.
> CONTEXT: There is an existing Jira 
> https://issues.apache.org/jira/browse/HADOOP-13397
> But it’s about a tool to generate production quality docker images (multiple 
> types, in a flexible way). If there are no objections, I will create a 
> separate issue to create simplified docker images for rapid prototyping and 
> investigating new features, and register the branch to the dockerhub to 
> create the images automatically.
> MY BACKGROUND: I have been working with docker based hadoop/spark clusters 
> for quite a while and have run them successfully in different environments 
> (kubernetes, docker-swarm, nomad-based scheduling, etc.). My work is available here: 
> https://github.com/flokkr but they could handle more complex use cases (eg. 
> instrumenting java processes with btrace, or read/reload configuration from 
> consul).
>  And IMHO in the official hadoop documentation it’s better to suggest to use 
> official apache docker images and not external ones (which could be changed).
> {code}
> The following list enumerates the key decision points regarding docker 
> image creation:
> A. automated dockerhub build  / jenkins build
> Docker images could be built on the dockerhub (a branch pattern should be 
> defined for a github repository and the location of the Docker files) or 
> could be built on a CI server and pushed.
> The second one is more flexible (it's easier to create matrix builds, for 
> example).
> The first one has the advantage that we can get an additional flag on the 
> dockerhub that the build is automated (and built from the source by the 
> dockerhub).
> The decision is easy as ASF supports the first approach: (see 
> https://issues.apache.org/jira/browse/INFRA-12781?focusedCommentId=15824096&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15824096)
> B. source: binary distribution or source build
> The second question is about creating the docker image. One option is to 
> build the software on the fly during the creation of the docker image the 
> other one is to use the binary releases.
> I suggest using the second approach, as:
> 1. In that case the hadoop:2.7.3 image could contain exactly the same hadoop 
> distribution as the downloadable one
> 2. We don't need to add development tools to the image, so the image could be 
> smaller (which is important, as the goal for this image is getting 
> started as fast as possible)
> 3. The docker definition will be simpler (and easier to maintain)
> Usually this approach is used in other projects (I checked Apache Zeppelin 
> and Apache Nutch)
> C. branch usage
> Another question 

[jira] [Updated] (HADOOP-15251) Backport HADOOP-13514 (surefire upgrade) to branch-2

2018-02-26 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15251:
---
   Resolution: Fixed
 Assignee: Chris Douglas
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.1
   Status: Resolved  (was: Patch Available)

Thanks, Akira. I committed this.

> Backport HADOOP-13514 (surefire upgrade) to branch-2
> 
>
> Key: HADOOP-15251
> URL: https://issues.apache.org/jira/browse/HADOOP-15251
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>Priority: Major
> Fix For: 2.9.1
>
> Attachments: HADOOP-15251-branch-2.001.patch, 
> HADOOP-15251-branch-2.002.patch
>
>
> Tests in branch-2 are not running reliably in Jenkins, and due to 
> SUREFIRE-524, these are not being cleaned up properly (see HADOOP-15153).
> Upgrading to a more recent version of the surefire plugin will help make the 
> problem easier to address in branch-2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15251) Backport HADOOP-13514 (surefire upgrade) to branch-2

2018-02-26 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15251:
---
Summary: Backport HADOOP-13514 (surefire upgrade) to branch-2  (was: 
Upgrade surefire version in branch-2)

> Backport HADOOP-13514 (surefire upgrade) to branch-2
> 
>
> Key: HADOOP-15251
> URL: https://issues.apache.org/jira/browse/HADOOP-15251
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Douglas
>Priority: Major
> Attachments: HADOOP-15251-branch-2.001.patch, 
> HADOOP-15251-branch-2.002.patch
>
>
> Tests in branch-2 are not running reliably in Jenkins, and due to 
> SUREFIRE-524, these are not being cleaned up properly (see HADOOP-15153).
> Upgrading to a more recent version of the surefire plugin will help make the 
> problem easier to address in branch-2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13374) Add the L&N verification script

2018-02-23 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-13374:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

+1 I committed this.

> Add the L&N verification script
> ---
>
> Key: HADOOP-13374
> URL: https://issues.apache.org/jira/browse/HADOOP-13374
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Xiao Chen
>Assignee: Allen Wittenauer
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-13374.01.patch, HADOOP-13374.02.patch, 
> HADOOP-13374.03.patch, HADOOP-13374.04.patch
>
>
> This is the script that's used for L&N change verification during 
> HADOOP-12893. We should commit this as [~ozawa] 
> [suggested|https://issues.apache.org/jira/browse/HADOOP-13298?focusedCommentId=15374498&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15374498].
> I was 
> [initially|https://issues.apache.org/jira/browse/HADOOP-12893?focusedCommentId=15283040&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15283040]
>  verifying with an on-the-fly shell command, and [~andrew.wang] contributed the 
> script later in [a comment|
> https://issues.apache.org/jira/browse/HADOOP-12893?focusedCommentId=15303281&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15303281],
>  so most credit should go to him. :)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15251) Upgrade surefire version in branch-2

2018-02-23 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15251:
---
Attachment: HADOOP-15251-branch-2.002.patch

> Upgrade surefire version in branch-2
> 
>
> Key: HADOOP-15251
> URL: https://issues.apache.org/jira/browse/HADOOP-15251
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Douglas
>Priority: Major
> Attachments: HADOOP-15251-branch-2.001.patch, 
> HADOOP-15251-branch-2.002.patch
>
>
> Tests in branch-2 are not running reliably in Jenkins, and due to 
> SUREFIRE-524, these are not being cleaned up properly (see HADOOP-15153).
> Upgrading to a more recent version of the surefire plugin will help make the 
> problem easier to address in branch-2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15251) Upgrade surefire version in branch-2

2018-02-22 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373717#comment-16373717
 ] 

Chris Douglas commented on HADOOP-15251:


The reverts were in trunk, also. After it was committed to trunk, stability 
[improved|https://issues.apache.org/jira/browse/HADOOP-13514?focusedCommentId=16259532&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16259532].

> Upgrade surefire version in branch-2
> 
>
> Key: HADOOP-15251
> URL: https://issues.apache.org/jira/browse/HADOOP-15251
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Douglas
>Priority: Major
> Attachments: HADOOP-15251-branch-2.001.patch
>
>
> Tests in branch-2 are not running reliably in Jenkins, and due to 
> SUREFIRE-524, these are not being cleaned up properly (see HADOOP-15153).
> Upgrading to a more recent version of the surefire plugin will help make the 
> problem easier to address in branch-2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15251) Upgrade surefire version in branch-2

2018-02-22 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373341#comment-16373341
 ] 

Chris Douglas commented on HADOOP-15251:


bq. So I should try to apply HADOOP-13514 to branch-2 and also include this 
version upgrade?
Yup, sounds good. I looked for surefire JIRAs, but somehow missed HADOOP-13514.

bq. I'd also like to see the results of running the maven integration tests 
against hadoop-aws and hadoop-aws as they use failsafe and depend on property 
passdown
Are these backported to branch-2? Are you set up to run these?

> Upgrade surefire version in branch-2
> 
>
> Key: HADOOP-15251
> URL: https://issues.apache.org/jira/browse/HADOOP-15251
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Reporter: Chris Douglas
>Priority: Major
> Attachments: HADOOP-15251-branch-2.001.patch
>
>
> Tests in branch-2 are not running reliably in Jenkins, and due to 
> SUREFIRE-524, these are not being cleaned up properly (see HADOOP-15153).
> Upgrading to a more recent version of the surefire plugin will help make the 
> problem easier to address in branch-2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15251) Upgrade surefire version in branch-2

2018-02-21 Thread Chris Douglas (JIRA)
Chris Douglas created HADOOP-15251:
--

 Summary: Upgrade surefire version in branch-2
 Key: HADOOP-15251
 URL: https://issues.apache.org/jira/browse/HADOOP-15251
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Reporter: Chris Douglas


Tests in branch-2 are not running reliably in Jenkins, and due to SUREFIRE-524, 
these are not being cleaned up properly (see HADOOP-15153).

Upgrading to a more recent version of the surefire plugin will help make the 
problem easier to address in branch-2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15247) Move commons-net up to 3.6

2018-02-20 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370430#comment-16370430
 ] 

Chris Douglas commented on HADOOP-15247:


+1 Sure, let's do it. Do we know which downstream projects also take this 
dependency, and at what version?

> Move commons-net up to 3.6
> --
>
> Key: HADOOP-15247
> URL: https://issues.apache.org/jira/browse/HADOOP-15247
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-15247-001.patch
>
>
> Bump up commons-net to latest version, which appears to be 3.6
> Uses: netutils, ftpfs, dns utils in registry



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes

2018-02-14 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364772#comment-16364772
 ] 

Chris Douglas commented on HADOOP-15204:


+1 lgtm

> Add Configuration API for parsing storage sizes
> ---
>
> Key: HADOOP-15204
> URL: https://issues.apache.org/jira/browse/HADOOP-15204
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15204.001.patch, HADOOP-15204.002.patch, 
> HADOOP-15204.003.patch
>
>
> Hadoop has a lot of configurations that specify memory and disk size. This 
> JIRA proposes to add an API like {{Configuration.getStorageSize}} which will 
> allow users to specify units like KB, MB, GB etc. This JIRA is inspired by 
> HADOOP-8608 and Ozone. Adding {{getTimeDuration}} support was a great 
> improvement for the Ozone code base; this JIRA hopes to do the same thing for 
> configs that deal with disk and memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-14077) Improve the patch of HADOOP-13119

2018-02-13 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved HADOOP-14077.

Resolution: Fixed

This has already been part of a release. Please leave it resolved.

> Improve the patch of HADOOP-13119
> -
>
> Key: HADOOP-14077
> URL: https://issues.apache.org/jira/browse/HADOOP-14077
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Major
> Fix For: 3.0.0-alpha4
>
> Attachments: HADOOP-14077.001.patch, HADOOP-14077.002.patch, 
> HADOOP-14077.003.patch
>
>
> For some links (such as "/jmx", "/stack"), blocking the links in the filter 
> chain due to an impersonation issue is not friendly for users. For example, 
> user "sam" is not allowed to be impersonated by user "knox", and the link 
> "/jmx" doesn't need any user to do authorization by default. It only needs 
> user "knox" to do authentication; in this case, it's not right to block the 
> access in the SPNEGO filter. We intend to check the impersonation permission 
> when the request's "getRemoteUser" method is used, so that such links ("/jmx", 
> "/stack") would not be blocked by mistake.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15195) With SELinux enabled, directories mounted with start-build-env.sh may not be accessible.

2018-02-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361831#comment-16361831
 ] 

Chris Douglas commented on HADOOP-15195:


Sorry for the revert noise; pushed v003 the first time.

> With SELinux enabled, directories mounted with start-build-env.sh may not be 
> accessible.
> 
>
> Key: HADOOP-15195
> URL: https://issues.apache.org/jira/browse/HADOOP-15195
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
> Environment: Systems with SELinux enabled, e.g., Red Hat Linux 7, 
> CentOS 7, Fedora.
>Reporter: Grigori Rybkine
>Assignee: Grigori Rybkine
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HADOOP-15195.001.patch, HADOOP-15195.001.patch, 
> HADOOP-15195.002.patch, HADOOP-15195.003.patch, HADOOP-15195.004.patch, 
> HADOOP-15195.005.patch
>
>
> On a system with SELinux enabled, e.g., Red Hat Linux 7, CentOS 7, Fedora, 
> the host directories - with the Hadoop code and the maven .m2 - mounted with 
> start-build-env.sh may not be accessible to the container. This precludes 
> Hadoop development on such systems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15195) With SELinux enabled, directories mounted with start-build-env.sh may not be accessible.

2018-02-12 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15195:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

+1 Great, thanks for the tests. Committed to trunk and branch-3.1. Thanks 
[~rybkine]

> With SELinux enabled, directories mounted with start-build-env.sh may not be 
> accessible.
> 
>
> Key: HADOOP-15195
> URL: https://issues.apache.org/jira/browse/HADOOP-15195
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
> Environment: Systems with SELinux enabled, e.g., Red Hat Linux 7, 
> CentOS 7, Fedora.
>Reporter: Grigori Rybkine
>Assignee: Grigori Rybkine
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: HADOOP-15195.001.patch, HADOOP-15195.001.patch, 
> HADOOP-15195.002.patch, HADOOP-15195.003.patch, HADOOP-15195.004.patch, 
> HADOOP-15195.005.patch
>
>
> On a system with SELinux enabled, e.g., Red Hat Linux 7, CentOS 7, Fedora, 
> the host directories - with the Hadoop code and the maven .m2 - mounted with 
> start-build-env.sh may not be accessible to the container. This precludes 
> Hadoop development on such systems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15195) With SELinux enabled, directories mounted with start-build-env.sh may not be accessible.

2018-02-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354896#comment-16354896
 ] 

Chris Douglas commented on HADOOP-15195:


I don't have access to an SELinux system and haven't even tried to use it since 
2004. [~aw]?

Superficially:
* The comment could explain what this is working around, rather than pointing 
to the blog post
* Similarly, explaining what :z does might be useful
* Some {{bats}} unit tests exist for the shell code. Are there any useful tests 
one could write for this?
* (trivial) the {{awk}} action {{\{split($3,ver,".");print ver[1]"."ver[2]\}}} 
could be applied only to lines matching {{/^Docker version/}}

> With SELinux enabled, directories mounted with start-build-env.sh may not be 
> accessible.
> 
>
> Key: HADOOP-15195
> URL: https://issues.apache.org/jira/browse/HADOOP-15195
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
> Environment: Systems with SELinux enabled, e.g., Red Hat Linux 7, 
> CentOS 7, Fedora.
>Reporter: Grigori Rybkine
>Assignee: Grigori Rybkine
>Priority: Major
> Attachments: HADOOP-15195.001.patch, HADOOP-15195.001.patch, 
> HADOOP-15195.002.patch
>
>
> On a system with SELinux enabled, e.g., Red Hat Linux 7, CentOS 7, Fedora, 
> the host directories - with the Hadoop code and the maven .m2 - mounted with 
> start-build-env.sh may not be accessible to the container. This precludes 
> Hadoop development on such systems.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes

2018-02-03 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351490#comment-16351490
 ] 

Chris Douglas commented on HADOOP-15204:


Sorry, missed this
bq. Precondition.checkArgument for validation
Please, no. This is the silliest dependency we have on Guava.

> Add Configuration API for parsing storage sizes
> ---
>
> Key: HADOOP-15204
> URL: https://issues.apache.org/jira/browse/HADOOP-15204
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15204.001.patch, HADOOP-15204.002.patch
>
>
> Hadoop has a lot of configurations that specify memory and disk size. This 
> JIRA proposes to add an API like {{Configuration.getStorageSize}} which will 
> allow users to specify units like KB, MB, GB etc. This JIRA is inspired by 
> HADOOP-8608 and Ozone. Adding {{getTimeDuration}} support was a great 
> improvement for the Ozone code base; this JIRA hopes to do the same thing for 
> configs that deal with disk and memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes

2018-02-02 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350889#comment-16350889
 ] 

Chris Douglas commented on HADOOP-15204:


bq. I find getTimeDuration API extremely intuitive, hence imitating that for 
this API
Soright; I only mention it as relevant context.

bq. Rounding causes a significant loss when we convert from x bytes to y 
exabytes. Hence I voted for the least element of surprise and decided to return 
double
Is it sufficient for the caller to provide the precision? If the caller wants 
petabytes with some decimal places, then they can request terabytes. If they 
want to ensure the conversion is within some epsilon, then they can request the 
value with high precision and measure the loss. Even rounding decisions can be 
left to the caller. Instead of passing this context into {{Configuration}} (or 
setting defaults), its role can be limited to converting and scaling the 
stringly-typed value.

Similarly:
bq. That would let me return something like -1 to mean "no value set", which I 
can't do with the current API.
{{Configuration}} supports that with a raw {{get(key)}}. It's only where we 
have the default in hand that it provides typed getters.

bq. This is the curse of writing a unit as a library; we need to be cognizant 
of that single use case which will break us. Hence I have used bigDecimal to be 
safe and correct and return doubles. It yields values that people expect.
Sure, I only mention it because it differs from {{getTimeDuration}}. With 
{{TimeUnit}}, a caller could, with low false positives, check if the result was 
equal to max to detect overflow. Doing that here would have a higher false 
positive rate, so the {{BigDecimal}} approach with explicit precision is 
superior.

Minor:
* In this overload:
{noformat}
+  public double getStorageSize(String name, String defaultValue,
+  StorageUnit targetUnit) {
{noformat}
Does this come up often? {{getTimeDuration}} assumes that the default will 
be in the same unit as the conversion. So in your example, one would write 
{{getStorageSize("key", 5000, MEGABYTES)}}. It's less elegant, but it type 
checks.
* I'd lean toward {{MB}} instead of {{MEGABYTES}}, and similar. Even as a 
static import, those are unlikely to collide and they're equally readable.
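
To make the typed-default shape concrete, here is a small, runnable sketch of 
the pattern being discussed. The {{StorageUnit}} enum and suffix parsing below 
are simplified stand-ins for illustration only, not the committed Hadoop API 
(the unit names follow the {{MB}} suggestion above):
{code:java}
import java.util.Locale;

/** Sketch of a getTimeDuration-style getStorageSize: the numeric default
 *  is already in the target unit, so the call site type checks. */
public class StorageSizeSketch {
  enum StorageUnit {
    BYTES(1L), KB(1L << 10), MB(1L << 20), GB(1L << 30), TB(1L << 40);
    final long bytes;
    StorageUnit(long b) { bytes = b; }
    double fromBytes(double b) { return b / bytes; }
  }

  static double getStorageSize(String raw, double defaultValue,
      StorageUnit targetUnit) {
    if (raw == null || raw.trim().isEmpty()) {
      return defaultValue;                 // default already in targetUnit
    }
    String v = raw.trim().toLowerCase(Locale.ROOT);
    char suffix = v.charAt(v.length() - 1);
    StorageUnit unit = StorageUnit.BYTES;  // bare numbers are byte counts
    switch (suffix) {
      case 'k': unit = StorageUnit.KB; break;
      case 'm': unit = StorageUnit.MB; break;
      case 'g': unit = StorageUnit.GB; break;
      case 't': unit = StorageUnit.TB; break;
      default:  break;
    }
    String num = Character.isDigit(suffix) ? v : v.substring(0, v.length() - 1);
    // Convert and scale the stringly-typed value; rounding is the caller's.
    return targetUnit.fromBytes(Double.parseDouble(num) * unit.bytes);
  }

  public static void main(String[] args) {
    System.out.println(getStorageSize("5g", 5000, StorageUnit.MB)); // 5120.0
    System.out.println(getStorageSize(null, 5000, StorageUnit.MB)); // 5000.0
  }
}
{code}
As with {{getTimeDuration}}, keeping the default in the target unit avoids a 
second parse, at the cost of the less elegant call site noted above.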

> Add Configuration API for parsing storage sizes
> ---
>
> Key: HADOOP-15204
> URL: https://issues.apache.org/jira/browse/HADOOP-15204
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15204.001.patch, HADOOP-15204.002.patch
>
>
> Hadoop has a lot of configurations that specify memory and disk size. This 
> JIRA proposes to add an API like {{Configuration.getStorageSize}} which will 
> allow users to specify units like KB, MB, GB etc. This JIRA is inspired by 
> HADOOP-8608 and Ozone. Adding {{getTimeDuration}} support was a great 
> improvement for the Ozone code base; this JIRA hopes to do the same thing for 
> configs that deal with disk and memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15151) MapFile.fix creates a wrong index file in case of block-compressed data file.

2018-02-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349533#comment-16349533
 ] 

Chris Douglas commented on HADOOP-15151:


Thanks, [~rybkine]. While {{MapFile}} has some odd quirks, it's maintained by 
the project mostly for compatibility with older applications. Improving the 
code quality may not be a good use of your time. If you're interested in file 
formats generally, you may want to check out the ORC and Parquet projects.

Of course, if that impression is mistaken and {{MapFile}} is important to you, 
please feel free to open JIRAs to improve it (provided it continues to 
read/write files compatible with the existing format).

> MapFile.fix creates a wrong index file in case of block-compressed data file.
> -
>
> Key: HADOOP-15151
> URL: https://issues.apache.org/jira/browse/HADOOP-15151
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Grigori Rybkine
>Assignee: Grigori Rybkine
>Priority: Major
>  Labels: patch
> Fix For: 2.9.1
>
> Attachments: HADOOP-15151.001.patch, HADOOP-15151.002.patch, 
> HADOOP-15151.003.patch, HADOOP-15151.004.patch, HADOOP-15151.004.patch, 
> HADOOP-15151.005.patch
>
>
> The index file created with MapFile.fix for an ordered block-compressed data 
> file does not allow finding values for keys existing in the data file via the 
> MapFile.get method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes

2018-02-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349492#comment-16349492
 ] 

Chris Douglas commented on HADOOP-15204:


bq. As the author of HADOOP-8608, I would appreciate any perspectives you have 
on this JIRA.
I find the API intuitive, but that is not universal (e.g., HDFS-9847). 
Explaining it has taken more cycles than I expected, and perhaps more than a 
good API should.
* {{TERRABYTES}} is misspelled.
* Is {{long}} insufficient as a return type for {{getStorageSize}}? I 
appreciate future-proofing, but for {{Configuration}} values, that's what, ~8 
exabytes? I haven't looked carefully at the semantics of {{BigDecimal}}, but 
the comments imply that {{setScale}} is used to guarantee the result will fit. 
An overload of {{setStorageSize}} taking {{double}} might make sense, as would 
using doubles in the overload taking a {{String}} default.
* Why 
[ROUND_UP|https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#ROUND_UP]
 out of all the options? Just curious.
* {{TimeUnit}} uses min/max values for Long (e.g., 
[TimeUnit::toNanos|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/TimeUnit.html#toNanos-long-])
 for overflow/underflow. Storage units are more likely to be exact powers of 
two, so that may not be appropriate.
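
To illustrate the return-type and rounding questions above, a self-contained 
sketch; the values are chosen for the demo and are not taken from the patch:
{code:java}
import java.math.BigDecimal;
import java.math.RoundingMode;

public class PrecisionSketch {
  public static void main(String[] args) {
    long bytes = (1L << 60) + 1;              // just over one exbibyte
    double d = bytes;                         // 53-bit mantissa drops the +1
    System.out.println(bytes - (long) d);     // prints 1: precision lost

    // BigDecimal with an explicit scale makes the rounding policy visible.
    BigDecimal exact = new BigDecimal(bytes)
        .divide(new BigDecimal(1L << 60), 20, RoundingMode.HALF_UP);
    // ROUND_UP always rounds away from zero, so any residue bumps the
    // last kept digit; HALF_UP does not.
    System.out.println(exact.setScale(2, RoundingMode.UP));       // 1.01
    System.out.println(exact.setScale(2, RoundingMode.HALF_UP));  // 1.00
  }
}
{code}
The first print shows how a floating-point return can silently lose bytes; the 
last two show how much the choice of rounding mode matters once a scale is set.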

> Add Configuration API for parsing storage sizes
> ---
>
> Key: HADOOP-15204
> URL: https://issues.apache.org/jira/browse/HADOOP-15204
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.1.0
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15204.001.patch
>
>
> Hadoop has a lot of configurations that specify memory and disk size. This 
> JIRA proposes to add an API like {{Configuration.getStorageSize}} which will 
> allow users to specify units like KB, MB, GB etc. This JIRA is inspired by 
> HADOOP-8608 and Ozone. Adding {{getTimeDuration}} support was a great 
> improvement for the Ozone code base; this JIRA hopes to do the same thing for 
> configs that deal with disk and memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15205) maven release: missing source attachments for hadoop-mapreduce-client-core

2018-02-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349377#comment-16349377
 ] 

Chris Douglas edited comment on HADOOP-15205 at 2/1/18 10:19 PM:
-

[~shv] could you take a look? At least for the 2.7.x releases.


was (Author: chris.douglas):
[~shv] could you take a look?

> maven release: missing source attachments for hadoop-mapreduce-client-core
> --
>
> Key: HADOOP-15205
> URL: https://issues.apache.org/jira/browse/HADOOP-15205
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.5, 3.0.0
>Reporter: Zoltan Haindrich
>Priority: Major
>
> I wanted to use the source attachment; however, it looks like since 2.7.5 that 
> artifact is not present at Maven Central; the last release which had source 
> attachments / javadocs was 2.7.4
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.4/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.5/
> this seems not to be limited to mapreduce, as the same change is present for 
> yarn-common as well
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-common/2.7.4/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-common/2.7.5/
> and also hadoop-common
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.4/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.5/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/3.0.0/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15205) maven release: missing source attachments for hadoop-mapreduce-client-core

2018-02-01 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349377#comment-16349377
 ] 

Chris Douglas commented on HADOOP-15205:


[~shv] could you take a look?

> maven release: missing source attachments for hadoop-mapreduce-client-core
> --
>
> Key: HADOOP-15205
> URL: https://issues.apache.org/jira/browse/HADOOP-15205
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.5, 3.0.0
>Reporter: Zoltan Haindrich
>Priority: Major
>
> I wanted to use the source attachment; however, it looks like since 2.7.5 that 
> artifact is not present at Maven Central; the last release which had source 
> attachments / javadocs was 2.7.4
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.4/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/2.7.5/
> this seems not to be limited to mapreduce, as the same change is present for 
> yarn-common as well
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-common/2.7.4/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-yarn-common/2.7.5/
> and also hadoop-common
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.4/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/2.7.5/
> http://central.maven.org/maven2/org/apache/hadoop/hadoop-common/3.0.0/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15190) Use Jacoco to generate Unit Test coverage reports

2018-01-30 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345959#comment-16345959
 ] 

Chris Douglas commented on HADOOP-15190:


Can we move the module under dev-support? If downstream builds want to measure 
(and presumably improve) code coverage by these metrics, I don't mind adding 
hooks for JaCoCo, OpenClover, and other tools. If this falls out of sync and it 
doesn't get fixed, then we can just delete it later. Not a big deal.

> Use Jacoco to generate Unit Test coverage reports
> -
>
> Key: HADOOP-15190
> URL: https://issues.apache.org/jira/browse/HADOOP-15190
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Duo Xu
>Assignee: Duo Xu
>Priority: Minor
> Attachments: HADOOP-15190.01.patch, jacoco_report_2018_01_25.JPG
>
>
> Currently Hadoop uses maven-clover2-plugin for code coverage, which is 
> outdated. Atlassian open-sourced Clover last year, so a license can no longer 
> be purchased, although we can switch to the license-free version called 
> "OpenClover".
> This JIRA is to replace Clover with JaCoCo, which is actively maintained by 
> the community.
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15190) Use Jacoco to generate Unit Test coverage reports

2018-01-26 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341476#comment-16341476
 ] 

Chris Douglas commented on HADOOP-15190:


Whatever's easier for Yetus to work with, then. JaCoCo writes an XML report. It 
looks like it produces some summary statistics that could be used to evaluate 
how the patch changed test coverage of the files it affected (likely a superset 
of those it changed).

Would this suffice? Rather, would the ease of diffing reports help us make a 
choice between them?
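
As a sketch of what diffing reports could look like, assuming JaCoCo's standard 
{{report.xml}} layout (report-level summary {{counter}} elements are direct 
children of the root {{report}} element); the class name is illustrative:
{code:java}
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Print the report-level coverage summary from a JaCoCo report.xml. */
public class CoverageSummary {
  public static void main(String[] args) throws Exception {
    DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
    // Don't fetch the report.dtd referenced by the file's DOCTYPE.
    f.setFeature(
        "http://apache.org/xml/features/nonvalidating/load-external-dtd",
        false);
    NodeList children = f.newDocumentBuilder().parse(new File(args[0]))
        .getDocumentElement().getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      if (!(children.item(i) instanceof Element)) continue;
      Element e = (Element) children.item(i);
      if (!"counter".equals(e.getTagName())) continue;
      long missed = Long.parseLong(e.getAttribute("missed"));
      long covered = Long.parseLong(e.getAttribute("covered"));
      System.out.printf("%s: %.1f%% (%d/%d)%n", e.getAttribute("type"),
          100.0 * covered / (missed + covered), covered, missed + covered);
    }
  }
}
{code}
Running this against the reports from two builds and comparing the per-type 
percentages would be a crude but workable diff.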

> Use Jacoco to generate Unit Test coverage reports
> -
>
> Key: HADOOP-15190
> URL: https://issues.apache.org/jira/browse/HADOOP-15190
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Duo Xu
>Assignee: Duo Xu
>Priority: Minor
> Attachments: HADOOP-15190.01.patch, jacoco_report_2018_01_25.JPG
>
>
> Currently Hadoop uses maven-clover2-plugin for code coverage, which is 
> outdated. Atlassian open-sourced Clover last year, so a license can no longer 
> be purchased, although we can switch to the license-free version called 
> "OpenClover".
> This JIRA is to replace Clover with JaCoCo, which is actively maintained by 
> the community.
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15190) Use Jacoco to generate Unit Test coverage reports

2018-01-26 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341364#comment-16341364
 ] 

Chris Douglas commented on HADOOP-15190:


Thanks for the patch, Duo. Are there advantages to writing this as a new 
submodule, as in [^HADOOP-15190.01.patch]? HADOOP-14663 seems to achieve the 
same objective, and we wouldn't need to update {{hadoop-coverage}} when we add 
new submodules.

While the two tools measure coverage 
[differently|http://openclover.org/doc/manual/4.2.0/general--comparison-of-code-coverage-tools.html],
 either would be suitable. OpenClover uses ALv2, JaCoCo uses the 
[EPL|https://www.apache.org/legal/resolved.html#category-b]; no significant 
difference in our use. Is there a strong argument in favor of one over the 
other?

> Use Jacoco to generate Unit Test coverage reports
> -
>
> Key: HADOOP-15190
> URL: https://issues.apache.org/jira/browse/HADOOP-15190
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Duo Xu
>Assignee: Duo Xu
>Priority: Minor
> Attachments: HADOOP-15190.01.patch, jacoco_report_2018_01_25.JPG
>
>
> Currently Hadoop uses maven-clover2-plugin for code coverage, which is 
> outdated. Atlassian open-sourced Clover last year, so a license can no longer 
> be purchased, although we can switch to the license-free version called 
> "OpenClover".
> This JIRA is to replace Clover with JaCoCo, which is actively maintained by 
> the community.
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15151) MapFile.fix creates a wrong index file in case of block-compressed data file.

2018-01-26 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15151:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.1
   Status: Resolved  (was: Patch Available)

I committed this. Thanks, Grigori

> MapFile.fix creates a wrong index file in case of block-compressed data file.
> -
>
> Key: HADOOP-15151
> URL: https://issues.apache.org/jira/browse/HADOOP-15151
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Grigori Rybkine
>Assignee: Grigori Rybkine
>Priority: Major
>  Labels: patch
> Fix For: 2.9.1
>
> Attachments: HADOOP-15151.001.patch, HADOOP-15151.002.patch, 
> HADOOP-15151.003.patch, HADOOP-15151.004.patch, HADOOP-15151.004.patch, 
> HADOOP-15151.005.patch
>
>
> The index file created with MapFile.fix for an ordered block-compressed data 
> file does not allow finding values for keys existing in the data file via the 
> MapFile.get method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15151) MapFile.fix creates a wrong index file in case of block-compressed data file.

2018-01-26 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341312#comment-16341312
 ] 

Chris Douglas commented on HADOOP-15151:


Sorry for the delay, [~rybkine]. {{MapFile}} is rarely modified these days.

I have some concerns about the existing code (catching {{Throwable}} while 
reconstructing the index?), but the patch and unit test look good. +1

> MapFile.fix creates a wrong index file in case of block-compressed data file.
> -
>
> Key: HADOOP-15151
> URL: https://issues.apache.org/jira/browse/HADOOP-15151
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Grigori Rybkine
>Assignee: Grigori Rybkine
>Priority: Major
>  Labels: patch
> Attachments: HADOOP-15151.001.patch, HADOOP-15151.002.patch, 
> HADOOP-15151.003.patch, HADOOP-15151.004.patch, HADOOP-15151.004.patch, 
> HADOOP-15151.005.patch
>
>
> The index file created with MapFile.fix for an ordered block-compressed data 
> file does not allow finding values for keys existing in the data file via the 
> MapFile.get method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15140) S3guard mistakes root URI without / as non-absolute path

2018-01-23 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336780#comment-16336780
 ] 

Chris Douglas commented on HADOOP-15140:


I can't claim any grand insight, unfortunately. The semantics of an empty path 
in the URI aren't crisp, since {{FileSystem}} instances maintain working/home 
directories as context to resolve relative paths. In the past, that's gotten in 
the way of defining clearer semantics, since FS overrides don't compose 
compatibly. It's one of the reasons {{FileContext}} tried to move all the path 
manipulation up a level, so the FS implementations only deal with absolute URIs 
(IIRC, [~sanjay.radia] may have a clearer memory of this).

bq. To clarify the failure only occurs when passing a URI as an argument to the 
Path constructor. Otherwise this is handled properly.
Can you expand on this?

bq. this surfaces in code which has a path p like s3a://bucket/file.txt and you 
can getFileStatus(p.getParent()). The key of that parent is being mapped to "/" 
and then s3guard is failing.
If I understand you, the path component of the URI should be resolving to "/", 
which s3guard should handle (even as a special case). Is it getting dropped?
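
To make the question concrete, a short sketch of how an empty URI path differs 
from "/" in {{org.apache.hadoop.fs.Path}}; expected outputs are noted in the 
comments, and none of the s3guard internals are reproduced here:
{code:java}
import java.net.URI;
import org.apache.hadoop.fs.Path;

public class RootPathSketch {
  public static void main(String[] args) throws Exception {
    Path p = new Path("s3a://bucket/file.txt");
    // The parent of a top-level file carries "/" as its path component.
    System.out.println(p.getParent().toUri().getPath());      // "/"

    // Constructing a Path directly from a URI with no trailing slash keeps
    // the empty path component -- the case the "non-absolute path" check
    // appears to trip on.
    Path fromUri = new Path(new URI("s3a://bucket"));
    System.out.println(fromUri.toUri().getPath().isEmpty());  // true
  }
}
{code}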


> S3guard mistakes root URI without / as non-absolute path
> 
>
> Key: HADOOP-15140
> URL: https://issues.apache.org/jira/browse/HADOOP-15140
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Abraham Fine
>Priority: Major
>
> If you call {{getFileStatus("s3a://bucket")}} then S3Guard will throw an 
> exception in putMetadata, as it mistakes the empty path for "non-absolute 
> path"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15117) open(PathHandle) contract test should be exhaustive for default options

2017-12-30 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15117:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

Thanks for the reviews, [~elgoiri]. I committed this.

> open(PathHandle) contract test should be exhaustive for default options
> ---
>
> Key: HADOOP-15117
> URL: https://issues.apache.org/jira/browse/HADOOP-15117
> Project: Hadoop Common
>  Issue Type: Test
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Fix For: 3.1.0
>
> Attachments: HADOOP-15117.000.patch, HADOOP-15117.001.patch, 
> HADOOP-15117.002.patch, HADOOP-15117.003.patch
>
>
> The current {{AbstractContractOpenTest}} covers many, but not all of the 
> permutations of the default {{HandleOpt}}. It could also be refactored to be 
> clearer as documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15117) open(PathHandle) contract test should be exhaustive for default options

2017-12-29 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15117:
---
Attachment: HADOOP-15117.003.patch

> open(PathHandle) contract test should be exhaustive for default options
> ---
>
> Key: HADOOP-15117
> URL: https://issues.apache.org/jira/browse/HADOOP-15117
> Project: Hadoop Common
>  Issue Type: Test
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: HADOOP-15117.000.patch, HADOOP-15117.001.patch, 
> HADOOP-15117.002.patch, HADOOP-15117.003.patch
>
>
> The current {{AbstractContractOpenTest}} covers many, but not all of the 
> permutations of the default {{HandleOpt}}. It could also be refactored to be 
> clearer as documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15117) open(PathHandle) contract test should be exhaustive for default options

2017-12-29 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306632#comment-16306632
 ] 

Chris Douglas commented on HADOOP-15117:


bq. In AbstractContractPathHandleTest, do we need the if checks in the catch 
after 180? We should've made sure those were true before. Are we covering 
anything there?
Yes, that's where the test verifies that if an exception was thrown, it must be 
because the rename or the modification was disallowed (because it's the 
expression in the {{try}} that throws). If one is allowed, then it must have 
been the other that caused the exception. In the case where both are 
disallowed, we don't need an assertion because we've already checked both 
conditions.

bq. Can we describe the serialize in TestHDFSContractPathHandle? Maybe move it 
from Boolean to boolean too.
Sure, I'll add a comment. IIRC the change to {{Boolean}} was in service of an 
invocation problem from {{Parameterized}}, but it might have been trying to 
work around the {{name}} issue. I'll make sure it's necessary.
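
The catch-block reasoning may be easier to follow as code. This is a toy model 
of the logic only, not the actual {{AbstractContractPathHandleTest}} source; 
every name in it is illustrative:
{code:java}
public class HandleOptCheckSketch {
  static class InvalidHandle extends Exception {}

  /** Stand-in for verifyRead(open(handle)) after the test setup has both
   *  renamed and modified the file. */
  static void openAfterRenameAndModify(boolean allowMoved,
      boolean allowChanged) throws InvalidHandle {
    if (!allowMoved || !allowChanged) {
      throw new InvalidHandle();
    }
  }

  public static void main(String[] args) {
    boolean allowMoved = true, allowChanged = false;
    try {
      openAfterRenameAndModify(allowMoved, allowChanged);
      // Reached only if the handle tolerates both mutations.
    } catch (InvalidHandle e) {
      // The try-expression is the only thing that throws. If the rename was
      // allowed, the modification must be the cause (and vice versa); if
      // both were allowed, reaching here is itself a failure.
      if (allowMoved && allowChanged) {
        throw new AssertionError("handle should have stayed valid");
      }
    }
  }
}
{code}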

> open(PathHandle) contract test should be exhaustive for default options
> ---
>
> Key: HADOOP-15117
> URL: https://issues.apache.org/jira/browse/HADOOP-15117
> Project: Hadoop Common
>  Issue Type: Test
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: HADOOP-15117.000.patch, HADOOP-15117.001.patch, 
> HADOOP-15117.002.patch
>
>
> The current {{AbstractContractOpenTest}} covers many, but not all of the 
> permutations of the default {{HandleOpt}}. It could also be refactored to be 
> clearer as documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15128) TestViewFileSystem tests are broken in trunk

2017-12-29 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306484#comment-16306484
 ] 

Chris Douglas commented on HADOOP-15128:


How close are we to a solution? HADOOP-10054 is neither critical nor difficult 
to maintain, so we should revert it if this isn't a simple fix.

> TestViewFileSystem tests are broken in trunk
> 
>
> Key: HADOOP-15128
> URL: https://issues.apache.org/jira/browse/HADOOP-15128
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: viewfs
>Affects Versions: 3.1.0
>Reporter: Anu Engineer
>Assignee: Hanisha Koneru
>
> The fix in Hadoop-10054 seems to have caused a test failure. Please take a 
> look. Thanks [~eyang] for reporting this.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15117) open(PathHandle) contract test should be exhaustive for default options

2017-12-29 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306474#comment-16306474
 ] 

Chris Douglas commented on HADOOP-15117:


The failing tests are being tracked in HADOOP-15128.

> open(PathHandle) contract test should be exhaustive for default options
> ---
>
> Key: HADOOP-15117
> URL: https://issues.apache.org/jira/browse/HADOOP-15117
> Project: Hadoop Common
>  Issue Type: Test
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: HADOOP-15117.000.patch, HADOOP-15117.001.patch, 
> HADOOP-15117.002.patch
>
>
> The current {{AbstractContractOpenTest}} covers many, but not all of the 
> permutations of the default {{HandleOpt}}. It could also be refactored to be 
> clearer as documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15117) open(PathHandle) contract test should be exhaustive for default options

2017-12-29 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15117:
---
Attachment: HADOOP-15117.002.patch

Fixed checkstyle, missing license. The unit test failures are also on trunk, 
unfortunately.

> open(PathHandle) contract test should be exhaustive for default options
> ---
>
> Key: HADOOP-15117
> URL: https://issues.apache.org/jira/browse/HADOOP-15117
> Project: Hadoop Common
>  Issue Type: Test
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: HADOOP-15117.000.patch, HADOOP-15117.001.patch, 
> HADOOP-15117.002.patch
>
>
> The current {{AbstractContractOpenTest}} covers many, but not all of the 
> permutations of the default {{HandleOpt}}. It could also be refactored to be 
> clearer as documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15146) Remove DataOutputByteBuffer

2017-12-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15146:
---
Hadoop Flags: Incompatible change

> Remove DataOutputByteBuffer
> ---
>
> Key: HADOOP-15146
> URL: https://issues.apache.org/jira/browse/HADOOP-15146
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HADOOP-15146.1.patch
>
>
> I can't seem to find any references to {{DataOutputByteBuffer}} maybe it 
> should be deprecated or simply removed?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15117) open(PathHandle) contract test should be exhaustive for default options

2017-12-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15117:
---
Attachment: HADOOP-15117.001.patch

Thanks for taking a look, [~elgoiri].

bq. Should we make sure we trigger the exception? For example, should testMoved 
have a fail() after the verifyRead() that expects exceptions?
Instead of the try/fail, catch/ignore pattern, this checks the option to 
determine if the exception should have been thrown (and vice versa). The 
{{verifyRead}} ensures the referent is correct with the expected content.

bq. We do the serialization of the PathHandle a bunch of times in the tests, 
I'm not sure we should add it to the handle itself but wrapping it in a 
function may make sense.
Good point. Changed whether the handle is first serialized to be another 
parameter for the tests. Either {{Parameterized}} has a bug, or I'm using the 
naming incorrectly, because including this parameter in the name causes odd 
test failures. As-is, it's enough for someone to figure out what's going wrong.

bq. Similarly to what you do in testChangedAndMoved(), you could split (with 
extra lines), the part that does the changes/moves from the ones that do the 
checks.
Done

bq. Does it make sense to fail() in the unreachable statement case for 
getHandleOrSkip()
Unfortunately the return statement is still required. It really is unreachable 
if {{skip}} does what it's supposed to do.
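
For readers unfamiliar with the {{Parameterized}} mechanics under discussion, 
here is a minimal JUnit 4 shape for the added serialize/don't-serialize 
parameter; the class and member names are illustrative, not the patch's:
{code:java}
import java.util.Arrays;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class PathHandleParamSketch {
  // Including "{1}" in the name template is where the odd failures appeared.
  @Parameters(name = "{0}")
  public static Iterable<Object[]> data() {
    return Arrays.asList(new Object[][] {
        {"exact", Boolean.FALSE},
        {"exact", Boolean.TRUE},
    });
  }

  private final String opts;
  private final Boolean serialized;

  public PathHandleParamSketch(String opts, Boolean serialized) {
    this.opts = opts;
    this.serialized = serialized;
  }

  @Test
  public void testHandleAfterMutation() {
    // Obtain a handle for `opts`; when `serialized` is true, round-trip it
    // through its byte form before opening (details omitted in this sketch).
  }
}
{code}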

> open(PathHandle) contract test should be exhaustive for default options
> ---
>
> Key: HADOOP-15117
> URL: https://issues.apache.org/jira/browse/HADOOP-15117
> Project: Hadoop Common
>  Issue Type: Test
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: HADOOP-15117.000.patch, HADOOP-15117.001.patch
>
>
> The current {{AbstractContractOpenTest}} covers many, but not all of the 
> permutations of the default {{HandleOpt}}. It could also be refactored to be 
> clearer as documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15117) open(PathHandle) contract test should be exhaustive for default options

2017-12-26 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15117:
---
Assignee: Chris Douglas
  Status: Patch Available  (was: Open)

> open(PathHandle) contract test should be exhaustive for default options
> ---
>
> Key: HADOOP-15117
> URL: https://issues.apache.org/jira/browse/HADOOP-15117
> Project: Hadoop Common
>  Issue Type: Test
>Reporter: Chris Douglas
>Assignee: Chris Douglas
> Attachments: HADOOP-15117.000.patch
>
>
> The current {{AbstractContractOpenTest}} covers many, but not all of the 
> permutations of the default {{HandleOpt}}. It could also be refactored to be 
> clearer as documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15117) open(PathHandle) contract test should be exhaustive for default options

2017-12-26 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15117:
---
Attachment: HADOOP-15117.000.patch

Added a new contract test to validate the {{PathHandle}} defaults

/cc [~elgoiri], [~ste...@apache.org]

> open(PathHandle) contract test should be exhaustive for default options
> ---
>
> Key: HADOOP-15117
> URL: https://issues.apache.org/jira/browse/HADOOP-15117
> Project: Hadoop Common
>  Issue Type: Test
>Reporter: Chris Douglas
> Attachments: HADOOP-15117.000.patch
>
>
> The current {{AbstractContractOpenTest}} covers many, but not all of the 
> permutations of the default {{HandleOpt}}. It could also be refactored to be 
> clearer as documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15133) [JDK9] Ignore com.sun.javadoc.* and com.sun.tools.* in animal-sniffer-maven-plugin to compile with Java 9

2017-12-20 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299292#comment-16299292
 ] 

Chris Douglas commented on HADOOP-15133:


+1 lgtm

I looked, but couldn't find a profile that included the unsafe APIs. Since 
these are more likely to change, ignoring these packages is unfortunate but 
necessary.

> [JDK9] Ignore com.sun.javadoc.* and com.sun.tools.* in 
> animal-sniffer-maven-plugin to compile with Java 9
> -
>
> Key: HADOOP-15133
> URL: https://issues.apache.org/jira/browse/HADOOP-15133
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Attachments: HADOOP-15133.001.patch
>
>
> com.sun.javadoc and com.sun.tools are internal APIs and are not included in 
> java18 profile, so signature check fails with JDK9.
> {noformat}
> $ mvn clean install -DskipTests -DskipShade
> (snip)
> [INFO] --- animal-sniffer-maven-plugin:1.16:check (signature-check) @ 
> hadoop-annotations ---
> [INFO] Checking unresolved references to 
> org.codehaus.mojo.signature:java18:1.0
> [ERROR] 
> /Users/ajisaka/git/hadoop/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/tools/RootDocProcessor.java:56:
>  Undefined reference: com.sun.javadoc.RootDoc
> (snip)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-16 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15106:
---
   Resolution: Fixed
 Assignee: Chris Douglas
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.0
   Status: Resolved  (was: Patch Available)

Thanks, [~elgoiri]. I committed this.

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15106.00.patch, HADOOP-15106.01.patch, 
> HADOOP-15106.02.patch, HADOOP-15106.03.patch, HADOOP-15106.04.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Moved] (HADOOP-15117) open(PathHandle) contract test should be exhaustive for default options

2017-12-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas moved HDFS-12926 to HADOOP-15117:
---

Target Version/s: 3.1.0  (was: 3.1.0)
 Key: HADOOP-15117  (was: HDFS-12926)
 Project: Hadoop Common  (was: Hadoop HDFS)

> open(PathHandle) contract test should be exhaustive for default options
> ---
>
> Key: HADOOP-15117
> URL: https://issues.apache.org/jira/browse/HADOOP-15117
> Project: Hadoop Common
>  Issue Type: Test
>Reporter: Chris Douglas
>
> The current {{AbstractContractOpenTest}} covers many, but not all of the 
> permutations of the default {{HandleOpt}}. It could also be refactored to be 
> clearer as documentation



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15106:
---
Attachment: HADOOP-15106.04.patch

On reflection... the contract test should be exhaustive and easier to read. 
Opened HADOOP-15117.

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch, HADOOP-15106.01.patch, 
> HADOOP-15106.02.patch, HADOOP-15106.03.patch, HADOOP-15106.04.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-14 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291380#comment-16291380
 ] 

Chris Douglas commented on HADOOP-15106:


bq. Is there any unit test covering this?
Yes, the contract tests. I added more checks as part of HDFS-12882, but they're 
not exhaustive. The negative cases are updated to catch 
{{InvalidPathHandleException}} instead of {{IOException}} in the patch.

bq. How about moving InvalidPathHandleException above IOException in the 
javadoc to be consistent with others like getFileStatus (e.g., 
FileNotFoundException goes above IOException)
Sure, np.
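
For context, a sketch of what the specific exception buys a caller; the 
fallback policy shown is illustrative, not part of the patch:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.InvalidPathHandleException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathHandle;

class OpenByHandleSketch {
  /** Open by handle; re-resolve the path only when the handle itself is
   *  invalid. Ordinary I/O errors still surface as plain IOExceptions. */
  static FSDataInputStream open(FileSystem fs, PathHandle handle, Path fallback)
      throws IOException {
    try {
      return fs.open(handle);
    } catch (InvalidPathHandleException e) {
      // Stale handle: the file changed or moved beyond what its HandleOpt
      // permits. Now distinguishable from transient I/O failure, hence
      // recoverable here.
      return fs.open(fallback);
    }
  }
}
{code}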

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch, HADOOP-15106.01.patch, 
> HADOOP-15106.02.patch, HADOOP-15106.03.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15106:
---
Attachment: HADOOP-15106.03.patch

Good idea. Added an example to the doc

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch, HADOOP-15106.01.patch, 
> HADOOP-15106.02.patch, HADOOP-15106.03.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-14 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291287#comment-16291287
 ] 

Chris Douglas commented on HADOOP-15106:


/cc [~virajith], [~ste...@apache.org], [~elgoiri]

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch, HADOOP-15106.01.patch, 
> HADOOP-15106.02.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15106:
---
Attachment: HADOOP-15106.02.patch

Fixed checkstyle. The unit test failures are related, but the failure was in 
removing the {{final}} modifier from {{getPathHandle(Path,HandleOpt...)}}. I'd 
forgotten why that was required; added a comment.

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch, HADOOP-15106.01.patch, 
> HADOOP-15106.02.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15106:
---
Attachment: HADOOP-15106.01.patch

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch, HADOOP-15106.01.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15106:
---
Attachment: (was: HDFS-15106.01.patch)

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch, HADOOP-15106.01.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15106:
---
Attachment: HDFS-15106.01.patch

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch, HDFS-15106.01.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.






[jira] [Updated] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15106:
---
Status: Patch Available  (was: Open)

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch, HDFS-15106.01.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.






[jira] [Updated] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HADOOP-15106:
---
Attachment: HADOOP-15106.00.patch

Add exception type. Will rebase on HDFS-12882 after commit.
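
For readers following along, a minimal sketch of what "add exception type" 
plausibly entails, assuming the new type is named 
{{InvalidPathHandleException}} in {{org.apache.hadoop.fs}} (constructors and 
javadoc here are illustrative): a checked exception extending {{IOException}}, 
so existing {{open}} signatures still compile while callers gain a specific 
type to catch.

{code:java}
package org.apache.hadoop.fs;

import java.io.IOException;

/**
 * Sketch: thrown when a PathHandle no longer resolves to the entity it
 * was created from under its HandleOpt constraints. Extends IOException
 * so it fits the existing FileSystem::open(PathHandle) signature.
 */
public class InvalidPathHandleException extends IOException {
  private static final long serialVersionUID = 1L;

  public InvalidPathHandleException(String msg) {
    super(msg);
  }

  public InvalidPathHandleException(String msg, Throwable cause) {
    super(msg, cause);
  }
}
{code}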

> FileSystem::open(PathHandle) should throw a specific exception on validation 
> failure
> 
>
> Key: HADOOP-15106
> URL: https://issues.apache.org/jira/browse/HADOOP-15106
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: HADOOP-15106.00.patch
>
>
> Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
> errors and an invalid handle. The signature should include a specific, 
> checked exception for this case.






[jira] [Created] (HADOOP-15106) FileSystem::open(PathHandle) should throw a specific exception on validation failure

2017-12-08 Thread Chris Douglas (JIRA)
Chris Douglas created HADOOP-15106:
--

 Summary: FileSystem::open(PathHandle) should throw a specific 
exception on validation failure
 Key: HADOOP-15106
 URL: https://issues.apache.org/jira/browse/HADOOP-15106
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Chris Douglas
Priority: Minor


Callers of {{FileSystem::open(PathHandle)}} cannot distinguish between I/O 
errors and an invalid handle. The signature should include a specific, checked 
exception for this case.
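
To make the motivation concrete, a minimal caller-side sketch, assuming the 
new checked exception is named {{InvalidPathHandleException}} (the helper and 
fallback path are hypothetical). With a distinct type, a stale handle can be 
recovered by re-resolving the path, while genuine I/O errors propagate 
unchanged:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.InvalidPathHandleException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathHandle;

public class PathHandleReader {

  // Open by handle; fall back to open-by-path only when the handle
  // itself fails validation. All other IOExceptions reach the caller.
  static FSDataInputStream openByHandle(FileSystem fs, PathHandle handle,
      Path fallback) throws IOException {
    try {
      return fs.open(handle);
    } catch (InvalidPathHandleException e) {
      // The file moved or changed in a way the handle's HandleOpt
      // constraints disallow; re-resolve by path instead.
      return fs.open(fallback);
    }
  }
}
{code}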





