[jira] [Commented] (HADOOP-15314) Scheme assertion in S3Guard DynamoDBMetadataStore::checkPath is unnecessarily restrictive

2018-03-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403690#comment-16403690
 ] 

Steve Loughran commented on HADOOP-15314:
-

Aaron: w.r.t. mixing clients, I was thinking of "different versions of the ASF 
Hadoop s3a client".

What [~djhoffman] has probably done is just register S3A as the implementation 
of s3://, so that any code from EMR will work without changing URLs. You are 
right that trying to share a bucket between EMR s3 and s3a could cause 
problems, but if it's just the fs scheme mapping in the JVM, that's a non-issue.

> Scheme assertion in S3Guard DynamoDBMetadataStore::checkPath is unnecessarily 
> restrictive
> -
>
> Key: HADOOP-15314
> URL: https://issues.apache.org/jira/browse/HADOOP-15314
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: DJ Hoffman
>Priority: Major
>
> In version 3.0.0, the checkPath method for dealing with paths prevents us 
> from using the s3:// scheme when utilizing S3Guard. However, in our 
> core-site.xml we have included 
> {noformat}
> <property>
>   <name>fs.s3.impl</name>
>   <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
> </property>
> {noformat}
> which should ensure that s3-prefixed paths go through s3a and are properly 
> compatible with S3Guard. We removed the assertion that paths use the s3a 
> scheme (some of our paths use the s3 scheme), and our testing thus far with 
> S3Guard enabled has been positive. We believe the assertion in checkPath is 
> unnecessary and could be expanded to include the s3 and s3n schemes, if not 
> dropped altogether or altered in some other way. We're happy to develop and 
> test a patch if the community is amenable to the change.
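
A rough sketch of what an expanded scheme check could look like follows. This is not the actual DynamoDBMetadataStore code: the class name, the method name and the set of allowed schemes are all hypothetical, and the sketch uses only java.net.URI so that it stands alone.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for the scheme part of a checkPath-style validation. */
public class SchemeCheckSketch {

  // Assumption: the schemes a deployment has mapped onto S3AFileSystem.
  private static final Set<String> ALLOWED_SCHEMES =
      new HashSet<>(Arrays.asList("s3a", "s3", "s3n"));

  /** Returns the URI unchanged if its scheme is acceptable, otherwise throws. */
  public static URI checkScheme(URI uri) {
    String scheme = uri.getScheme();
    if (scheme == null
        || !ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
      throw new IllegalArgumentException(
          "Unsupported scheme in " + uri + "; expected one of " + ALLOWED_SCHEMES);
    }
    return uri;
  }

  public static void main(String[] args) {
    // Both of these are accepted by the relaxed check.
    System.out.println(checkScheme(URI.create("s3://bucket/dir/file")));
    System.out.println(checkScheme(URI.create("s3a://bucket/dir/file")));
    try {
      checkScheme(URI.create("hdfs://nn/dir/file"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Whether the allowed set should be fixed or derived from the fs.*.impl settings is a design question this sketch leaves open.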



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15320) Remove customized getFileBlockLocations for hadoop-azure and hadoop-azure-datalake

2018-03-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403665#comment-16403665
 ] 

Steve Loughran commented on HADOOP-15320:
-

Interesting. In HADOOP-14943 I'd proposed pulling the Azure one up to hadoop 
common for shared use, specifying a bit more tightly what it did, and then 
wiring up S3A to it too.

Now you are saying that for multi-TB files we don't need this code at all? 
Well, that's good news.
I see your arguments, but I do think it will need to be bounced past the 
various tools, including Hive, Spark and Pig, to see that it all goes OK. But 
given S3A is using that default with no adverse consequences, I think you'll 
be right.

As usual: against which endpoints did you run the entire hadoop-azure and 
hadoop-azure-datalake test suites?

> Remove customized getFileBlockLocations for hadoop-azure and 
> hadoop-azure-datalake
> --
>
> Key: HADOOP-15320
> URL: https://issues.apache.org/jira/browse/HADOOP-15320
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, fs/azure
>Affects Versions: 2.7.3, 2.9.0, 3.0.0
>Reporter: shanyu zhao
>Assignee: shanyu zhao
>Priority: Major
> Attachments: HADOOP-15320.patch
>
>
> hadoop-azure and hadoop-azure-datalake each have their own implementation of 
> getFileBlockLocations(), which fakes a list of artificial blocks based on a 
> hard-coded block size. Each block has one host with the name "localhost". 
> Take a look at this code:
> [https://github.com/apache/hadoop/blob/release-2.9.0-RC3/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L3485]
> This is an unnecessary mock-up for a "remote" file system to mimic HDFS. The 
> problem with this mock-up is that for large (~TB) files we generate lots of 
> artificial blocks, and FileInputFormat.getSplits() is slow in calculating 
> splits based on these blocks.
> We can safely remove this customized getFileBlockLocations() implementation 
> and fall back to the default FileSystem.getFileBlockLocations() 
> implementation, which returns 1 block for any file, with 1 host, "localhost". 
> Note that this doesn't mean we will create far fewer splits, because the 
> split size is still capped by the blockSize in 
> FileInputFormat.computeSplitSize():
> {code:java}
> return Math.max(minSize, Math.min(goalSize, blockSize));{code}
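
To make the effect of that formula concrete, here is a small standalone sketch. It copies only the quoted expression into a hypothetical class; the file size, block size and split-count hint are illustrative assumptions, not values taken from the issue.

```java
/** Standalone copy of the quoted formula; not the Hadoop FileInputFormat class. */
public class SplitSizeSketch {

  /** Same expression as quoted above. */
  public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
    return Math.max(minSize, Math.min(goalSize, blockSize));
  }

  public static void main(String[] args) {
    long fileSize = 1L << 40;      // 1 TiB file (illustrative)
    long blockSize = 256L << 20;   // 256 MiB reported block size (illustrative)
    long minSize = 1L;             // assumed minimum split size
    int requestedSplits = 2;       // hypothetical split-count hint
    long goalSize = fileSize / requestedSplits;

    // goalSize is far larger than blockSize, so blockSize wins the min().
    long splitSize = computeSplitSize(goalSize, minSize, blockSize);
    long splits = (fileSize + splitSize - 1) / splitSize;
    System.out.println("splitSize=" + splitSize + " splits=" + splits);
  }
}
```

With these numbers the split size stays at the block size regardless of how many block locations the file system reports, which is the point the description makes.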






[jira] [Updated] (HADOOP-15322) LDAPGroupMapping search tree base improvement

2018-03-17 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HADOOP-15322:
---
Fix Version/s: (was: 2.7.6)

> LDAPGroupMapping search tree base improvement
> -
>
> Key: HADOOP-15322
> URL: https://issues.apache.org/jira/browse/HADOOP-15322
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common, security
>Affects Versions: 2.7.4
>Reporter: Ganesh
>Priority: Major
>
> Currently the same LDAP base is used for searching posixAccount and 
> posixGroup. This request is to make a separate base for each container (i.e. 
> the posixAccount and posixGroup containers).






[jira] [Updated] (HADOOP-15322) LDAPGroupMapping search tree base improvement

2018-03-17 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HADOOP-15322:
---
Component/s: security

> LDAPGroupMapping search tree base improvement
> -
>
> Key: HADOOP-15322
> URL: https://issues.apache.org/jira/browse/HADOOP-15322
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common, security
>Affects Versions: 2.7.4
>Reporter: Ganesh
>Priority: Major
>
> Currently the same LDAP base is used for searching posixAccount and 
> posixGroup. This request is to make a separate base for each container (i.e. 
> the posixAccount and posixGroup containers).






[jira] [Commented] (HADOOP-15322) LDAPGroupMapping search tree base improvement

2018-03-17 Thread Ganesh (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403531#comment-16403531
 ] 

Ganesh commented on HADOOP-15322:
-


Looking through the code:

..
String LDAP_CONFIG_PREFIX = "hadoop.security.group.mapping.ldap";
String BASE_DN_KEY = LDAP_CONFIG_PREFIX + ".base";

baseDN = conf.get(BASE_DN_KEY, BASE_DN_DEFAULT);
..
This baseDN is then used in the searches for both posixAccount and posixGroup:

..
NamingEnumeration results = ctx.search(baseDN,
    userSearchFilter,
    new Object[]{user},
    SEARCH_CONTROLS);
..
groupResults =
    ctx.search(baseDN,
        "(&" + groupSearchFilter + "(|(" + posixGidAttr + "={0})" +
        "(" + groupMemberAttr + "={1})))",
        new Object[] { gidNumber, uidNumber },
        SEARCH_CONTROLS);


Because the same baseDN is used in both searches, we are forced to set the 
search base to the root of the LDAP tree, dc=XX,dc=YY,dc=ZZ. This is generally 
not a problem, but most LDAP servers limit the number of entries returned in a 
search result (usually 2K to 10K) as a measure to prevent DDoS. 

If we could add two keys, something like
{code}
hadoop.security.group.mapping.ldap.base.user
hadoop.security.group.mapping.ldap.base.group
{code}

then we could use the value of 'hadoop.security.group.mapping.ldap.base.user' 
to search for posixAccount and the value of 
'hadoop.security.group.mapping.ldap.base.group' to search for posixGroup, and 
avoid searching the larger tree rooted at dc=XX,dc=YY,dc=ZZ. This would also 
help minimize the number of entries returned in the search result. 

(Of course, another option is to use paged search result support.)
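
A minimal sketch of how the two proposed keys could be resolved, falling back to the existing common base when they are unset, is shown below. It is only an illustration: a plain Map stands in for the Hadoop Configuration object, the class and method names are hypothetical, and the empty-string default for the common base is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; a Map stands in for Hadoop's Configuration. */
public class LdapBaseSketch {

  public static final String BASE_DN_KEY = "hadoop.security.group.mapping.ldap.base";
  // The two keys proposed in the comment above; they are not existing settings.
  public static final String USER_BASE_DN_KEY = BASE_DN_KEY + ".user";
  public static final String GROUP_BASE_DN_KEY = BASE_DN_KEY + ".group";

  /** Returns the specific base if set, else the common base, else "" (assumed default). */
  public static String resolveBase(Map<String, String> conf, String specificKey) {
    String specific = conf.get(specificKey);
    if (specific != null && !specific.trim().isEmpty()) {
      return specific.trim();
    }
    String common = conf.get(BASE_DN_KEY);
    return common == null ? "" : common.trim();
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(BASE_DN_KEY, "dc=XX,dc=YY,dc=ZZ");
    conf.put(USER_BASE_DN_KEY, "ou=people,dc=XX,dc=YY,dc=ZZ"); // illustrative DN

    // The user search gets the narrower base; the group search falls back.
    System.out.println("user base:  " + resolveBase(conf, USER_BASE_DN_KEY));
    System.out.println("group base: " + resolveBase(conf, GROUP_BASE_DN_KEY));
  }
}
```

Falling back to the common base keeps existing configurations working unchanged, which seems the least surprising behaviour for a new pair of optional keys.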

> LDAPGroupMapping search tree base improvement
> -
>
> Key: HADOOP-15322
> URL: https://issues.apache.org/jira/browse/HADOOP-15322
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 2.7.4
>Reporter: Ganesh
>Priority: Major
> Fix For: 2.7.6
>
>
> Currently the same LDAP base is used for searching posixAccount and 
> posixGroup. This request is to make a separate base for each container (i.e. 
> the posixAccount and posixGroup containers).






[jira] [Created] (HADOOP-15322) LDAPGroupMapping search tree base improvement

2018-03-17 Thread Ganesh (JIRA)
Ganesh created HADOOP-15322:
---

 Summary: LDAPGroupMapping search tree base improvement
 Key: HADOOP-15322
 URL: https://issues.apache.org/jira/browse/HADOOP-15322
 Project: Hadoop Common
  Issue Type: Improvement
  Components: common
Affects Versions: 2.7.4
Reporter: Ganesh
 Fix For: 2.7.6


Currently the same LDAP base is used for searching posixAccount and posixGroup. 
This request is to make a separate base for each container (i.e. the 
posixAccount and posixGroup containers).






[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-17 Thread wujinhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403455#comment-16403455
 ] 

wujinhu commented on HADOOP-15262:
--

Thanks [~Sammi] for your comments. I have fixed the code style and attached a 
patch file for branch-2!

> AliyunOSS: rename() to move files in a directory in parallel
> 
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15262-branch-2.001.patch, HADOOP-15262.001.patch, 
> HADOOP-15262.002.patch, HADOOP-15262.003.patch, HADOOP-15262.004.patch, 
> HADOOP-15262.005.patch, HADOOP-15262.006.patch, HADOOP-15262.007.patch
>
>
> Currently, the rename() operation renames files in series. This will be slow 
> if a directory contains many files. We can improve this by renaming files in 
> parallel.
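
The general idea can be sketched with a fixed-size thread pool, as below. This is not the code from the attached patches: the class and method names are hypothetical, and renameOne stands in for whatever per-file operation the file system performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical sketch of renaming many files concurrently. */
public class ParallelRenameSketch {

  /** Runs renameOne for every key on a fixed-size pool; true only if all succeed. */
  public static boolean renameAll(List<String> keys, Predicate<String> renameOne,
      int threads) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Boolean>> tasks = new ArrayList<>();
      for (String key : keys) {
        tasks.add(() -> renameOne.test(key));
      }
      boolean allOk = true;
      // invokeAll blocks until every task has completed.
      for (Future<Boolean> result : pool.invokeAll(tasks)) {
        try {
          allOk &= result.get();
        } catch (ExecutionException e) {
          allOk = false; // a task threw; treat it as a failed rename
        }
      }
      return allOk;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // In-memory map standing in for an object store.
    Map<String, String> store = new ConcurrentHashMap<>();
    List<String> keys = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      store.put("src/file-" + i, "data-" + i);
      keys.add("src/file-" + i);
    }
    boolean ok = renameAll(keys, k -> {
      String value = store.remove(k);
      if (value == null) {
        return false;
      }
      store.put("dst/" + k.substring("src/".length()), value);
      return true;
    }, 8);
    System.out.println("ok=" + ok + " entries=" + store.size());
  }
}
```

A real implementation would also have to decide what to do with files already moved when one rename fails; the sketch only reports the overall result.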






[jira] [Commented] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-17 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16403327#comment-16403327
 ] 

genericqa commented on HADOOP-15262:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 22m 19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m  2s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 22s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 19s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 29s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 19s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 17s{color} | {color:green} hadoop-aliyun in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m 54s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:dbd69cb |
| JIRA Issue | HADOOP-15262 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12914992/HADOOP-15262-branch-2.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ae930a9ef0a1 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 204674f |
| maven | version: Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_151 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14325/testReport/ |
| Max. process+thread count | 66 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-aliyun U: hadoop-tools/hadoop-aliyun |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14325/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> AliyunOSS: rename() to move files in a directory in parallel
> 
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15262-branch-2.001.patch, HADOOP-15262.001.patch, 
> HADOOP-15262.002.patch, HADOOP-1526

[jira] [Updated] (HADOOP-15262) AliyunOSS: rename() to move files in a directory in parallel

2018-03-17 Thread wujinhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wujinhu updated HADOOP-15262:
-
Attachment: HADOOP-15262-branch-2.001.patch

> AliyunOSS: rename() to move files in a directory in parallel
> 
>
> Key: HADOOP-15262
> URL: https://issues.apache.org/jira/browse/HADOOP-15262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0
>Reporter: wujinhu
>Assignee: wujinhu
>Priority: Major
> Attachments: HADOOP-15262-branch-2.001.patch, HADOOP-15262.001.patch, 
> HADOOP-15262.002.patch, HADOOP-15262.003.patch, HADOOP-15262.004.patch, 
> HADOOP-15262.005.patch, HADOOP-15262.006.patch, HADOOP-15262.007.patch
>
>
> Currently, the rename() operation renames files in series. This will be slow 
> if a directory contains many files. We can improve this by renaming files in 
> parallel.


