date:20170706

[jira] [Comment Edited] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-06 Thread Hongyuan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077520#comment-16077520
 ] 

Hongyuan Li edited comment on HADOOP-14444 at 7/7/17 3:24 AM:
--

Sockets are complex; I don't like opening a new socket just to seek.
JSch and commons-net have plenty of examples, so if you want to make full use of 
them, you should dig into their implementations. Also, commons-net's setTimeout-like 
methods may get stuck in some situations when the network environment is very poor.


was (Author: hongyuan li):
Sockets are complex; I don't like opening a new socket just to seek.

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 2.8.0
>Reporter: Lukas Waldmann
>Assignee: Lukas Waldmann
> Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, 
> HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch
>
>
> The current implementations of the FTP and SFTP filesystems have severe limitations 
> and performance issues when dealing with a high number of files. My patch 
> solves those issues and integrates both filesystems in such a way that most of the 
> core functionality is common to both, which simplifies 
> maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every 
> single command but is reused from the pool.
> For a huge number of files it shows an order-of-magnitude performance improvement 
> over non-pooled connections.
> * Caching of directory trees. For FTP you always need to list the whole directory 
> whenever you ask for information about a particular file.
> Again, for a huge number of files it shows an order-of-magnitude performance 
> improvement over non-cached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing 
> particular files across the whole directory tree
> * Support for re-establishing broken FTP data transfers - these can happen 
> surprisingly often
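
The connection pooling feature listed above can be illustrated with a minimal sketch. This is not code from the attached patches; the class name `ConnectionPoolSketch` and its methods are hypothetical, and a real FTP/SFTP pool would also need liveness checks, size limits and keep-alive handling.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical minimal pool: hands out an idle connection before creating a new one. */
final class ConnectionPoolSketch<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory; // assumed to open a new connection
    private int created = 0;

    ConnectionPoolSketch(Supplier<C> factory) {
        this.factory = factory;
    }

    /** Reuse an idle connection if one exists; otherwise open a new one. */
    synchronized C borrow() {
        C conn = idle.pollFirst();
        if (conn == null) {
            conn = factory.get();
            created++;
        }
        return conn;
    }

    /** Return a connection so that the next command can reuse it. */
    synchronized void giveBack(C conn) {
        idle.addFirst(conn);
    }

    /** Number of connections opened so far, useful for checking reuse. */
    synchronized int createdCount() {
        return created;
    }
}
```

Borrowing, returning and borrowing again yields the same connection object, so only one connection is opened for the two commands.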



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems

2017-07-06 Thread Hongyuan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077520#comment-16077520
 ] 

Hongyuan Li commented on HADOOP-14444:
--

Sockets are complex; I don't like opening a new socket just to seek.





[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-06 Thread Hongyuan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628
 ] 

Hongyuan Li edited comment on HADOOP-14623 at 7/7/17 3:19 AM:
--

Furthermore, the flush method is there to confirm that the data has been written.

*Update/Correction*
Sorry, it is the {{putMetrics}} method.
In {{KafkaSink}}#{{putMetrics}}, the code is as listed below, which gives me a 
different opinion:
{code}
……
Future future = producer.send(data);
jsonLines.setLength(0);
try {
  future.get();
} catch (InterruptedException e) {
  throw new MetricsException("Error sending data", e);
} catch (ExecutionException e) {
  throw new MetricsException("Error sending data", e);
}

……
{code}


was (Author: hongyuan li):
Furthermore, the flush method is there to confirm that the data has been written.

> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure that the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}
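
The proposed change can be sketched as follows. This is only an illustration built on `java.util.Properties`, not the contents of the attached patch: the class `KafkaSinkAcksSketch`, the method `producerProps` and the `brokerList` parameter are hypothetical, and the acks property name is simply copied from the snippet quoted above.

```java
import java.util.Properties;

/** Hypothetical helper showing the acks value the issue asks for. */
final class KafkaSinkAcksSketch {
    static Properties producerProps(String brokerList) {
        Properties props = new Properties();
        // Assumed setting for illustration: where the producer connects.
        props.put("bootstrap.servers", brokerList);
        // Same property name as in the quoted code; only the value changes from "0" to "1".
        props.put("request.required.acks", "1");
        return props;
    }
}
```

With the value "0" a producer does not wait for any broker acknowledgement, which is why the issue asks for "1".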




[jira] [Commented] (HADOOP-14548) S3Guard: issues running parallel tests w/ S3N

2017-07-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077510#comment-16077510
 ] 

Hadoop QA commented on HADOOP-14548:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 35s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 9s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14548 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12876017/HADOOP-14548-HADOOP-13345.001.patch |
| Optional Tests | asflicense mvnsite |
| uname | Linux c2fd40511cca 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | HADOOP-13345 / 309b8c0 |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/12727/artifact/patchprocess/whitespace-eol.txt |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12727/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.



> S3Guard: issues running parallel tests w/ S3N 
> --
>
> Key: HADOOP-14548
> URL: https://issues.apache.org/jira/browse/HADOOP-14548
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Minor
> Fix For: HADOOP-13345
>
> Attachments: HADOOP-14548-HADOOP-13345.001.patch
>
>
> In general, running S3Guard and parallel tests with S3A and S3N contract 
> tests enabled is asking for trouble: S3Guard code assumes there are no 
> other non-S3Guard clients modifying the bucket.
> Goal of this JIRA is to:
> - Discuss current failures running `mvn verify -Dparallel-tests -Ds3guard 
> -Ddynamo` with S3A and S3N contract tests configured.
> - Identify any failures here that are worth looking into.
> - Document (or enforce) that people should not do this, or should expect 
> failures if they do.




[jira] [Commented] (HADOOP-14457) create() does not notify metadataStore of parent directories or ensure they're not existing files

2017-07-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077503#comment-16077503
 ] 

Hadoop QA commented on HADOOP-14457:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 30s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 19 new + 20 unchanged - 0 fixed = 39 total (was 20) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 19s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14457 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12876015/HADOOP-14457-HADOOP-13345.010.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 539573c05fba 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | HADOOP-13345 / 309b8c0 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/12726/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/12726/testReport/ |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12726/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.



> create() does not notify metadataStore of parent directories or ensure 
> they're not existing files
> -
>
> Key: HADOOP-14457
> URL: https://issues.apache.org/jira/browse/HADOOP-14457
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
> Attachments: HADOOP-14457-HADOOP-13345.001.patch, 
> HADOOP-14457-HADOOP-13345.002.patch,

[jira] [Commented] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable

2017-07-06 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077499#comment-16077499
 ] 

Aaron Fabbri commented on HADOOP-14468:
---

{quote}
That said, looking at all the places we call getFileStatus, it'd be a useful 
little sanity check all round.
{quote}
Yeah. It would be interesting to collect statistics on long-running clusters on 
how often inconsistency happens.

Sounds like we're ok with the behavior of failing after open().  Your example 
of deleted file or inconsistency causing similar behavior is a good point.  
I'll leave this as minor priority for now and focus on HADOOP-14467 first.

> S3Guard: make short-circuit getFileStatus() configurable
> 
>
> Key: HADOOP-14468
> URL: https://issues.apache.org/jira/browse/HADOOP-14468
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Minor
>
> Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a 
> result from the MetadataStore (e.g. dynamodb) first.
> I would like to add a new parameter 
> {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps 
> the current behavior.  When false, S3AFileSystem will check both S3 and the 
> MetadataStore.
> I'm not sure yet if we want to have this behavior the same for all callers of 
> getFileStatus(), or if we only want to check both S3 and MetadataStore for 
> some internal callers such as open().
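
The proposed switch can be sketched roughly as below. This is a simplified model, not S3AFileSystem code: the class `GetFileStatusSketch`, its two in-memory maps and the `Lookup` result type are hypothetical stand-ins for the MetadataStore and S3, and the sketch deliberately leaves open how a disagreement between the two sources would be resolved.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of getFileStatus() with an "authoritative" switch. */
final class GetFileStatusSketch {
    /** What each source returned and whether S3 was consulted at all. */
    static final class Lookup {
        final Long fromMetadataStore; // null when the store has no entry
        final Long fromS3;            // null when S3 was skipped or has no object
        final boolean s3Checked;

        Lookup(Long fromMetadataStore, Long fromS3, boolean s3Checked) {
            this.fromMetadataStore = fromMetadataStore;
            this.fromS3 = fromS3;
            this.s3Checked = s3Checked;
        }
    }

    private final Map<String, Long> metadataStore = new HashMap<>(); // path -> length
    private final Map<String, Long> s3 = new HashMap<>();            // path -> length
    private final boolean authoritative;

    GetFileStatusSketch(boolean authoritative) {
        this.authoritative = authoritative;
    }

    void putMetadata(String path, long length) {
        metadataStore.put(path, length);
    }

    void putS3(String path, long length) {
        s3.put(path, length);
    }

    Lookup getFileStatus(String path) {
        Long cached = metadataStore.get(path);
        if (cached != null && authoritative) {
            // Current behavior: a MetadataStore hit short-circuits the S3 call.
            return new Lookup(cached, null, false);
        }
        // Proposed non-authoritative behavior (or a store miss): consult S3 as well.
        return new Lookup(cached, s3.get(path), true);
    }
}
```

A caller such as open() could then compare the two fields of `Lookup` and decide what to do when they differ.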




[jira] [Updated] (HADOOP-14548) S3Guard: issues running parallel tests w/ S3N

2017-07-06 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14548:
--
Fix Version/s: HADOOP-13345
   Status: Patch Available  (was: Open)





[jira] [Updated] (HADOOP-14548) S3Guard: issues running parallel tests w/ S3N

2017-07-06 Thread Aaron Fabbri (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14548:
--
Attachment: HADOOP-14548-HADOOP-13345.001.patch

Attaching documentation-only patch (v1).

It looks like the main work here would be to make the S3N tests less flaky.  I 
view this as a low priority, however, so am just adding guidance to the S3A 
testing and S3Guard docs.

> S3Guard: issues running parallel tests w/ S3N 
> --
>
> Key: HADOOP-14548
> URL: https://issues.apache.org/jira/browse/HADOOP-14548
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Minor
> Attachments: HADOOP-14548-HADOOP-13345.001.patch
>
>
> In general, running S3Guard and parallel tests with S3A and S3N contract 
> tests enabled is asking for trouble: S3Guard code assumes there are no 
> other non-S3Guard clients modifying the bucket.
> Goal of this JIRA is to:
> - Discuss current failures running `mvn verify -Dparallel-tests -Ds3guard 
> -Ddynamo` with S3A and S3N contract tests configured.
> - Identify any failures here that are worth looking into.
> - Document (or enforce) that people should not do this, or should expect 
> failures if they do.




[jira] [Updated] (HADOOP-14457) create() does not notify metadataStore of parent directories or ensure they're not existing files

2017-07-06 Thread Sean Mackrory (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-14457:
---
Attachment: HADOOP-14457-HADOOP-13345.010.patch

So attaching a patch that follows the design discussed by [~fabbri] and me in 
our last round of comments. This still requires that clients pass in the full 
list of directories that need to be created, and still has DynamoDB enforce 
that itself by ensuring that that set also includes the parents of everything 
in the set. Note that it does this in-memory, without regard for what already 
may exist in the database. It then writes it all in one batch. This is safe for 
now, since directories either exist or they don't and there isn't other 
metadata we store with them, but if we do start storing metadata (like 
authoritative bits) on each directory, I see no other option but to go back to 
doing it 1 level at a time, probably with a round-trip each (unless we can 
somehow request the entire lineage at once, but I don't see how we could do 
that either). That would still line up with [~ste...@apache.org]'s point about 
being a single round-trip when the parent does exist, and only becoming 
excessively slow when it doesn't.
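
The in-memory step described above, making sure the set also contains the parents of everything in it before a single batch write, might look roughly like this. It is a sketch under assumptions, not the code in the attached patch: the class `AncestorClosureSketch` and the method `withAllParents` are hypothetical, paths are modeled as absolute, normalized strings without trailing slashes, and the root itself is left out.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that completes a set of directories with all their parents. */
final class AncestorClosureSketch {
    /** Returns the input paths plus every parent directory, excluding the root "/". */
    static Set<String> withAllParents(Collection<String> dirs) {
        Set<String> all = new TreeSet<>();
        for (String dir : dirs) {
            String current = dir;
            while (current != null && !current.isEmpty() && !current.equals("/")) {
                if (!all.add(current)) {
                    break; // already present, so its parents were added earlier
                }
                int slash = current.lastIndexOf('/');
                current = slash <= 0 ? "/" : current.substring(0, slash);
            }
        }
        return all; // the caller would then write this whole set in one batch
    }
}
```

Nothing here looks at what already exists in the database, which matches the description above and is also why the approach stops being safe once directories carry their own metadata.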

In addition to my usual S3N failures, I started seeing the following failures 
intermittently today - I would see these perhaps 1-in-4 test runs. So far only 
with DynamoDB (with and without authoritative mode), but with it happening 
rarely I can't be sure it's DynamoDB-specific. I reverted my change but 
continue to see the failures:

{code}
testConsistentRenameAfterDelete(org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency)  Time elapsed: 11.925 sec  <<< FAILURE!
java.lang.AssertionError: Recently renamed dir should not be visible
    at org.junit.Assert.fail(Assert.java:88)
    at org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency.testConsistentRenameAfterDelete(ITestS3GuardListConsistency.java:237)

testConsistentListAfterDelete(org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency)  Time elapsed: 7.383 sec  <<< FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertFalse(Assert.java:64)
    at org.junit.Assert.assertFalse(Assert.java:74)
    at org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency.testConsistentListAfterDelete(ITestS3GuardListConsistency.java:191)
{code}

> create() does not notify metadataStore of parent directories or ensure 
> they're not existing files
> -
>
> Key: HADOOP-14457
> URL: https://issues.apache.org/jira/browse/HADOOP-14457
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
> Attachments: HADOOP-14457-HADOOP-13345.001.patch, 
> HADOOP-14457-HADOOP-13345.002.patch, HADOOP-14457-HADOOP-13345.003.patch, 
> HADOOP-14457-HADOOP-13345.004.patch, HADOOP-14457-HADOOP-13345.005.patch, 
> HADOOP-14457-HADOOP-13345.006.patch, HADOOP-14457-HADOOP-13345.007.patch, 
> HADOOP-14457-HADOOP-13345.008.patch, HADOOP-14457-HADOOP-13345.009.patch, 
> HADOOP-14457-HADOOP-13345.010.patch
>
>
> Not a great test yet, but it at least reliably demonstrates the issue. 
> LocalMetadataStore will sometimes erroneously report that a directory is 
> empty with isAuthoritative = true when it *definitely* has children the 
> metadatastore should know about. It doesn't appear to happen if the children 
> are just directories. The fact that it's returning an empty listing is 
> concerning, but the fact that it says it's authoritative *might* be a second 
> bug.
> {code}
> diff --git a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> index 78b3970..1821d19 100644
> --- a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> +++ b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
> @@ -965,7 +965,7 @@ public boolean hasMetadataStore() {
>    }
>  
>    @VisibleForTesting
> -  MetadataStore getMetadataStore() {
> +  public MetadataStore getMetadataStore() {
>      return metadataStore;
>    }
>  
> diff --git a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> index 4339649..881bdc9 100644
> --- a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java
> +++ 
>

[jira] [Commented] (HADOOP-14620) S3A authentication failure for regions other than us-east-1

2017-07-06 Thread Aaron Fabbri (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077403#comment-16077403
 ] 

Aaron Fabbri commented on HADOOP-14620:
---

Looks like you are on a fairly old version.  Can you retest with hadoop trunk?  
I'm guessing it will work.

> S3A authentication failure for regions other than us-east-1
> ---
>
> Key: HADOOP-14620
> URL: https://issues.apache.org/jira/browse/HADOOP-14620
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ilya Fourmanov
> Attachments: s3-403.txt
>
>
> hadoop fs s3a:// operations fail authentication for S3 buckets hosted in 
> regions other than the default us-east-1.
> Steps to reproduce:
> # Create an S3 bucket in eu-west-1.
> # Using an IAM instance profile or fs.s3a.access.key/fs.s3a.secret.key, run the 
> following command:
> {code}
> hadoop --loglevel DEBUG fs -D fs.s3a.endpoint=s3.eu-west-1.amazonaws.com -ls s3a://your-eu-west-1-hosted-bucket/
> {code}
> Expected behaviour:
> You will see a listing of the bucket.
> Actual behaviour:
> You will get a 403 Authentication Denied response from AWS S3.
> The reason is a mismatch between the string to sign (as defined in 
> http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html) 
> provided by Hadoop and the one expected by AWS. 
> If you use https://aws.amazon.com/code/199 to analyse the StringToSignBytes 
> returned by AWS, you will see that AWS expects CanonicalizedResource to be in 
> the form 
> /your-eu-west-1-hosted-bucket{color:red}.s3.eu-west-1.amazonaws.com{color}/.
> Hadoop provides it as /your-eu-west-1-hosted-bucket/
> Note that the AWS documentation doesn't explicitly state that the endpoint or full 
> DNS address should be appended to CanonicalizedResource; however, practice 
> shows it is actually required.
> I've also submitted this to AWS so that they can correct the behaviour or the 
> documentation.




[jira] [Commented] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure

2017-07-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077314#comment-16077314
 ] 

Hadoop QA commented on HADOOP-14553:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 73 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-tools/hadoop-azure: The patch generated 123 new + 193 unchanged - 114 fixed = 316 total (was 307) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 16 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s{color} | {color:red} The patch 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 42s{color} | {color:green} hadoop-azure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m 7s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14553 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12875991/HADOOP-14553-005.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux f13b37e8bf35 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7576a68 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/artifact/patchprocess/whitespace-eol.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/artifact/patchprocess/whitespace-tabs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/testReport/ |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.



> Add (parallelized) integration tests to

[jira] [Updated] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure

2017-07-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14553:

Status: Patch Available  (was: Open)

> Add (parallelized) integration tests to hadoop-azure
> 
>
> Key: HADOOP-14553
> URL: https://issues.apache.org/jira/browse/HADOOP-14553
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14553-001.patch, HADOOP-14553-002.patch, 
> HADOOP-14553-003.patch, HADOOP-14553-004.patch, HADOOP-14553-005.patch
>
>
> The Azure tests are slow to run because they are serialized. As they are all 
> called Test*, there is no clear differentiation between unit tests, which 
> Jenkins can run, and integration tests, which it can't.
> Move the Azure tests {{Test*}} to integration tests {{ITest*}} and parallelize 
> them (which includes having separate paths for every test suite). The code in 
> hadoop-aws's POM shows what to do.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Updated] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure

2017-07-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14553:

Attachment: HADOOP-14553-005.patch

Patch 005

Moves a lot more tests to the parallel phase. Some of these explicitly use 
root-relative paths, because they need them (no parent directory 
probes/connect). This will work provided all the tests are set up to use names 
guaranteed to be unique across all test suites/methods.

Testing: each moved test worked well alone; a full bulk test failed (lots of 
timeouts) when run over a slow network with 8 parallel tests.

I think I'll bump up the default timeout from 30s to something bigger, even 
though it's a sign that there's not enough bandwidth for 8 tests together. 
Making the timeout 600s would be less brittle over slow connections.
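
A minimal sketch of the unique-naming requirement above, in plain Java with no Hadoop dependencies. The class {{UniqueTestNames}}, the method {{uniqueName}} and the {{test.fork.id}} system property are hypothetical names chosen for illustration; they are not taken from the patch.

```java
import java.util.Locale;

/** Hypothetical helper: builds root-relative names that stay unique across parallel forks. */
final class UniqueTestNames {
  private UniqueTestNames() {
  }

  /**
   * @param forkId id of the parallel test fork (assumed to be supplied by the build)
   * @param suite  test suite class
   * @param method test method name
   * @return a root-relative name combining fork, suite and method
   */
  static String uniqueName(String forkId, Class<?> suite, String method) {
    String raw = forkId + "-" + suite.getSimpleName() + "-" + method;
    // Keep only characters that are safe in a blob or path name.
    return "/" + raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9-]", "_");
  }

  public static void main(String[] args) {
    // "test.fork.id" is a made-up property name used only in this sketch.
    String forkId = System.getProperty("test.fork.id", "fork-0001");
    System.out.println(uniqueName(forkId, UniqueTestNames.class, "testRename"));
  }
}
```

Two suites that both have a {{testRename}} method, or the same suite running in two forks, end up with different root-level names, which is the property the root-relative tests depend on.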



> Add (parallelized) integration tests to hadoop-azure
> 
>
> Key: HADOOP-14553
> URL: https://issues.apache.org/jira/browse/HADOOP-14553
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14553-001.patch, HADOOP-14553-002.patch, 
> HADOOP-14553-003.patch, HADOOP-14553-004.patch, HADOOP-14553-005.patch
>
>
> The Azure tests are slow to run because they are serialized. As they are all 
> called Test*, there is no clear differentiation between unit tests, which 
> Jenkins can run, and integration tests, which it can't.
> Move the Azure tests {{Test*}} to integration tests {{ITest*}} and parallelize 
> them (which includes having separate paths for every test suite). The code in 
> hadoop-aws's POM shows what to do.




[jira] [Updated] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure

2017-07-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14553:

Status: Open  (was: Patch Available)

> Add (parallelized) integration tests to hadoop-azure
> 
>
> Key: HADOOP-14553
> URL: https://issues.apache.org/jira/browse/HADOOP-14553
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14553-001.patch, HADOOP-14553-002.patch, 
> HADOOP-14553-003.patch, HADOOP-14553-004.patch
>
>
> The Azure tests are slow to run because they are serialized. As they are all 
> called Test*, there is no clear differentiation between unit tests, which 
> Jenkins can run, and integration tests, which it can't.
> Move the Azure tests {{Test*}} to integration tests {{ITest*}} and parallelize 
> them (which includes having separate paths for every test suite). The code in 
> hadoop-aws's POM shows what to do.




[jira] [Commented] (HADOOP-13435) Add thread local mechanism for aggregating file system storage stats

2017-07-06 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077199#comment-16077199
 ] 

Hadoop QA commented on HADOOP-13435:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  8s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  5s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 48s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 39s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  5s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 10s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}155m 46s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestKDiag |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
| Timed out junit tests | org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-13435 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12875965/HADOOP-13435.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 00099b530712 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7576a68 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HADOOP-Build/12724/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/12724/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/12724/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results |

[jira] [Updated] (HADOOP-12802) local FileContext does not rename .crc file

2017-07-06 Thread Andras Bokor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated HADOOP-12802:
--
Target Version/s: 3.0.0-alpha4

> local FileContext does not rename .crc file
> ---
>
> Key: HADOOP-12802
> URL: https://issues.apache.org/jira/browse/HADOOP-12802
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Youngjoon Kim
>Assignee: Andras Bokor
>
> After running the following code, the "old" file is renamed to "new", but 
> ".old.crc" is not renamed to ".new.crc".
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileContext fc = FileContext.getLocalFSFileContext(conf);
> FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE));
> out.close();
> fc.rename(oldPath, newPath);
> {code}
> On the other hand, the local FileSystem successfully renames the .crc file.
> {code}
> Path oldPath = new Path("/tmp/old");
> Path newPath = new Path("/tmp/new");
> Configuration conf = new Configuration();
> FileSystem fs = FileSystem.getLocal(conf);
> FSDataOutputStream out = fs.create(oldPath);
> out.close();
> fs.rename(oldPath, newPath);
> {code}
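
The expected behaviour (the data file and its checksum sidecar move together) can be sketched with {{java.nio.file}} alone. This is only an illustration of the idea, not the Hadoop implementation: {{CrcAwareRename}} and its methods are hypothetical names, and the only thing taken from the report is the ".name.crc" naming pattern.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: rename a file together with its ".name.crc" sidecar. */
final class CrcAwareRename {
  private CrcAwareRename() {
  }

  /** Sidecar path for a file: "old" maps to ".old.crc" in the same directory. */
  static Path checksumFile(Path file) {
    return file.resolveSibling("." + file.getFileName() + ".crc");
  }

  /** Renames the data file and, if one exists, its checksum sidecar. */
  static void rename(Path src, Path dst) throws IOException {
    Files.move(src, dst);
    Path srcCrc = checksumFile(src);
    if (Files.exists(srcCrc)) {
      Files.move(srcCrc, checksumFile(dst));
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("crc-demo");
    Path oldPath = dir.resolve("old");
    Files.createFile(oldPath);
    Files.createFile(checksumFile(oldPath));
    rename(oldPath, dir.resolve("new"));
    System.out.println("sidecar renamed: " + Files.exists(dir.resolve(".new.crc")));
  }
}
```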




[jira] [Commented] (HADOOP-14548) S3Guard: issues running parallel tests w/ S3N

2017-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077153#comment-16077153
 ] 

Steve Loughran commented on HADOOP-14548:
-

The S3N tests will all skip if you don't provide the binding endpoint; this is 
a new feature that came with the move to JUnit 4 everywhere.

> S3Guard: issues running parallel tests w/ S3N 
> --
>
> Key: HADOOP-14548
> URL: https://issues.apache.org/jira/browse/HADOOP-14548
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Minor
>
> In general, running S3Guard and parallel tests with S3A and S3N contract 
> tests enabled is asking for trouble: S3Guard code assumes there are no 
> other non-S3Guard clients modifying the bucket.
> The goal of this JIRA is to:
> - Discuss current failures running `mvn verify -Dparallel-tests -Ds3guard 
> -Ddynamo` with S3A and S3N contract tests configured.
> - Identify any failures here that are worth looking into.
> - Document (or enforce) that people should not do this, or should expect 
> failures if they do.




[jira] [Updated] (HADOOP-13435) Add thread local mechanism for aggregating file system storage stats

2017-07-06 Thread Mingliang Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HADOOP-13435:
---
Attachment: HADOOP-13435.004.patch

> Add thread local mechanism for aggregating file system storage stats
> 
>
> Key: HADOOP-13435
> URL: https://issues.apache.org/jira/browse/HADOOP-13435
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HADOOP-13435.000.patch, HADOOP-13435.001.patch, 
> HADOOP-13435.002.patch, HADOOP-13435.003.patch, HADOOP-13435.004.patch
>
>
> As discussed in [HADOOP-13032], this is to add thread local mechanism for 
> aggregating file system storage stats. This class will also be used in 
> [HADOOP-13031], which is to separate the distance-oriented rack-aware read 
> bytes logic from {{FileSystemStorageStatistics}} to new 
> DFSRackAwareStorageStatistics as it's DFS-specific. After this patch, the 
> {{FileSystemStorageStatistics}} can live without the to-be-removed 
> {{FileSystem$Statistics}} implementation.
> A unit test should also be added.




[jira] [Commented] (HADOOP-14627) Enable new features of ADLS SDK (MSI, Device Code auth)

2017-07-06 Thread John Zhuge (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076950#comment-16076950
 ] 

John Zhuge commented on HADOOP-14627:
-

[~ASikaria] It does not hurt to call {{getPasswordString}} for non-secret 
properties. It is nice to group all related properties together in one place, 
the cred store. Of course, the downside is that non-secret properties are 
placed into the cred store, which can be confusing.

[~steve_l] What do you think?


> Enable new features of ADLS SDK (MSI, Device Code auth)
> ---
>
> Key: HADOOP-14627
> URL: https://issues.apache.org/jira/browse/HADOOP-14627
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
> Environment: MSI Change applies only to Hadoop running in an Azure VM
>Reporter: Atul Sikaria
>Assignee: Atul Sikaria
> Attachments: HADOOP-14627-001.patch
>
>
> This change is to upgrade the Hadoop ADLS connector to enable new auth 
> features exposed by the ADLS Java SDK.
> Specifically:
> MSI Tokens: MSI (Managed Service Identity) is a way to provide an identity to 
> an Azure Service. In the case of VMs, they can be used to give an identity to 
> a VM deployment. This simplifies managing Service Principals, since the creds 
> don’t have to be managed in core-site files anymore. The way this works is 
> that during VM deployment, the ARM (Azure Resource Manager) template needs to 
> be modified to enable MSI. Once deployed, the MSI extension runs a service on 
> the VM that exposes a token endpoint to http://localhost at a port specified 
> in the template. The SDK has a new TokenProvider to fetch the token from this 
> local endpoint. This change would expose that TokenProvider as an auth option.
> DeviceCode auth: This enables a token to be obtained from an interactive 
> login. The user is given a URL and a token to use on the login screen. User 
> can use the token to login from any device. Once the login is done, the token 
> that is obtained is in the name of the user who logged in. Note that because 
> of the interactive login involved, this is not very suitable for job 
> scenarios, but can work for ad-hoc scenarios like running “hdfs dfs” commands.




[jira] [Commented] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable

2017-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076845#comment-16076845
 ] 

Steve Loughran commented on HADOOP-14468:
-

That said, looking at all the places we call getFileStatus, it'd be a useful 
little sanity check all round. 

> S3Guard: make short-circuit getFileStatus() configurable
> 
>
> Key: HADOOP-14468
> URL: https://issues.apache.org/jira/browse/HADOOP-14468
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Minor
>
> Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a 
> result from the MetadataStore (e.g. dynamodb) first.
> I would like to add a new parameter 
> {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps 
> the current behavior.  When false, S3AFileSystem will check both S3 and the 
> MetadataStore.
> I'm not sure yet if we want to have this behavior the same for all callers of 
> getFileStatus(), or if we only want to check both S3 and MetadataStore for 
> some internal callers such as open().
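
A rough sketch of how such a flag could steer the lookup, in plain Java. Everything here is hypothetical: {{StatusLookup}} is not a Hadoop class, the two lookups are stand-ins that map a path to a file length, and the choice to keep returning the MetadataStore entry while merely counting disagreements is an assumption, since the description does not say how the two results should be reconciled.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch; each lookup maps a path to a file length, if the path is known. */
final class StatusLookup {
  private final Function<String, Optional<Long>> metadataStore;
  private final Function<String, Optional<Long>> s3;
  private final boolean authoritative;
  private int mismatches; // times the store had an entry that S3 did not confirm

  StatusLookup(Function<String, Optional<Long>> metadataStore,
               Function<String, Optional<Long>> s3,
               boolean authoritative) {
    this.metadataStore = metadataStore;
    this.s3 = s3;
    this.authoritative = authoritative;
  }

  Optional<Long> getFileStatus(String path) {
    Optional<Long> cached = metadataStore.apply(path);
    if (authoritative && cached.isPresent()) {
      return cached; // current behaviour: short-circuit, no S3 request
    }
    Optional<Long> remote = s3.apply(path);
    if (cached.isPresent() && !cached.equals(remote)) {
      mismatches++; // sanity check only; what to do about it is left open
    }
    // Assumption: prefer the store's entry when it has one, else fall back to S3.
    return cached.isPresent() ? cached : remote;
  }

  int mismatches() {
    return mismatches;
  }

  public static void main(String[] args) {
    StatusLookup lookup = new StatusLookup(
        p -> Optional.of(10L),  // store always claims a 10-byte file
        p -> Optional.empty(),  // S3 never finds it
        false);
    System.out.println(lookup.getFileStatus("/a") + " mismatches=" + lookup.mismatches());
  }
}
```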




[jira] [Commented] (HADOOP-14467) S3Guard: Improve FNFE message when opening a stream

2017-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076839#comment-16076839
 ] 

Steve Loughran commented on HADOOP-14467:
-

Without S3Guard, this can arise if:

# a file is deleted between the open() returning a reference and the first 
read() call.
# a file is deleted during a read sequence and a new partial GET of the file is 
made (new seek, new block in the fadvise=random mode).
# a file is deleted while a sequential GET was already in progress and a 
subsequent read() causes this to surface (issue: how does it surface?). If 
it's a read error we'll try to re-open the connection, which should escalate 
it to condition (2).

We can certainly write tests for the first two of these; the final one is 
probably driven by buffer settings in the infrastructure (or indeed, could be 
used to determine what those buffer sizes are)

s3guard adds a new failure: file is in the DDB, but not in the FS.  This will 
surface as a similar situation to #1 above.

Maybe that's something which the FS itself should be made aware of, in a metric 
or callback. There's some incrementing of statistics in the {{S3AInputStream}}, 
but it could actually invoke some callback on the S3A FS to say "we've got a 
failure on read #0 of blob s3a://bucket/file1", which can then trigger other 
actions if the FS is s3guard. It could also think about a callback if the first 
read triggered an EOF as well, as that could be a sign of the file length not 
being what DDB thinks it is.
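
The callback idea could look roughly like the following plain-Java sketch. None of these types exist in Hadoop: {{ReadFailureListener}}, {{NotifyingStream}} and its {{Source}} interface are hypothetical stand-ins for the filesystem, the input stream and the underlying blob read.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical callback the filesystem could register with its input streams. */
interface ReadFailureListener {
  void readFailed(String uri, long readCount, IOException cause);
}

/** Hypothetical stream wrapper that reports read failures before rethrowing them. */
final class NotifyingStream {
  /** Stand-in for the underlying blob read. */
  interface Source {
    int read() throws IOException;
  }

  private final String uri;
  private final Source source;
  private final ReadFailureListener listener;
  private long reads; // number of successful reads so far

  NotifyingStream(String uri, Source source, ReadFailureListener listener) {
    this.uri = uri;
    this.source = source;
    this.listener = listener;
  }

  int read() throws IOException {
    try {
      int b = source.read();
      reads++;
      return b;
    } catch (IOException e) {
      // Tell the owner which read failed, then let the caller see the error.
      listener.readFailed(uri, reads, e);
      throw e;
    }
  }

  public static void main(String[] args) {
    NotifyingStream in = new NotifyingStream("s3a://bucket/file1",
        () -> { throw new FileNotFoundException("no such key"); },
        (uri, n, cause) ->
            System.out.println("failure on read #" + n + " of blob " + uri));
    try {
      in.read();
    } catch (IOException expected) {
      // the caller still receives the original exception
    }
  }
}
```

A listener that sees a failure with a read count of 0 is in the position described above: the file was known at open() time but could not be read, which is the case an S3Guard-enabled filesystem might want to act on.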

> S3Guard: Improve FNFE message when opening a stream
> ---
>
> Key: HADOOP-14467
> URL: https://issues.apache.org/jira/browse/HADOOP-14467
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Minor
>
> Following up on the [discussion on 
> HADOOP-13345|https://issues.apache.org/jira/browse/HADOOP-13345?focusedCommentId=16030050=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16030050],
>  because S3Guard can serve getFileStatus() from the MetadataStore without 
> doing a HEAD on S3, a FileNotFound error on a file due to S3 GET 
> inconsistency does not happen on open(), but on the first read of the stream. 
>  We may add retries to the S3 client in the future, but for now we should 
> have an exception message that indicates this may be due to inconsistency 
> (assuming it isn't a more straightforward case like someone deleting the 
> object out from under you).
> This is expected to be a rare case, since the S3 service is now mostly 
> consistent for GET.
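
As an illustration of the kind of message this asks for, here is a small plain-Java sketch. {{FnfeMessages}}, {{onRead}} and the wording of the hint are all hypothetical; the only point is that the hint is attached when the failure happens on the first read with S3Guard enabled, and the original exception is kept as the cause.

```java
import java.io.FileNotFoundException;

/** Hypothetical sketch: builds a more helpful FileNotFoundException for read failures. */
final class FnfeMessages {
  static final String HINT =
      " (the MetadataStore lists this file: S3 may be inconsistent,"
      + " or the object may have been deleted)";

  private FnfeMessages() {
  }

  static FileNotFoundException onRead(String uri, long readCount,
      boolean s3guardEnabled, FileNotFoundException cause) {
    String message = "File not found on read #" + readCount + " of " + uri;
    if (s3guardEnabled && readCount == 0) {
      message += HINT; // only the first read points at possible inconsistency
    }
    FileNotFoundException translated = new FileNotFoundException(message);
    translated.initCause(cause);
    return translated;
  }

  public static void main(String[] args) {
    FileNotFoundException raw = new FileNotFoundException("404 Not Found");
    System.out.println(onRead("s3a://bucket/file1", 0, true, raw).getMessage());
  }
}
```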




[jira] [Commented] (HADOOP-14627) Enable new features of ADLS SDK (MSI, Device Code auth)

2017-07-06 Thread Atul Sikaria (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076836#comment-16076836
 ] 

Atul Sikaria commented on HADOOP-14627:
---

Thanks [~ste...@apache.org] for reviewing. Will do a test, and fix the casing 
on the property name. Note on the test that it is only meaningful if run from 
within an Azure VM - MSI service will not be present anywhere else.

The properties are not secrets - ok to have in cleartext.


> Enable new features of ADLS SDK (MSI, Device Code auth)
> ---
>
> Key: HADOOP-14627
> URL: https://issues.apache.org/jira/browse/HADOOP-14627
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
> Environment: MSI Change applies only to Hadoop running in an Azure VM
>Reporter: Atul Sikaria
>Assignee: Atul Sikaria
> Attachments: HADOOP-14627-001.patch
>
>
> This change is to upgrade the Hadoop ADLS connector to enable new auth 
> features exposed by the ADLS Java SDK.
> Specifically:
> MSI Tokens: MSI (Managed Service Identity) is a way to provide an identity to 
> an Azure Service. In the case of VMs, they can be used to give an identity to 
> a VM deployment. This simplifies managing Service Principals, since the creds 
> don’t have to be managed in core-site files anymore. The way this works is 
> that during VM deployment, the ARM (Azure Resource Manager) template needs to 
> be modified to enable MSI. Once deployed, the MSI extension runs a service on 
> the VM that exposes a token endpoint to http://localhost at a port specified 
> in the template. The SDK has a new TokenProvider to fetch the token from this 
> local endpoint. This change would expose that TokenProvider as an auth option.
> DeviceCode auth: This enables a token to be obtained from an interactive 
> login. The user is given a URL and a token to use on the login screen. User 
> can use the token to login from any device. Once the login is done, the token 
> that is obtained is in the name of the user who logged in. Note that because 
> of the interactive login involved, this is not very suitable for job 
> scenarios, but can work for ad-hoc scenarios like running “hdfs dfs” commands.




[jira] [Commented] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable

2017-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076817#comment-16076817
 ] 

Steve Loughran commented on HADOOP-14468:
-

The general code path for file IO is one of two things:

Sequential from the start (gzip unzip, CSV, text, Avro):
{code}
instream= fs.open(path)
instream.read()
{code}

or: seek and read, explicit or in a readFully() call. This is the codepath in: 
.snappy files, examining columnar stored data in ORC, Parquet, ...
{code}
instream= fs.open(path)
instream.seek(somewhere)
instream.read(bytes)
instream.seek(somewhere-else)
...
{code}

Either way, there's usually a read() call very shortly after the open, which is 
when any missing file will surface, so we don't need to overreact; we just need 
to make sure that the error message which surfaces on a 404 on the first open 
of a file is propagated up in a way which is meaningful and consistent with 
what people normally expect. HADOOP-14467 looks at that.

Doing it as a fallback for troubleshooting/monitoring is something to consider, 
though.

> S3Guard: make short-circuit getFileStatus() configurable
> 
>
> Key: HADOOP-14468
> URL: https://issues.apache.org/jira/browse/HADOOP-14468
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Minor
>
> Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a 
> result from the MetadataStore (e.g. dynamodb) first.
> I would like to add a new parameter 
> {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps 
> the current behavior.  When false, S3AFileSystem will check both S3 and the 
> MetadataStore.
> I'm not sure yet if we want to have this behavior the same for all callers of 
> getFileStatus(), or if we only want to check both S3 and MetadataStore for 
> some internal callers such as open().




[jira] [Commented] (HADOOP-13761) S3Guard: implement retries

2017-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076805#comment-16076805
 ] 

Steve Loughran commented on HADOOP-13761:
-

We need to implement retry logic in all AWS calls which bypass the transfer 
manager, so that transient failures (503/throttle, connection timeout) can get 
retried. The core code is in the HADOOP-13786 branch; it just needs rollout to 
the existing methods, and policies to deal with S3Guard failures: when to fail, 
when to retry. And, for DDB: when to fall back to the blobstore, which is a 
different recovery strategy from the rest.
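
For illustration only, a retry loop of this kind might look like the sketch below. It is not the HADOOP-13786 code: {{BoundedRetry}} and {{TransientFailureException}} are hypothetical names, and doubling the delay after each attempt is an assumed backoff scheme. The maximum duration reflects the requirement in the issue description that callers eventually receive an error.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: retry transient failures with backoff, up to a maximum duration. */
final class BoundedRetry {
  /** Hypothetical marker for failures worth retrying (503/throttle, connection timeout). */
  static final class TransientFailureException extends IOException {
    TransientFailureException(String message) {
      super(message);
    }
  }

  private BoundedRetry() {
  }

  static <T> T call(Callable<T> operation, long maxDurationMillis,
      long initialDelayMillis) throws Exception {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxDurationMillis);
    long delayMillis = initialDelayMillis;
    while (true) {
      try {
        return operation.call();
      } catch (TransientFailureException e) {
        long remainingNanos = deadline - System.nanoTime();
        if (remainingNanos < TimeUnit.MILLISECONDS.toNanos(delayMillis)) {
          throw e; // out of time: surface the last failure instead of waiting on
        }
        Thread.sleep(delayMillis);
        delayMillis *= 2; // assumed backoff: double the delay each time
      }
      // Any other exception is treated as non-recoverable and propagates at once.
    }
  }

  public static void main(String[] args) throws Exception {
    int[] attempts = {0};
    String result = call(() -> {
      if (++attempts[0] < 3) {
        throw new TransientFailureException("503 Slow Down");
      }
      return "ok";
    }, 5_000, 10);
    System.out.println(result + " after " + attempts[0] + " attempts");
  }
}
```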

> S3Guard: implement retries 
> ---
>
> Key: HADOOP-13761
> URL: https://issues.apache.org/jira/browse/HADOOP-13761
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Aaron Fabbri
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> retry logic.
> In HADOOP-13651, I added TODO comments in most of the places retry loops are 
> needed, including:
> - open(path).  If MetadataStore reflects recent create/move of file path, but 
> we fail to read it from S3, retry.
> - delete(path).  If deleteObject() on S3 fails, but MetadataStore shows the 
> file exists, retry.
> - rename(src,dest).  If source path is not visible in S3 yet, retry.
> - listFiles(). Skip for now. Not currently implemented in S3Guard. I will 
> create a separate JIRA for this as it will likely require interface changes 
> (i.e. prefix or subtree scan).
> We may miss some cases initially and we should do failure injection testing 
> to make sure we're covered.  Failure injection tests can be a separate JIRA 
> to make this easier to review.
> We also need basic configuration parameters around retry policy.  There 
> should be a way to specify maximum retry duration, as some applications would 
> prefer to receive an error eventually rather than wait indefinitely.  We 
> should also be keeping statistics when inconsistency is detected and we enter 
> a retry loop.




[jira] [Commented] (HADOOP-14576) DynamoDB tables may leave ACTIVE state after initial connection

2017-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076792#comment-16076792
 ] 

Steve Loughran commented on HADOOP-14576:
-

This parallel rename: is it how Hive implements a commit? If so, and we can 
move Hive off rename (copy & delete) as its commit strategy, then that could 
make the problem go away.

Assuming it is a commit operation, we would presumably like it to complete, 
even if it took more than one attempt to go through. Which means: it is 
something which should be retried.

I'm adding retry policy in the HADOOP-13786 commit code (and more fault 
injection into the inconsistent client); we can use its handling and tests as a 
basis for this. That branch/patch handles 503 throttled responses from S3 with 
backoff and retry (and a shorter policy for other failures considered 
recoverable). DDB state changes could be treated as another error to handle 
under the throttle policy.
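
One way to picture "treat DDB state changes as another error under the throttle policy" is a small classifier that maps a failure to a policy. This is a sketch under stated assumptions, not the HADOOP-13786 code: {{RetryPolicy}} and {{FailureClassifier}} are hypothetical, the attempt counts and delays are made-up values, and treating a 500 response as the "other recoverable" case is an assumption.

```java
/** Hypothetical retry policies; the attempt counts and delays are illustrative values. */
enum RetryPolicy {
  THROTTLE(20, 500), // long backoff-and-retry, as for 503 throttled responses
  SHORT(3, 100),     // shorter policy for other failures considered recoverable
  FAIL(1, 0);        // do not retry

  final int maxAttempts;
  final long baseDelayMillis;

  RetryPolicy(int maxAttempts, long baseDelayMillis) {
    this.maxAttempts = maxAttempts;
    this.baseDelayMillis = baseDelayMillis;
  }
}

/** Hypothetical sketch: picks a policy from an HTTP status and the last seen table state. */
final class FailureClassifier {
  private FailureClassifier() {
  }

  static RetryPolicy classify(int httpStatus, String tableStatus) {
    if (!"ACTIVE".equals(tableStatus)) {
      // Table left ACTIVE (for example UPDATING): handle like throttling.
      return RetryPolicy.THROTTLE;
    }
    if (httpStatus == 503) {
      return RetryPolicy.THROTTLE;
    }
    if (httpStatus == 500) {
      return RetryPolicy.SHORT; // assumption: 500 counts as "other recoverable"
    }
    return RetryPolicy.FAIL;
  }

  public static void main(String[] args) {
    System.out.println(classify(400, "UPDATING"));
    System.out.println(classify(404, "ACTIVE"));
  }
}
```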



> DynamoDB tables may leave ACTIVE state after initial connection
> ---
>
> Key: HADOOP-14576
> URL: https://issues.apache.org/jira/browse/HADOOP-14576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Sean Mackrory
>
> We currently only anticipate tables not being in the ACTIVE state when first 
> connecting. It is possible for a table to be in the ACTIVE state and move to 
> an UPDATING state during partitioning events. Attempts to read or write 
> during that time will result in an AmazonServerException getting thrown. We 
> should try to handle that better...




[jira] [Updated] (HADOOP-14577) ITestS3AInconsistency.testGetFileStatus failing in -DS3guard test runs

2017-07-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14577:

Summary: ITestS3AInconsistency.testGetFileStatus failing in -DS3guard test 
runs  (was: ITestS3AInconsistency.testGetFileStatus failing)

> ITestS3AInconsistency.testGetFileStatus failing in -DS3guard test runs
> --
>
> Key: HADOOP-14577
> URL: https://issues.apache.org/jira/browse/HADOOP-14577
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>
> This test is failing for me when run individually or in parallel (with 
> -Ds3guard), even if I revert back to the commit that introduced it. I thought 
> I had successful test runs on that before and haven't changed anything in my 
> test configuration.
> {code}Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.671 
> sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AInconsistency
> testGetFileStatus(org.apache.hadoop.fs.s3a.ITestS3AInconsistency)  Time 
> elapsed: 4.475 sec  <<< FAILURE!
> java.lang.AssertionError: S3Guard failed to list parent of inconsistent child.
> at org.junit.Assert.fail(Assert.java:88)
> at 
> org.apache.hadoop.fs.s3a.ITestS3AInconsistency.testGetFileStatus(ITestS3AInconsistency.java:83){code}




[jira] [Commented] (HADOOP-14577) ITestS3AInconsistency.testGetFileStatus failing

2017-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076688#comment-16076688
 ] 

Steve Loughran commented on HADOOP-14577:
-

I was about to say worksforme, but once I do -Ds3guard it fails for me too
{code}
---
 T E S T S
---
Running org.apache.hadoop.fs.s3a.ITestS3AInconsistency
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.856 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AInconsistency
testGetFileStatus(org.apache.hadoop.fs.s3a.ITestS3AInconsistency)  Time elapsed: 2.763 sec  <<< FAILURE!
java.lang.AssertionError: S3Guard failed to list parent of inconsistent child.
at org.junit.Assert.fail(Assert.java:88)
at org.apache.hadoop.fs.s3a.ITestS3AInconsistency.testGetFileStatus(ITestS3AInconsistency.java:83)


Results :

Failed tests: 
  ITestS3AInconsistency.testGetFileStatus:83->Assert.fail:88 S3Guard failed to list parent of inconsistent child.

{code}

> ITestS3AInconsistency.testGetFileStatus failing
> ---
>
> Key: HADOOP-14577
> URL: https://issues.apache.org/jira/browse/HADOOP-14577
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>
> This test is failing for me when run individually or in parallel (with 
> -Ds3guard), even if I revert back to the commit that introduced it. I thought 
> I had successful test runs on that before, and I haven't changed anything in my 
> test configuration.
> {code}Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.671 
> sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AInconsistency
> testGetFileStatus(org.apache.hadoop.fs.s3a.ITestS3AInconsistency)  Time 
> elapsed: 4.475 sec  <<< FAILURE!
> java.lang.AssertionError: S3Guard failed to list parent of inconsistent child.
> at org.junit.Assert.fail(Assert.java:88)
> at 
> org.apache.hadoop.fs.s3a.ITestS3AInconsistency.testGetFileStatus(ITestS3AInconsistency.java:83){code}




[jira] [Updated] (HADOOP-14499) Findbugs warning in LocalMetadataStore

2017-07-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14499:

   Resolution: Fixed
Fix Version/s: HADOOP-13345
   Status: Resolved  (was: Patch Available)

+1
committed, thanks

> Findbugs warning in LocalMetadataStore
> --
>
> Key: HADOOP-14499
> URL: https://issues.apache.org/jira/browse/HADOOP-14499
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
> Fix For: HADOOP-13345
>
> Attachments: HADOOP-14499-HADOOP-13345.001.patch, 
> HADOOP-14499-HADOOP-13345.002.patch, HADOOP-14499-HADOOP-13345.003.patch
>
>
> First saw this raised by Yetus on HADOOP-14433:
> {code}
> Bug type UC_USELESS_OBJECT (click for details)
> In class org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore
> In method org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore.prune(long)
> Value ancestors
> Type java.util.LinkedList
> At LocalMetadataStore.java:[line 300]
> {code}
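For readers unfamiliar with the warning, here is a minimal sketch of the shape that UC_USELESS_OBJECT usually points at: a collection that is created and filled but whose contents never leave the method. This is not the actual LocalMetadataStore code; the class, method names and path handling below are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class UselessObjectSketch {

    /** Parent of a slash-separated path; "/" for top-level entries. */
    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /** Flagged shape: 'ancestors' is populated but never read or returned. */
    static int pruneFlagged(List<String> expired) {
        LinkedList<String> ancestors = new LinkedList<>();
        int removed = 0;
        for (String path : expired) {
            ancestors.add(parentOf(path)); // written here, never consumed
            removed++;
        }
        return removed;
    }

    /** One possible fix: make use of the collection (the other is to delete it). */
    static List<String> pruneAndReportAncestors(List<String> expired) {
        List<String> ancestors = new ArrayList<>();
        for (String path : expired) {
            String parent = parentOf(path);
            if (!ancestors.contains(parent)) {
                ancestors.add(parent); // distinct parents, in first-seen order
            }
        }
        return ancestors;
    }

    public static void main(String[] args) {
        List<String> expired = List.of("/a/b", "/a/c", "/d");
        System.out.println(pruneFlagged(expired));
        System.out.println(pruneAndReportAncestors(expired));
    }
}
```

Whether the right fix is to use the list or to drop it depends on what the method was meant to do with the ancestors, which the warning alone does not say.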




[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-06 Thread Hongyuan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628
 ] 

Hongyuan Li commented on HADOOP-14623:
--

Furthermore, the flush method is there to confirm that the data has been written.

> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}




[jira] [Comment Edited] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-07-06 Thread Yonger (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076511#comment-16076511
 ] 

Yonger edited comment on HADOOP-14475 at 7/6/17 1:42 PM:
-

@steve the method you mentioned, giving an empty URL to skip the landsat-pds 
tests, does not work. I also uploaded the gz file into my bucket according to 
the guide, but that failed too.
When giving the empty string, the error message is:
Tests run: 9, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 0.325 sec <<< 
FAILURE! - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
testInstantiationChain(org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider)  
Time elapsed: 0.018 sec  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163)
at org.apache.hadoop.fs.Path.<init>(Path.java:175)
at org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider.testInstantiationChain(TestS3AAWSCredentialsProvider.java:92)

And if I use the default value and upload the gz file, I get an error 
message with code 403.




was (Author: iyonger):
[~stevea] the method you mentioned give an empty url to skip the landsat-pds 
tests is not work, also I upload the gz file into my bucket according to the 
guide, but it failed too.
when giving the empty string, error message:
Tests run: 9, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 0.325 sec <<< 
FAILURE! - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
testInstantiationChain(org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider)  
Time elapsed: 0.018 sec  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163)
at org.apache.hadoop.fs.Path.<init>(Path.java:175)
at org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider.testInstantiationChain(TestS3AAWSCredentialsProvider.java:92)

and if i use default value and upload the gz file, which give me a error 
message with code 403.



> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, HADOOP-14475.002.patch, s3a-metrics.patch1, 
> stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even after I run an MR job which should be using S3.




[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file

2017-07-06 Thread Yonger (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076511#comment-16076511
 ] 

Yonger commented on HADOOP-14475:
-

[~stevea] the method you mentioned, giving an empty URL to skip the landsat-pds 
tests, does not work. I also uploaded the gz file into my bucket according to 
the guide, but that failed too.
When giving the empty string, the error message is:
Tests run: 9, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 0.325 sec <<< 
FAILURE! - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
testInstantiationChain(org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider)  
Time elapsed: 0.018 sec  <<< ERROR!
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163)
at org.apache.hadoop.fs.Path.<init>(Path.java:175)
at org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider.testInstantiationChain(TestS3AAWSCredentialsProvider.java:92)

And if I use the default value and upload the gz file, I get an error 
message with code 403.



> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> --
>
> Key: HADOOP-14475
> URL: https://issues.apache.org/jira/browse/HADOOP-14475
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 
> x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>Reporter: Yonger
>Assignee: Yonger
> Attachments: failsafe-report-s3a-it.html, 
> failsafe-report-s3a-scale.html, HADOOP-14475.002.patch, s3a-metrics.patch1, 
> stdout.zip
>
>
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even after I run an MR job which should be using S3.




[jira] [Commented] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure

2017-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076504#comment-16076504
 ] 

Steve Loughran commented on HADOOP-14553:
-

{code}
-public class TestNativeAzureFileSystemContractMocked extends
+/**
+ * Mocked testing of FileSystemContractBaseTest.
+ * This isn't an IT, but making it so makes it a lot faster for now.
+ */
+public class ITestNativeAzureFileSystemContractMocked extends
{code}
bq.  why is it faster as ITest?

It's not that the test itself finishes faster; it's just that, as a slow test, 
running it in parallel meant the overall test run took less time.

I want to do another iteration of this and:

* rename every Test* which requires credentials to an ITest*, but just list them 
in the sequential section
* leave the other tests alone
* change the test profile in the POM to run the normal test profile without 
looking for an auth-keys file

Goal: Jenkins/Yetus runs the unit tests; everything else moves to integration 
tests sooner rather than later, which allows for one or more follow-ups that 
parallelise the remaining tests or, in the case of the big native test suite, 
split it up.


Regarding commonality between the S3A test runner and the new stuff: yes, I did 
copy and paste S3ATestUtils in, as you will have noticed. Trouble is, I 
don't know what commonality we really have right now. 

> Add (parallelized) integration tests to hadoop-azure
> 
>
> Key: HADOOP-14553
> URL: https://issues.apache.org/jira/browse/HADOOP-14553
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14553-001.patch, HADOOP-14553-002.patch, 
> HADOOP-14553-003.patch, HADOOP-14553-004.patch
>
>
> The Azure tests are slow to run as they are serialized. As they are all 
> called Test*, there's no clear differentiation between unit tests, which 
> Jenkins can run, and integration tests, which it can't.
> Move the Azure tests {{Test*}} to integration tests {{ITest*}} and parallelize 
> them (which includes having separate paths for every test suite). The code in 
> hadoop-aws's POM shows what to do.




[jira] [Commented] (HADOOP-14627) Enable new features of ADLS SDK (MSI, Device Code auth)

2017-07-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076496#comment-16076496
 ] 

Steve Loughran commented on HADOOP-14627:
-

* A test would still be good, if just to verify that attempting to use the new 
auth mechanism fails if the configuration is missing any required property. 
* New {{fs.adl.oauth2.msi.TenantGuid}} should be all lower case, for 
consistency with (nearly) everything else
* Is this property a secret which should be stored in hadoop credentials files 
& retrieved with Configuration.getPassword()?
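The first two review points can be sketched as follows. This is only an illustration under stated assumptions: a plain Map stands in for the Hadoop Configuration object, the lower-case key name is a hypothetical rename of the property from the patch, and the class and method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MsiConfigCheckSketch {

    // Hypothetical all-lower-case form of fs.adl.oauth2.msi.TenantGuid.
    static final String TENANT_GUID_KEY = "fs.adl.oauth2.msi.tenantguid";

    /** Returns the trimmed value, or fails fast if the property is unset or blank. */
    static String requireProperty(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return value.trim();
    }

    /** True if the key follows the all-lower-case naming convention. */
    static boolean isLowerCase(String key) {
        return key.equals(key.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        try {
            requireProperty(conf, TENANT_GUID_KEY);
        } catch (IllegalArgumentException e) {
            // This is the failure a test for the new auth mechanism would assert on.
            System.out.println(e.getMessage());
        }
        System.out.println(isLowerCase("fs.adl.oauth2.msi.TenantGuid"));
    }
}
```

If the value turns out to be a secret, the lookup would go through Configuration.getPassword() rather than a plain get, which this sketch does not attempt to model.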

> Enable new features of ADLS SDK (MSI, Device Code auth)
> ---
>
> Key: HADOOP-14627
> URL: https://issues.apache.org/jira/browse/HADOOP-14627
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/adl
> Environment: MSI Change applies only to Hadoop running in an Azure VM
>Reporter: Atul Sikaria
>Assignee: Atul Sikaria
> Attachments: HADOOP-14627-001.patch
>
>
> This change is to upgrade the Hadoop ADLS connector to enable new auth 
> features exposed by the ADLS Java SDK.
> Specifically:
> MSI Tokens: MSI (Managed Service Identity) is a way to provide an identity to 
> an Azure Service. In the case of VMs, they can be used to give an identity to 
> a VM deployment. This simplifies managing Service Principals, since the creds 
> don’t have to be managed in core-site files anymore. The way this works is 
> that during VM deployment, the ARM (Azure Resource Manager) template needs to 
> be modified to enable MSI. Once deployed, the MSI extension runs a service on 
> the VM that exposes a token endpoint to http://localhost at a port specified 
> in the template. The SDK has a new TokenProvider to fetch the token from this 
> local endpoint. This change would expose that TokenProvider as an auth option.
> DeviceCode auth: This enables a token to be obtained from an interactive 
> login. The user is given a URL and a token to use on the login screen. User 
> can use the token to login from any device. Once the login is done, the token 
> that is obtained is in the name of the user who logged in. Note that because 
> of the interactive login involved, this is not very suitable for job 
> scenarios, but can work for ad-hoc scenarios like running “hdfs dfs” commands.




[jira] [Resolved] (HADOOP-8740) Build target to generate findbugs html output

2017-07-06 Thread Andras Bokor (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor resolved HADOOP-8740.
--
Resolution: Invalid

> Build target to generate findbugs html output
> -
>
> Key: HADOOP-8740
> URL: https://issues.apache.org/jira/browse/HADOOP-8740
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Eli Collins
>Assignee: Andras Bokor
>
> It would be useful if there was a build target or flag to generate findbugs 
> output. It would depend on {{mvn compile findbugs:findbugs}} and run 
> {{$FINDBUGS_HOME/bin/convertXmlToText -html ../path/to/findbugsXml.xml 
> findbugs.html}} to generate findbugs.html in the target directory.




[jira] [Commented] (HADOOP-13414) Hide Jetty Server version header in HTTP responses

2017-07-06 Thread Surendra Singh Lilhore (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076160#comment-16076160
 ] 

Surendra Singh Lilhore commented on HADOOP-13414:
-

Thanks [~vinayrpet] for review and commit. 

> Hide Jetty Server version header in HTTP responses
> --
>
> Key: HADOOP-13414
> URL: https://issues.apache.org/jira/browse/HADOOP-13414
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Vinayakumar B
>Assignee: Surendra Singh Lilhore
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: Aftrerfix.png, BeforeFix.png, HADOOP-13414-001.patch, 
> HADOOP-13414-002.patch, HADOOP-13414-branch-2.patch
>
>
> Hide the Jetty Server version in the HTTP response header. Some security 
> analyzers flag this as an issue.




[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1,not 0

2017-07-06 Thread Hongyuan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076154#comment-16076154
 ] 

Hongyuan Li commented on HADOOP-14623:
--

I don't think so; setting it to 1 does not mean that it will block. However, I 
think that Ganglia knows the frequency of data loss, but Kafka does not. What 
you have said underestimates Kafka; Kafka is more capable than that. Compared 
to the complete sync of setting acks to -1, setting acks to 1 is a better choice.
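To make the three settings under discussion concrete, here is a minimal sketch that builds producer properties for a chosen acknowledgement level. It uses only java.util.Properties and the {{request.required.acks}} key quoted in the issue; the class and method names are hypothetical and this is not the KafkaSink code.

```java
import java.util.Properties;

public class KafkaAcksSketch {

    /** Builds producer properties carrying the requested acknowledgement level. */
    static Properties producerProps(String acks) {
        if (!acks.equals("0") && !acks.equals("1") && !acks.equals("-1")) {
            throw new IllegalArgumentException("Unsupported acks value: " + acks);
        }
        Properties props = new Properties();
        //  0: the producer does not wait for any acknowledgement from the broker
        //  1: the producer waits until the partition leader has written the message
        // -1: the producer waits until all in-sync replicas have acknowledged
        props.put("request.required.acks", acks);
        return props;
    }

    public static void main(String[] args) {
        // The value proposed in this issue, replacing the current "0".
        System.out.println(producerProps("1"));
    }
}
```

With 0 the sender never learns whether a message was dropped, which is the data-loss concern raised above; 1 is the middle ground between that and the full replication wait of -1.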

> KafkaSink#init should set acks to 1,not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, tools
>Affects Versions: 3.0.0-alpha3
>Reporter: Hongyuan Li
>Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has 
> at least been written to the broker.
> The current code is listed below:
> {code}
>   
> props.put("request.required.acks", "0");
> {code}




[jira] [Commented] (HADOOP-14624) Add GenericTestUtils.DelayAnswer that accept slf4j logger API

2017-07-06 Thread Wenxin He (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075984#comment-16075984
 ] 

Wenxin He commented on HADOOP-14624:


The 17 new javac warnings are caused by the newly deprecated method 
{{DelayAnswer(Log)}}.
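As a rough illustration of where such warnings come from, the sketch below keeps an old constructor as {{@Deprecated}} while adding an overload for the new logger type. It is not the GenericTestUtils code: the two nested interfaces are stand-ins for the commons-logging and slf4j logger types, and every name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DelayAnswerSketch {

    /** Stand-in for the commons-logging logger type (assumption). */
    interface LegacyLog {
        void info(Object message);
    }

    /** Stand-in for the slf4j logger type (assumption). */
    interface Slf4jLogger {
        void info(String message);
    }

    static class DelayAnswer {
        private final Slf4jLogger log;

        /** Old entry point, kept for existing callers, which now see a deprecation warning. */
        @Deprecated
        DelayAnswer(final LegacyLog legacy) {
            // Adapt the old logger so the rest of the class deals with one type only.
            this.log = new Slf4jLogger() {
                @Override
                public void info(String message) {
                    legacy.info(message);
                }
            };
        }

        /** New entry point accepting the slf4j-style logger. */
        DelayAnswer(Slf4jLogger log) {
            this.log = log;
        }

        void proceed() {
            log.info("DelayAnswer proceeding");
        }
    }

    public static void main(String[] args) {
        final List<String> lines = new ArrayList<>();
        Slf4jLogger logger = new Slf4jLogger() {
            @Override
            public void info(String message) {
                lines.add(message);
            }
        };
        new DelayAnswer(logger).proceed();
        System.out.println(lines);
    }
}
```

Each call site outside the declaring class that still uses the deprecated constructor produces one javac deprecation warning, so the count falls as callers migrate to the new overload.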

> Add GenericTestUtils.DelayAnswer that accept slf4j logger API
> -
>
> Key: HADOOP-14624
> URL: https://issues.apache.org/jira/browse/HADOOP-14624
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Wenxin He
>Assignee: Wenxin He
> Attachments: HADOOP-14624.001.patch, HADOOP-14624.002.patch
>
>
> Split from HADOOP-14539.
> Currently GenericTestUtils.DelayAnswer only accepts the commons-logging logger 
> API. As we are migrating the APIs to slf4j, the slf4j logger API should be 
> accepted as well.



