[jira] [Comment Edited] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077520#comment-16077520 ]

Hongyuan Li edited comment on HADOOP-14444 at 7/7/17 3:24 AM:
--
Sockets are complex; I don't like opening a new socket just to seek. JSch and commons-net have plenty of examples, so if you want to make full use of them you should dig into their implementations. Also, commons-net's setTimeout-style methods may get stuck in some situations when the network environment is very poor.

was (Author: hongyuan li): Sockets are complex; I don't like opening a new socket just to seek.

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Affects Versions: 2.8.0
> Reporter: Lukas Waldmann
> Assignee: Lukas Waldmann
> Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch
>
> The current implementations of the FTP and SFTP filesystems have severe limitations and performance issues when dealing with a high number of files. My patch solves those issues and integrates both filesystems in such a way that most of the core functionality is common to both, simplifying maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every single command but reused from the pool. For huge numbers of files this shows an order-of-magnitude performance improvement over unpooled connections.
> * Caching of directory trees. For FTP you always need to list the whole directory whenever you ask for information about a particular file. Again, for huge numbers of files this shows an order-of-magnitude performance improvement over uncached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing particular files across a whole directory tree
> * Support for re-establishing broken FTP data transfers - which can happen surprisingly often

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
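The connection-pooling feature described above (reuse an idle connection instead of opening a new one per command) can be sketched with a small generic pool. This is an illustrative stdlib-only sketch, not the patch's actual code; the names `ConnectionPool`, `acquire`, and `release` are assumptions, and a real FTP pool would also need idle-expiry and NOOP keep-alives as the description notes.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the pooling idea: hand back an idle
 * connection when one exists, otherwise open a fresh one.
 */
public class ConnectionPool<C> {
  private final ConcurrentLinkedQueue<C> idle = new ConcurrentLinkedQueue<>();
  private final Supplier<C> factory;   // opens a brand-new connection

  public ConnectionPool(Supplier<C> factory) { this.factory = factory; }

  /** Take an idle connection, or open a fresh one if the pool is empty. */
  public C acquire() {
    C c = idle.poll();
    return c != null ? c : factory.get();
  }

  /** Return a connection to the pool so the next command can reuse it. */
  public void release(C c) { idle.offer(c); }

  public int idleCount() { return idle.size(); }
}
```

The performance claim in the description follows directly: each reused connection saves a TCP (and for FTP, login) round-trip per command.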
[jira] [Commented] (HADOOP-14444) New implementation of ftp and sftp filesystems
[ https://issues.apache.org/jira/browse/HADOOP-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077520#comment-16077520 ]

Hongyuan Li commented on HADOOP-14444:
--
Sockets are complex; I don't like opening a new socket just to seek.

> New implementation of ftp and sftp filesystems
> --
>
> Key: HADOOP-14444
> URL: https://issues.apache.org/jira/browse/HADOOP-14444
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Affects Versions: 2.8.0
> Reporter: Lukas Waldmann
> Assignee: Lukas Waldmann
> Attachments: HADOOP-14444.2.patch, HADOOP-14444.3.patch, HADOOP-14444.4.patch, HADOOP-14444.5.patch, HADOOP-14444.patch
>
> The current implementations of the FTP and SFTP filesystems have severe limitations and performance issues when dealing with a high number of files. My patch solves those issues and integrates both filesystems in such a way that most of the core functionality is common to both, simplifying maintainability.
> The core features:
> * Support for HTTP/SOCKS proxies
> * Support for passive FTP
> * Support for connection pooling - a new connection is not created for every single command but reused from the pool. For huge numbers of files this shows an order-of-magnitude performance improvement over unpooled connections.
> * Caching of directory trees. For FTP you always need to list the whole directory whenever you ask for information about a particular file. Again, for huge numbers of files this shows an order-of-magnitude performance improvement over uncached connections.
> * Support for keep-alive (NOOP) messages to avoid connection drops
> * Support for Unix-style or regexp wildcard globs - useful for listing particular files across a whole directory tree
> * Support for re-establishing broken FTP data transfers - which can happen surprisingly often

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-14623) KafkaSink#init should set acks to 1, not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628 ]

Hongyuan Li edited comment on HADOOP-14623 at 7/7/17 3:19 AM:
--
Furthermore, the flush method is supposed to confirm that the data has been written.

*Update/Correction* Sorry, it is the {{putMetrics}} method. In {{KafkaSink}}#{{putMetrics}}, the code below is what makes me take a different view:
{code}
……
Future future = producer.send(data);
jsonLines.setLength(0);
try {
  future.get();
} catch (InterruptedException e) {
  throw new MetricsException("Error sending data", e);
} catch (ExecutionException e) {
  throw new MetricsException("Error sending data", e);
}
……
{code}

was (Author: hongyuan li): Furthermore, the flush method is supposed to confirm that the data has been written.

> KafkaSink#init should set acks to 1, not 0
> -
>
> Key: HADOOP-14623
> URL: https://issues.apache.org/jira/browse/HADOOP-14623
> Project: Hadoop Common
> Issue Type: Bug
> Components: common, tools
> Affects Versions: 3.0.0-alpha3
> Reporter: Hongyuan Li
> Assignee: Hongyuan Li
> Attachments: HADOOP-14623-001.patch
>
> {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has at least been written to the broker.
> The current code is listed below:
> {code}
> props.put("request.required.acks", "0");
> {code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
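The point of the argument above: with acks set to 0 the producer's send is fire-and-forget, so the {{future.get()}} in {{putMetrics}} does not actually confirm delivery; with acks set to 1 the leader must append the record before acknowledging. A minimal sketch of the proposed property change, using only `java.util.Properties` and the legacy `request.required.acks` key quoted in the issue (no Kafka dependency here):

```java
import java.util.Properties;

/** Sketch of the fix proposed in HADOOP-14623. */
public class KafkaSinkAcks {
  static Properties producerProps() {
    Properties props = new Properties();
    // acks=0: futures from send() complete with no delivery guarantee,
    // so waiting on them proves nothing.
    // acks=1: the leader must write the record before acknowledging,
    // so future.get() in putMetrics() really confirms the write
    // reached the broker (though not yet all replicas).
    props.put("request.required.acks", "1");
    return props;
  }
}
```

Setting the value to "all" (or -1) would additionally wait for in-sync replicas, at the cost of latency; "1" is the middle ground the issue argues for.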
[jira] [Commented] (HADOOP-14548) S3Guard: issues running parallel tests w/ S3N
[ https://issues.apache.org/jira/browse/HADOOP-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077510#comment-16077510 ]

Hadoop QA commented on HADOOP-14548:
--
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 35s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 9s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14548 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12876017/HADOOP-14548-HADOOP-13345.001.patch |
| Optional Tests | asflicense mvnsite |
| uname | Linux c2fd40511cca 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | HADOOP-13345 / 309b8c0 |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/12727/artifact/patchprocess/whitespace-eol.txt |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12727/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> S3Guard: issues running parallel tests w/ S3N
> --
>
> Key: HADOOP-14548
> URL: https://issues.apache.org/jira/browse/HADOOP-14548
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
> Priority: Minor
> Fix For: HADOOP-13345
>
> Attachments: HADOOP-14548-HADOOP-13345.001.patch
>
> In general, running S3Guard and parallel tests with S3A and S3N contract tests enabled is asking for trouble: S3Guard code assumes there are no other non-S3Guard clients modifying the bucket.
> Goal of this JIRA is to:
> - Discuss current failures running `mvn verify -Dparallel-tests -Ds3guard -Ddynamo` with S3A and S3N contract tests configured.
> - Identify any failures here that are worth looking into.
> - Document (or enforce) that people should not do this, or should expect failures if they do.
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14457) create() does not notify metadataStore of parent directories or ensure they're not existing files
[ https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077503#comment-16077503 ]

Hadoop QA commented on HADOOP-14457:
--
| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 30s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} HADOOP-13345 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 19 new + 20 unchanged - 0 fixed = 39 total (was 20) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s{color} | {color:green} hadoop-aws in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 19s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14457 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12876015/HADOOP-14457-HADOOP-13345.010.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 539573c05fba 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | HADOOP-13345 / 309b8c0 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/12726/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-aws.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/12726/testReport/ |
| modules | C: hadoop-tools/hadoop-aws U: hadoop-tools/hadoop-aws |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12726/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> create() does not notify metadataStore of parent directories or ensure
> they're not existing files
> -
>
> Key: HADOOP-14457
> URL: https://issues.apache.org/jira/browse/HADOOP-14457
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Attachments: HADOOP-14457-HADOOP-13345.001.patch,
> HADOOP-14457-HADOOP-13345.002.patch,
[jira] [Commented] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable
[ https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077499#comment-16077499 ] Aaron Fabbri commented on HADOOP-14468: --- {quote} That said, looking at all the places we call getFileStatus, it'd be a useful little sanity check all round. {quote} Yeah. It would be interesting to collect statistics on long-running clusters on how often inconsistency happens. Sounds like we're ok with the behavior of failing after open(). Your example of deleted file or inconsistency causing similar behavior is a good point. I'll leave this as minor priority for now and focus on HADOOP-14467 first. > S3Guard: make short-circuit getFileStatus() configurable > > > Key: HADOOP-14468 > URL: https://issues.apache.org/jira/browse/HADOOP-14468 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri >Priority: Minor > > Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a > result from the MetadataStore (e.g. dynamodb) first. > I would like to add a new parameter > {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps > the current behavior. When false, S3AFileSystem will check both S3 and the > MetadataStore. > I'm not sure yet if we want to have this behavior the same for all callers of > getFileStatus(), or if we only want to check both S3 and MetadataStore for > some internal callers such as open(). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
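The switch discussed in this issue can be sketched as a small decision function. This is an illustrative stdlib-only sketch, not S3AFileSystem code: the class and method names are hypothetical, plain `Optional<String>` stands in for `FileStatus`, and the exact non-authoritative semantics (here: trust S3's answer when cross-checking) are an assumption the issue itself leaves open.

```java
import java.util.Optional;
import java.util.function.Function;

/** Sketch of fs.s3a.metadatastore.getfilestatus.authoritative behaviour. */
public class GuardedGetFileStatus {
  static Optional<String> getFileStatus(
      String path,
      boolean authoritative,                       // the proposed config flag
      Function<String, Optional<String>> metadataStore,
      Function<String, Optional<String>> s3) {
    Optional<String> fromStore = metadataStore.apply(path);
    if (!fromStore.isPresent()) {
      return s3.apply(path);   // store miss: always fall back to S3
    }
    if (authoritative) {
      return fromStore;        // current behaviour: short-circuit, skip S3
    }
    // proposed non-authoritative mode: cross-check against S3 so a
    // stale store entry cannot mask a deleted object
    return s3.apply(path);
  }
}
```

As the comment notes, the interesting question is whether this cross-check applies to every caller of getFileStatus() or only to internal callers such as open().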
[jira] [Updated] (HADOOP-14548) S3Guard: issues running parallel tests w/ S3N
[ https://issues.apache.org/jira/browse/HADOOP-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Fabbri updated HADOOP-14548: -- Fix Version/s: HADOOP-13345 Status: Patch Available (was: Open) > S3Guard: issues running parallel tests w/ S3N > -- > > Key: HADOOP-14548 > URL: https://issues.apache.org/jira/browse/HADOOP-14548 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri >Priority: Minor > Fix For: HADOOP-13345 > > Attachments: HADOOP-14548-HADOOP-13345.001.patch > > > In general, running S3Guard and parallel tests with S3A and S3N contract > tests enabled is asking for trouble: S3Guard code assumes there are not > other non-S3Guard clients modifying the bucket. > Goal of this JIRA is to: > - Discuss current failures running `mvn verify -Dparallel-tests -Ds3guard > -Ddynamo` with S3A and S3N contract tests configured. > - Identify any failures here that are worth looking into. > - Document (or enforce) that people should not do this, or should expect > failures if they do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14548) S3Guard: issues running parallel tests w/ S3N
[ https://issues.apache.org/jira/browse/HADOOP-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Fabbri updated HADOOP-14548: -- Attachment: HADOOP-14548-HADOOP-13345.001.patch Attaching documentation-only patch (v1). It looks like the main work here would be to make the S3N tests less flaky. I view this as a low priority, however, so am just adding guidance to the S3A testing and S3Guard docs. > S3Guard: issues running parallel tests w/ S3N > -- > > Key: HADOOP-14548 > URL: https://issues.apache.org/jira/browse/HADOOP-14548 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri >Priority: Minor > Attachments: HADOOP-14548-HADOOP-13345.001.patch > > > In general, running S3Guard and parallel tests with S3A and S3N contract > tests enabled is asking for trouble: S3Guard code assumes there are not > other non-S3Guard clients modifying the bucket. > Goal of this JIRA is to: > - Discuss current failures running `mvn verify -Dparallel-tests -Ds3guard > -Ddynamo` with S3A and S3N contract tests configured. > - Identify any failures here that are worth looking into. > - Document (or enforce) that people should not do this, or should expect > failures if they do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14457) create() does not notify metadataStore of parent directories or ensure they're not existing files
[ https://issues.apache.org/jira/browse/HADOOP-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Mackrory updated HADOOP-14457:
---
Attachment: HADOOP-14457-HADOOP-13345.010.patch

Attaching a patch that follows the design discussed by [~fabbri] and me in our last round of comments. This still requires that clients pass in the full list of directories that need to be created, and still has DynamoDB enforce that itself by ensuring that the set also includes the parents of everything in the set. Note that it does this in-memory, without regard for what may already exist in the database. It then writes it all in one batch. This is safe for now, since directories either exist or they don't and there is no other metadata we store with them. But if we do start storing metadata (like authoritative bits) on each directory, I see no option but to go back to doing it one level at a time, probably with a round-trip each (unless we can somehow request the entire lineage at once, but I don't see how we could do that either). That would still line up with [~ste...@apache.org]'s point about being a single round-trip when the parent does exist, and only becoming excessively slow when it doesn't.

In addition to my usual S3N failures, I started seeing the following failures intermittently today - in perhaps 1 in 4 test runs. So far only with DynamoDB (with and without authoritative mode), but since it happens rarely I can't be sure it's DynamoDB-specific. I reverted my change but continue to see the failures:

{code}
testConsistentRenameAfterDelete(org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency)  Time elapsed: 11.925 sec  <<< FAILURE!
java.lang.AssertionError: Recently renamed dir should not be visible
	at org.junit.Assert.fail(Assert.java:88)
	at org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency.testConsistentRenameAfterDelete(ITestS3GuardListConsistency.java:237)

testConsistentListAfterDelete(org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency)  Time elapsed: 7.383 sec  <<< FAILURE!
java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertFalse(Assert.java:64)
	at org.junit.Assert.assertFalse(Assert.java:74)
	at org.apache.hadoop.fs.s3a.ITestS3GuardListConsistency.testConsistentListAfterDelete(ITestS3GuardListConsistency.java:191)
{code}

> create() does not notify metadataStore of parent directories or ensure
> they're not existing files
> -
>
> Key: HADOOP-14457
> URL: https://issues.apache.org/jira/browse/HADOOP-14457
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Attachments: HADOOP-14457-HADOOP-13345.001.patch,
> HADOOP-14457-HADOOP-13345.002.patch, HADOOP-14457-HADOOP-13345.003.patch,
> HADOOP-14457-HADOOP-13345.004.patch, HADOOP-14457-HADOOP-13345.005.patch,
> HADOOP-14457-HADOOP-13345.006.patch, HADOOP-14457-HADOOP-13345.007.patch,
> HADOOP-14457-HADOOP-13345.008.patch, HADOOP-14457-HADOOP-13345.009.patch,
> HADOOP-14457-HADOOP-13345.010.patch
>
> Not a great test yet, but it at least reliably demonstrates the issue. LocalMetadataStore will sometimes erroneously report that a directory is empty with isAuthoritative = true when it *definitely* has children the metadatastore should know about. It doesn't appear to happen if the children are just directories. The fact that it's returning an empty listing is concerning, but the fact that it says it's authoritative *might* be a second bug.
> {code} > diff --git > a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java > > b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java > index 78b3970..1821d19 100644 > --- > a/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java > +++ > b/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java > @@ -965,7 +965,7 @@ public boolean hasMetadataStore() { >} > >@VisibleForTesting > - MetadataStore getMetadataStore() { > + public MetadataStore getMetadataStore() { > return metadataStore; >} > > diff --git > a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java > > b/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java > index 4339649..881bdc9 100644 > --- > a/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractRename.java > +++ >
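The in-memory ancestor expansion described in the update above (clients pass a set of directories; every parent is added so the whole lineage can be written in one batch) can be sketched as follows. This is an illustrative stdlib-only sketch: plain strings stand in for Hadoop Path objects, and the method name is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch of expanding a directory set to include all its ancestors. */
public class AncestorSet {
  /**
   * Returns the input directories plus every ancestor up to (but not
   * including) the root, computed purely in memory - no database
   * round-trips, matching the batch-write approach described above.
   */
  static Set<String> withAncestors(Set<String> dirs) {
    Set<String> all = new TreeSet<>();
    for (String dir : dirs) {
      String p = dir;
      while (!p.isEmpty() && !p.equals("/")) {
        all.add(p);
        int slash = p.lastIndexOf('/');
        p = slash <= 0 ? "/" : p.substring(0, slash);  // step to parent
      }
    }
    return all;
  }
}
```

The trade-off the comment raises is visible here: because the expansion ignores what already exists in the database, it stays a single batch write, but it could silently overwrite per-directory metadata (such as authoritative bits) if those were ever stored.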
[jira] [Commented] (HADOOP-14620) S3A authentication failure for regions other than us-east-1
[ https://issues.apache.org/jira/browse/HADOOP-14620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077403#comment-16077403 ]

Aaron Fabbri commented on HADOOP-14620:
---
Looks like you are on a fairly old version. Can you retest with Hadoop trunk? I'm guessing it will work.

> S3A authentication failure for regions other than us-east-1
> ---
>
> Key: HADOOP-14620
> URL: https://issues.apache.org/jira/browse/HADOOP-14620
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0, 2.7.3
> Reporter: Ilya Fourmanov
> Attachments: s3-403.txt
>
> hadoop fs s3a:// operations fail authentication for S3 buckets hosted in regions other than the default us-east-1.
> Steps to reproduce:
> # Create an S3 bucket in eu-west-1.
> # Using an IAM instance profile or fs.s3a.access.key/fs.s3a.secret.key, run the following command:
> {code}
> hadoop --loglevel DEBUG fs -D fs.s3a.endpoint=s3.eu-west-1.amazonaws.com -ls s3a://your-eu-west-1-hosted-bucket/
> {code}
> Expected behaviour:
> You will see a listing of the bucket.
> Actual behaviour:
> You will get a 403 authentication-denied response from AWS S3.
> The reason is a mismatch between the string-to-sign as defined in http://docs.aws.amazon.com/AmazonS3/latest/dev/RESTAuthentication.html provided by Hadoop and the one expected by AWS.
> If you use https://aws.amazon.com/code/199 to analyse the StringToSignBytes returned by AWS, you will see that AWS expects CanonicalizedResource to be in the form /your-eu-west-1-hosted-bucket{color:red}.s3.eu-west-1.amazonaws.com{color}/. Hadoop provides it as /your-eu-west-1-hosted-bucket/
> Note that the AWS documentation doesn't explicitly state that the endpoint or full DNS address should be appended to CanonicalizedResource, but practice shows it is actually required.
> I've also submitted this to AWS for them to correct the behaviour or documentation.
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
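The CanonicalizedResource mismatch the reporter describes can be made concrete with two small string builders. This is purely illustrative: it reproduces the two forms quoted in the report (Hadoop's `/bucket/` versus the `/bucket.endpoint/` form the reporter observed in AWS's StringToSignBytes); it is not Hadoop or AWS SDK code, and the "observed" form is the reporter's claim, not documented AWS behaviour.

```java
/** Sketch of the two CanonicalizedResource forms described in HADOOP-14620. */
public class CanonicalizedResource {
  /** The form Hadoop reportedly sends: path-style, bucket only. */
  static String hadoopForm(String bucket, String key) {
    return "/" + bucket + "/" + key;
  }

  /** The form the reporter says AWS expects for a non-default region:
   *  the bucket's full DNS name embedded in the resource. */
  static String observedAwsForm(String bucket, String endpoint, String key) {
    return "/" + bucket + "." + endpoint + "/" + key;
  }
}
```

When the two forms differ, the V2 signature computed by the client cannot match the one AWS computes, which would produce exactly the 403 described above.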
[jira] [Commented] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure
[ https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077314#comment-16077314 ]

Hadoop QA commented on HADOOP-14553:
--
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 73 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-tools/hadoop-azure: The patch generated 123 new + 193 unchanged - 114 fixed = 316 total (was 307) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 16 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 42s{color} | {color:green} hadoop-azure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m 7s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-14553 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12875991/HADOOP-14553-005.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml findbugs checkstyle |
| uname | Linux f13b37e8bf35 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7576a68 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/artifact/patchprocess/diff-checkstyle-hadoop-tools_hadoop-azure.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/artifact/patchprocess/whitespace-eol.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/artifact/patchprocess/whitespace-tabs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/testReport/ |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/12725/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> Add (parallelized) integration tests to
[jira] [Updated] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure
[ https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14553: Status: Patch Available (was: Open) > Add (parallelized) integration tests to hadoop-azure > > > Key: HADOOP-14553 > URL: https://issues.apache.org/jira/browse/HADOOP-14553 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HADOOP-14553-001.patch, HADOOP-14553-002.patch, > HADOOP-14553-003.patch, HADOOP-14553-004.patch, HADOOP-14553-005.patch > > > The Azure tests are slow to run as they are serialized, as they are all > called Test* there's no clear differentiation from unit tests which Jenkins > can run, and integration tests which it can't. > Move the azure tests {{Test*}} to integration tests {{ITest*}}, parallelize > (which includes having separate paths for every test suite). The code in > hadoop-aws's POM show what to do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure
[ https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14553: Attachment: HADOOP-14553-005.patch Patch 005. Move a lot more to the parallel phase of the tests. Some of these are explicitly using root relative paths, as they need it (no parent directory probes/connect). This will work provided all the tests are set up to use names guaranteed to be unique across all test suites/methods. Testing: each moved test worked well alone; a full bulk test run failed (lots of timeouts) when run over a slow network & 8 parallel tests. I think I'll bump up the default timeout from 30s to something bigger, even though it's a sign that there's not enough B/W for 8 tests together. Making the timeout 600s would be less brittle over slow connections. > Add (parallelized) integration tests to hadoop-azure > > > Key: HADOOP-14553 > URL: https://issues.apache.org/jira/browse/HADOOP-14553 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HADOOP-14553-001.patch, HADOOP-14553-002.patch, > HADOOP-14553-003.patch, HADOOP-14553-004.patch, HADOOP-14553-005.patch > > > The Azure tests are slow to run as they are serialized, as they are all > called Test* there's no clear differentiation from unit tests which Jenkins > can run, and integration tests which it can't. > Move the azure tests {{Test*}} to integration tests {{ITest*}}, parallelize > (which includes having separate paths for every test suite). The code in > hadoop-aws's POM show what to do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure
[ https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14553: Status: Open (was: Patch Available) > Add (parallelized) integration tests to hadoop-azure > > > Key: HADOOP-14553 > URL: https://issues.apache.org/jira/browse/HADOOP-14553 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HADOOP-14553-001.patch, HADOOP-14553-002.patch, > HADOOP-14553-003.patch, HADOOP-14553-004.patch > > > The Azure tests are slow to run as they are serialized, as they are all > called Test* there's no clear differentiation from unit tests which Jenkins > can run, and integration tests which it can't. > Move the azure tests {{Test*}} to integration tests {{ITest*}}, parallelize > (which includes having separate paths for every test suite). The code in > hadoop-aws's POM show what to do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13435) Add thread local mechanism for aggregating file system storage stats
[ https://issues.apache.org/jira/browse/HADOOP-13435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077199#comment-16077199 ] Hadoop QA commented on HADOOP-13435:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 48s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 5s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 10s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}155m 46s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.TestKDiag |
| | hadoop.hdfs.tools.TestDFSZKFailoverController |
| | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
| Timed out junit tests | org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HADOOP-13435 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12875965/HADOOP-13435.004.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 00099b530712 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7576a68 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HADOOP-Build/12724/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/12724/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/12724/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results |
[jira] [Updated] (HADOOP-12802) local FileContext does not rename .crc file
[ https://issues.apache.org/jira/browse/HADOOP-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor updated HADOOP-12802: -- Target Version/s: 3.0.0-alpha4 > local FileContext does not rename .crc file > --- > > Key: HADOOP-12802 > URL: https://issues.apache.org/jira/browse/HADOOP-12802 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.7.2, 3.0.0-alpha1 >Reporter: Youngjoon Kim >Assignee: Andras Bokor > > After run the following code, "old" file is renamed to "new", but ".old.crc" > is not renamed to ".new.crc" > {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileContext fc = FileContext.getLocalFSFileContext(conf); > FSDataOutputStream out = fc.create(oldPath, EnumSet.of(CreateFlag.CREATE)); > out.close(); > fc.rename(oldPath, newPath); > {code} > On the other hand, local FileSystem successfully renames .crc file. > {code} > Path oldPath = new Path("/tmp/old"); > Path newPath = new Path("/tmp/new"); > Configuration conf = new Configuration(); > FileSystem fs = FileSystem.getLocal(conf); > FSDataOutputStream out = fs.create(oldPath); > out.close(); > fs.rename(oldPath, newPath); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14548) S3Guard: issues running parallel tests w/ S3N
[ https://issues.apache.org/jira/browse/HADOOP-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077153#comment-16077153 ] Steve Loughran commented on HADOOP-14548: - S3N tests will all skip if you don't provide the binding endpoint; this is a new feature with the move to JUnit 4 everywhere. > S3Guard: issues running parallel tests w/ S3N > -- > > Key: HADOOP-14548 > URL: https://issues.apache.org/jira/browse/HADOOP-14548 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri >Priority: Minor > > In general, running S3Guard and parallel tests with S3A and S3N contract > tests enabled is asking for trouble: S3Guard code assumes there are not > other non-S3Guard clients modifying the bucket. > Goal of this JIRA is to: > - Discuss current failures running `mvn verify -Dparallel-tests -Ds3guard > -Ddynamo` with S3A and S3N contract tests configured. > - Identify any failures here that are worth looking into. > - Document (or enforce) that people should not do this, or should expect > failures if they do. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13435) Add thread local mechanism for aggregating file system storage stats
[ https://issues.apache.org/jira/browse/HADOOP-13435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HADOOP-13435: --- Attachment: HADOOP-13435.004.patch > Add thread local mechanism for aggregating file system storage stats > > > Key: HADOOP-13435 > URL: https://issues.apache.org/jira/browse/HADOOP-13435 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HADOOP-13435.000.patch, HADOOP-13435.001.patch, > HADOOP-13435.002.patch, HADOOP-13435.003.patch, HADOOP-13435.004.patch > > > As discussed in [HADOOP-13032], this is to add thread local mechanism for > aggregating file system storage stats. This class will also be used in > [HADOOP-13031], which is to separate the distance-oriented rack-aware read > bytes logic from {{FileSystemStorageStatistics}} to new > DFSRackAwareStorageStatistics as it's DFS-specific. After this patch, the > {{FileSystemStorageStatistics}} can live without the to-be-removed > {{FileSystem$Statistics}} implementation. > A unit test should also be added. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14627) Enable new features of ADLS SDK (MSI, Device Code auth)
[ https://issues.apache.org/jira/browse/HADOOP-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076950#comment-16076950 ] John Zhuge commented on HADOOP-14627: - [~ASikaria] It does not hurt to call {{getPasswordString}} for non-secret properties. It is nice to group all related properties together in one place, the cred store. Of course, the downside is that non-secret properties are placed into the cred store, which can be confusing. [~steve_l] What do you think? > Enable new features of ADLS SDK (MSI, Device Code auth) > --- > > Key: HADOOP-14627 > URL: https://issues.apache.org/jira/browse/HADOOP-14627 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/adl > Environment: MSI Change applies only to Hadoop running in an Azure VM >Reporter: Atul Sikaria >Assignee: Atul Sikaria > Attachments: HADOOP-14627-001.patch > > > This change is to upgrade the Hadoop ADLS connector to enable new auth > features exposed by the ADLS Java SDK. > Specifically: > MSI Tokens: MSI (Managed Service Identity) is a way to provide an identity to > an Azure Service. In the case of VMs, they can be used to give an identity to > a VM deployment. This simplifies managing Service Principals, since the creds > don’t have to be managed in core-site files anymore. The way this works is > that during VM deployment, the ARM (Azure Resource Manager) template needs to > be modified to enable MSI. Once deployed, the MSI extension runs a service on > the VM that exposes a token endpoint to http://localhost at a port specified > in the template. The SDK has a new TokenProvider to fetch the token from this > local endpoint. This change would expose that TokenProvider as an auth option. > DeviceCode auth: This enables a token to be obtained from an interactive > login. The user is given a URL and a token to use on the login screen. User > can use the token to login from any device.
Once the login is done, the token > that is obtained is in the name of the user who logged in. Note that because > of the interactive login involved, this is not very suitable for job > scenarios, but can work for ad-hoc scenarios like running “hdfs dfs” commands. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
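To make the localhost-token-endpoint idea concrete, here is a toy model in plain Java: a stub HTTP service standing in for the MSI extension's local service, and a client fetch standing in for the SDK's MSI TokenProvider. The port, path, and JSON payload here are all invented for the example; they are not the real MSI contract.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Toy model of MSI: a local token endpoint plus a client that fetches from it. */
public class MsiSketch {

    /** Stand-in for the MSI extension's service on the VM (path/payload invented). */
    static HttpServer startStub() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/oauth2/token", exchange -> {
            byte[] body = "{\"access_token\":\"example-token\"}"
                .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Fetch the token from the local endpoint, as an MSI-style TokenProvider would. */
    static String fetchToken(int port) throws Exception {
        URL url = new URL("http://localhost:" + port + "/oauth2/token");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startStub();
        try {
            // No creds in core-site: the identity comes from the VM's local service.
            System.out.println(fetchToken(server.getAddress().getPort()));
        } finally {
            server.stop(0);
        }
    }
}
```

The point of the shape: credentials never appear in configuration files; the client only needs to know where the local endpoint listens.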
[jira] [Commented] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable
[ https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076845#comment-16076845 ] Steve Loughran commented on HADOOP-14468: - That said, looking at all the places we call getFileStatus, it'd be a useful little sanity check all round. > S3Guard: make short-circuit getFileStatus() configurable > > > Key: HADOOP-14468 > URL: https://issues.apache.org/jira/browse/HADOOP-14468 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri >Priority: Minor > > Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a > result from the MetadataStore (e.g. dynamodb) first. > I would like to add a new parameter > {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps > the current behavior. When false, S3AFileSystem will check both S3 and the > MetadataStore. > I'm not sure yet if we want to have this behavior the same for all callers of > getFileStatus(), or if we only want to check both S3 and MetadataStore for > some internal callers such as open(). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14467) S3Guard: Improve FNFE message when opening a stream
[ https://issues.apache.org/jira/browse/HADOOP-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076839#comment-16076839 ] Steve Loughran commented on HADOOP-14467: - without s3guard, this can arise if # a file is deleted between the open() returning a reference and the first read() call. # a file is deleted during a read sequence and a new partial GET of a file is made (new seek, new block in the fadvise=random mode). # also if a file is deleted while a sequential GET was already in progress and a subsequent read() causes this to surface (issue: how does it surface?). If it's a read error we'll try and re-open the connection, which should escalate it to condition (2) We can certainly write tests for the first two of these; the final one is probably driven by buffer settings in the infrastructure (or indeed, could be used to determine what those buffer sizes are) s3guard adds a new failure: file is in the DDB, but not in the FS. This will surface as a similar situation to #1 above. Maybe that's something which the FS itself should be made aware of, in a metric or callback. There's some incrementing of statistics in the {{S3AInputStream}}, but it could actually invoke some callback on the S3A FS to say "we've got a failure on read #0 of blob s3a://bucket/file1", which can then trigger other actions if the FS is s3guard. It could also think about a callback if the first read triggered an EOF as well, as that could be a sign of the file length not being what DDB thinks it is. 
> S3Guard: Improve FNFE message when opening a stream > --- > > Key: HADOOP-14467 > URL: https://issues.apache.org/jira/browse/HADOOP-14467 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri >Priority: Minor > > Following up on the [discussion on > HADOOP-13345|https://issues.apache.org/jira/browse/HADOOP-13345?focusedCommentId=16030050=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16030050], > because S3Guard can serve getFileStatus() from the MetadataStore without > doing a HEAD on S3, a FileNotFound error on a file due to S3 GET > inconsistency does not happen on open(), but on the first read of the stream. > We may add retries to the S3 client in the future, but for now we should > have an exception message that indicates this may be due to inconsistency > (assuming it isn't a more straightforward case like someone deleting the > object out from under you). > This is expected to be a rare case, since the S3 service is now mostly > consistent for GET. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
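One possible shape for that improved message, as a hedged sketch in plain Java: a stream wrapper that, on a 404 during the very first read, rethrows with a hint about possible inconsistency. The class name and message text are invented for illustration; the real change would live in the S3A input stream.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative sketch: add an inconsistency hint to a 404 seen on the first read. */
public class FnfeHintSketch extends InputStream {
    private final InputStream wrapped;
    private final String path;
    private long bytesRead;

    public FnfeHintSketch(InputStream wrapped, String path) {
        this.wrapped = wrapped;
        this.path = path;
    }

    /** Message hinting that a metadata-store hit plus an S3 404 may be inconsistency. */
    static String hintMessage(String path) {
        return path + ": not found on first read. The object may have been deleted,"
            + " or the metadata store may be ahead of the object store (S3 GET"
            + " inconsistency).";
    }

    @Override
    public int read() throws IOException {
        try {
            int b = wrapped.read();
            if (b >= 0) {
                bytesRead++;
            }
            return b;
        } catch (FileNotFoundException e) {
            // Only the first read gets the inconsistency hint; a later 404 means the
            // object went away mid-stream, which is a different failure mode.
            if (bytesRead == 0) {
                throw new FileNotFoundException(hintMessage(path));
            }
            throw e;
        }
    }
}
```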
[jira] [Commented] (HADOOP-14627) Enable new features of ADLS SDK (MSI, Device Code auth)
[ https://issues.apache.org/jira/browse/HADOOP-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076836#comment-16076836 ] Atul Sikaria commented on HADOOP-14627: --- Thanks [~ste...@apache.org] for reviewing. Will do a test, and fix the casing on the property name. Note on the test that it is only meaningful if run from within an Azure VM - MSI service will not be present anywhere else. The properties are not secrets - ok to have in cleartext. > Enable new features of ADLS SDK (MSI, Device Code auth) > --- > > Key: HADOOP-14627 > URL: https://issues.apache.org/jira/browse/HADOOP-14627 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/adl > Environment: MSI Change applies only to Hadoop running in an Azure VM >Reporter: Atul Sikaria >Assignee: Atul Sikaria > Attachments: HADOOP-14627-001.patch > > > This change is to upgrade the Hadoop ADLS connector to enable new auth > features exposed by the ADLS Java SDK. > Specifically: > MSI Tokens: MSI (Managed Service Identity) is a way to provide an identity to > an Azure Service. In the case of VMs, they can be used to give an identity to > a VM deployment. This simplifies managing Service Principals, since the creds > don’t have to be managed in core-site files anymore. The way this works is > that during VM deployment, the ARM (Azure Resource Manager) template needs to > be modified to enable MSI. Once deployed, the MSI extension runs a service on > the VM that exposes a token endpoint to http://localhost at a port specified > in the template. The SDK has a new TokenProvider to fetch the token from this > local endpoint. This change would expose that TokenProvider as an auth option. > DeviceCode auth: This enables a token to be obtained from an interactive > login. The user is given a URL and a token to use on the login screen. User > can use the token to login from any device. Once the login is done, the token > that is obtained is in the name of the user who logged in. 
Note that because > of the interactive login involved, this is not very suitable for job > scenarios, but can work for ad-hoc scenarios like running “hdfs dfs” commands. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable
[ https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076817#comment-16076817 ] Steve Loughran commented on HADOOP-14468: - The general code path for file IO is one of two things. Sequential from the start (gzip unzip, CSV, text, Avro):
{code}
instream = fs.open(path)
instream.read()
{code}
or: seek and read, explicit or in a readFully() call. This is the code path in .snappy files, and in examining columnar stored data in ORC, Parquet, ...
{code}
instream = fs.open(path)
instream.seek(somewhere)
instream.read(bytes)
instream.seek(somewhere-else)
...
{code}
Either way, there's usually a read() call very shortly after the open, which is when any missing file will surface, so we don't need to overreact; just make sure that the error message which surfaces on a 404 on the first open of a file is propagated up in a way which is meaningful and consistent with what people normally expect. HADOOP-14467 looks at that. Doing it as a fallback for troubleshooting/monitoring is something to consider though. > S3Guard: make short-circuit getFileStatus() configurable > > > Key: HADOOP-14468 > URL: https://issues.apache.org/jira/browse/HADOOP-14468 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri >Priority: Minor > > Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a > result from the MetadataStore (e.g. dynamodb) first. > I would like to add a new parameter > {{fs.s3a.metadatastore.getfilestatus.authoritative}} which, when true, keeps > the current behavior. When false, S3AFileSystem will check both S3 and the > MetadataStore. > I'm not sure yet if we want to have this behavior the same for all callers of > getFileStatus(), or if we only want to check both S3 and MetadataStore for > some internal callers such as open().
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
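The proposed flag boils down to a small branch in the lookup path. A toy sketch in plain Java (all names here are invented for the example; the real S3AFileSystem/MetadataStore logic is far more involved):

```java
/** Toy model of the proposed authoritative-mode switch for getFileStatus(). */
public class AuthoritativeLookupSketch {
    /** Stand-in for a file-status lookup source (MetadataStore or object store). */
    interface Lookup { String find(String path); }

    /**
     * If authoritative, trust a MetadataStore hit and skip the store probe;
     * otherwise consult both, preferring the backing store's answer.
     */
    static String getFileStatus(String path, boolean authoritative,
                                Lookup metadataStore, Lookup objectStore) {
        String fromMs = metadataStore.find(path);
        if (authoritative && fromMs != null) {
            return fromMs;              // short-circuit: no S3 round trip
        }
        String fromS3 = objectStore.find(path);
        return fromS3 != null ? fromS3 : fromMs;
    }

    public static void main(String[] args) {
        Lookup ms = p -> "ms-status";   // MetadataStore always answers here
        Lookup s3 = p -> "s3-status";
        System.out.println(getFileStatus("/f", true, ms, s3));   // ms-status
        System.out.println(getFileStatus("/f", false, ms, s3));  // s3-status
    }
}
```

The non-authoritative path is what would catch a MetadataStore entry whose object has vanished from S3, at the cost of an extra HEAD per call.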
[jira] [Commented] (HADOOP-13761) S3Guard: implement retries
[ https://issues.apache.org/jira/browse/HADOOP-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076805#comment-16076805 ] Steve Loughran commented on HADOOP-13761: - we need to implement retry logic in all AWS calls which bypass the xfer manager, so that transient failures (503/throttle, connection timeout) can get retried. The core code is in the HADOOP-13786 branch; it just needs rollout to the existing methods and policies to deal with s3guard failures: when to fail, when to retry. And, for DDB: when to fall back to the blobstore, which is a different recovery strategy to the rest > S3Guard: implement retries > --- > > Key: HADOOP-13761 > URL: https://issues.apache.org/jira/browse/HADOOP-13761 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: HADOOP-13345 >Reporter: Aaron Fabbri > > Following the S3AFileSystem integration patch in HADOOP-13651, we need to add > retry logic. > In HADOOP-13651, I added TODO comments in most of the places retry loops are > needed, including: > - open(path). If MetadataStore reflects recent create/move of file path, but > we fail to read it from S3, retry. > - delete(path). If deleteObject() on S3 fails, but MetadataStore shows the > file exists, retry. > - rename(src,dest). If source path is not visible in S3 yet, retry. > - listFiles(). Skip for now. Not currently implemented in S3Guard. I will > create a separate JIRA for this as it will likely require interface changes > (i.e. prefix or subtree scan). > We may miss some cases initially and we should do failure injection testing > to make sure we're covered. Failure injection tests can be a separate JIRA > to make this easier to review. > We also need basic configuration parameters around retry policy. There > should be a way to specify maximum retry duration, as some applications would > prefer to receive an error eventually, than waiting indefinitely. 
We should > also be keeping statistics when inconsistency is detected and we enter a > retry loop. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
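The bounded backoff-and-retry behaviour described in the comment above can be sketched in plain Java. This is an illustrative sketch only: the class, constants, and method names are invented for the example and are not the actual HADOOP-13786 retry policy.

```java
import java.util.concurrent.Callable;

/** Illustrative bounded retry with exponential backoff (not the real S3A policy). */
public class RetrySketch {
    static final int MAX_ATTEMPTS = 5;
    static final long BASE_DELAY_MS = 100;

    /** Delay before retrying after the given (0-based) failed attempt: 100, 200, 400ms, ... */
    static long backoffMillis(int attempt) {
        return BASE_DELAY_MS << attempt;
    }

    /** Retry the operation on transient failures; rethrow once attempts are exhausted. */
    static <T> T withRetries(Callable<T> op) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;                   // e.g. a throttled 503 or connection timeout
                if (attempt == MAX_ATTEMPTS - 1) {
                    break;                  // out of attempts: surface the failure
                }
                Thread.sleep(backoffMillis(attempt));
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice, then succeeds - models a transient throttling event.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("503 Slow Down");
            }
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

A real policy would also classify exceptions (retry throttling and timeouts, fail fast on auth errors) and honour a maximum total retry duration, as the issue description asks for.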
[jira] [Commented] (HADOOP-14576) DynamoDB tables may leave ACTIVE state after initial connection
[ https://issues.apache.org/jira/browse/HADOOP-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076792#comment-16076792 ] Steve Loughran commented on HADOOP-14576: - This parallel rename - is it how Hive implements a commit? If so, and if we can move it off rename/(copy & delete) as its commit strategy, then that could make it go away. Assuming it is a commit operation, we would presumably like it to complete, even if that took 1+ attempts to go through. Which means: something which should be retried. I'm adding retry policy in the HADOOP-13786 commit code (and more fault injection into the inconsistent client); we can use its handling and tests as a basis for this. That branch/patch handles 503 throttled responses from S3 with backoff and retry (and a shorter policy for other failures considered recoverable); DDB state changes could be treated as another error to handle under the throttle policy. > DynamoDB tables may leave ACTIVE state after initial connection > --- > > Key: HADOOP-14576 > URL: https://issues.apache.org/jira/browse/HADOOP-14576 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: HADOOP-13345 >Reporter: Sean Mackrory > > We currently only anticipate tables not being in the ACTIVE state when first > connecting. It is possible for a table to be in the ACTIVE state and move to > an UPDATING state during partitioning events. Attempts to read or write > during that time will result in an AmazonServerException getting thrown. We > should try to handle that better... -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14577) ITestS3AInconsistency.testGetFileStatus failing in -DS3guard test runs
[ https://issues.apache.org/jira/browse/HADOOP-14577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14577: Summary: ITestS3AInconsistency.testGetFileStatus failing in -DS3guard test runs (was: ITestS3AInconsistency.testGetFileStatus failing) > ITestS3AInconsistency.testGetFileStatus failing in -DS3guard test runs > -- > > Key: HADOOP-14577 > URL: https://issues.apache.org/jira/browse/HADOOP-14577 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Sean Mackrory > > This test is failing for me when run individually or in parallel (with > -Ds3guard). Even if I revert back to the commit that introduced it. I thought > I had successful test runs on that before and haven't changed anything in my > test configuration. > {code}Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.671 > sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AInconsistency > testGetFileStatus(org.apache.hadoop.fs.s3a.ITestS3AInconsistency) Time > elapsed: 4.475 sec <<< FAILURE! > java.lang.AssertionError: S3Guard failed to list parent of inconsistent child. > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.fs.s3a.ITestS3AInconsistency.testGetFileStatus(ITestS3AInconsistency.java:83){code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14577) ITestS3AInconsistency.testGetFileStatus failing
[ https://issues.apache.org/jira/browse/HADOOP-14577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076688#comment-16076688 ] Steve Loughran commented on HADOOP-14577: - I was about to say worksforme, but once I do -Ds3guard it fails for me too
{code}
---
 T E S T S
---
Running org.apache.hadoop.fs.s3a.ITestS3AInconsistency
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.856 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AInconsistency
testGetFileStatus(org.apache.hadoop.fs.s3a.ITestS3AInconsistency) Time elapsed: 2.763 sec <<< FAILURE!
java.lang.AssertionError: S3Guard failed to list parent of inconsistent child.
at org.junit.Assert.fail(Assert.java:88)
at org.apache.hadoop.fs.s3a.ITestS3AInconsistency.testGetFileStatus(ITestS3AInconsistency.java:83)

Results :

Failed tests:
ITestS3AInconsistency.testGetFileStatus:83->Assert.fail:88 S3Guard failed to list parent of inconsistent child.
{code}
> ITestS3AInconsistency.testGetFileStatus failing > --- > > Key: HADOOP-14577 > URL: https://issues.apache.org/jira/browse/HADOOP-14577 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Sean Mackrory > > This test is failing for me when run individually or in parallel (with > -Ds3guard). Even if I revert back to the commit that introduced it. I thought > I had successful test runs on that before and haven't changed anything in my > test configuration. > {code}Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 4.671 > sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AInconsistency > testGetFileStatus(org.apache.hadoop.fs.s3a.ITestS3AInconsistency) Time > elapsed: 4.475 sec <<< FAILURE! > java.lang.AssertionError: S3Guard failed to list parent of inconsistent child.
> at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.fs.s3a.ITestS3AInconsistency.testGetFileStatus(ITestS3AInconsistency.java:83){code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14499) Findbugs warning in LocalMetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14499: Resolution: Fixed Fix Version/s: HADOOP-13345 Status: Resolved (was: Patch Available) +1 committed, thanks > Findbugs warning in LocalMetadataStore > -- > > Key: HADOOP-14499 > URL: https://issues.apache.org/jira/browse/HADOOP-14499 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Sean Mackrory >Assignee: Sean Mackrory > Fix For: HADOOP-13345 > > Attachments: HADOOP-14499-HADOOP-13345.001.patch, > HADOOP-14499-HADOOP-13345.002.patch, HADOOP-14499-HADOOP-13345.003.patch > > > First saw this raised by Yetus on HADOOP-14433: > {code} > Bug type UC_USELESS_OBJECT (click for details) > In class org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore > In method org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore.prune(long) > Value ancestors > Type java.util.LinkedList > At LocalMetadataStore.java:[line 300] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1, not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076628#comment-16076628 ] Hongyuan Li commented on HADOOP-14623: -- Furthermore, the flush method is there to confirm that the data has been written. > KafkaSink#init should set acks to 1, not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has > at least been written to the broker. > The current code is listed below: > {code} > > props.put("request.required.acks", "0"); > {code}
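A minimal sketch of the proposed change, using only the property key quoted in the issue. This is an illustration with stock java.util.Properties, not the real KafkaSink#init, which builds these properties for the Kafka producer client; the class and method names here are invented for the example.

```java
import java.util.Properties;

public class KafkaSinkAcksSketch {
    // Sketch of the proposed acks change only.
    static Properties producerProps() {
        Properties props = new Properties();
        // "0"  = fire-and-forget: a message can be lost before the broker sees it;
        // "1"  = wait for the partition leader to acknowledge the write;
        // "-1" = wait for the full in-sync replica set (complete sync).
        props.put("request.required.acks", "1");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("request.required.acks"));
    }
}
```

The trade-off discussed in the thread is exactly this dial: "1" costs one leader round-trip per batch but rules out silent loss before the leader has the data, while "-1" adds replica latency for stronger durability.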
[jira] [Comment Edited] (HADOOP-14475) Metrics of S3A don't print out when enabled in the Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076511#comment-16076511 ] Yonger edited comment on HADOOP-14475 at 7/6/17 1:42 PM: - @steve the method you mentioned, giving an empty URL to skip the landsat-pds tests, does not work; I also uploaded the gz file into my bucket according to the guide, but that failed too. When giving the empty string, the error message is: Tests run: 9, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 0.325 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider testInstantiationChain(org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider) Time elapsed: 0.018 sec <<< ERROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163) at org.apache.hadoop.fs.Path.(Path.java:175) at org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider.testInstantiationChain(TestS3AAWSCredentialsProvider.java:92) And if I use the default value and upload the gz file, it gives me an error message with code 403. was (Author: iyonger): [~stevea] the method you mentioned, giving an empty URL to skip the landsat-pds tests, does not work; I also uploaded the gz file into my bucket according to the guide, but that failed too. When giving the empty string, the error message is: Tests run: 9, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 0.325 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider testInstantiationChain(org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider) Time elapsed: 0.018 sec <<< ERROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163) at org.apache.hadoop.fs.Path.(Path.java:175) at org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider.testInstantiationChain(TestS3AAWSCredentialsProvider.java:92) And if I use the default value and upload the gz file, it gives me an error message with code 403. 
> Metrics of S3A don't print out when enabled in the Hadoop metrics property file > -- > > Key: HADOOP-14475 > URL: https://issues.apache.org/jira/browse/HADOOP-14475 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.8.0 > Environment: uname -a > Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 > x86_64 x86_64 x86_64 GNU/Linux > cat /etc/issue > Ubuntu 16.04.2 LTS \n \l >Reporter: Yonger >Assignee: Yonger > Attachments: failsafe-report-s3a-it.html, > failsafe-report-s3a-scale.html, HADOOP-14475.002.patch, s3a-metrics.patch1, > stdout.zip > > > *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink > #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #*.sink.influxdb.url=http:/xx > #*.sink.influxdb.influxdb_port=8086 > #*.sink.influxdb.database=hadoop > #*.sink.influxdb.influxdb_username=hadoop > #*.sink.influxdb.influxdb_password=hadoop > #*.sink.ingluxdb.cluster=c1 > *.period=10 > #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out > I can't find the output file even when I run an MR job which should use s3.
[jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enabled in the Hadoop metrics property file
[ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076511#comment-16076511 ] Yonger commented on HADOOP-14475: - [~stevea] the method you mentioned, giving an empty URL to skip the landsat-pds tests, does not work; I also uploaded the gz file into my bucket according to the guide, but that failed too. When giving the empty string, the error message is: Tests run: 9, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 0.325 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider testInstantiationChain(org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider) Time elapsed: 0.018 sec <<< ERROR! java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163) at org.apache.hadoop.fs.Path.(Path.java:175) at org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider.testInstantiationChain(TestS3AAWSCredentialsProvider.java:92) And if I use the default value and upload the gz file, it gives me an error message with code 403. 
> Metrics of S3A don't print out when enabled in the Hadoop metrics property file > -- > > Key: HADOOP-14475 > URL: https://issues.apache.org/jira/browse/HADOOP-14475 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.8.0 > Environment: uname -a > Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 > x86_64 x86_64 x86_64 GNU/Linux > cat /etc/issue > Ubuntu 16.04.2 LTS \n \l >Reporter: Yonger >Assignee: Yonger > Attachments: failsafe-report-s3a-it.html, > failsafe-report-s3a-scale.html, HADOOP-14475.002.patch, s3a-metrics.patch1, > stdout.zip > > > *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink > #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #*.sink.influxdb.url=http:/xx > #*.sink.influxdb.influxdb_port=8086 > #*.sink.influxdb.database=hadoop > #*.sink.influxdb.influxdb_username=hadoop > #*.sink.influxdb.influxdb_password=hadoop > #*.sink.ingluxdb.cluster=c1 > *.period=10 > #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink > S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out > I can't find the output file even when I run an MR job which should use s3.
[jira] [Commented] (HADOOP-14553) Add (parallelized) integration tests to hadoop-azure
[ https://issues.apache.org/jira/browse/HADOOP-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076504#comment-16076504 ] Steve Loughran commented on HADOOP-14553: - {code} -public class TestNativeAzureFileSystemContractMocked extends +/** + * Mocked testing of FileSystemContractBaseTest. + * This isn't an IT, but making it so makes it a lot faster for now. + */ +public class ITestNativeAzureFileSystemContractMocked extends {code} bq. why is it faster as ITest? It's not that the test finishes faster; it's just that, as something slow, running it in parallel meant the test run took less time. I want to do another iteration of this and * rename the Test* suites which require credentials to ITest*, but just list them in the sequential section * leave the other tests alone * change the test profile in the POM to run the normal test profile without looking for an auth-keys file Goal: Jenkins/yetus to run the unit tests; move everything else to integration tests sooner rather than later, and so allow for 1+ followups which parallelise the remaining tests, or, in the case of the big native test suite, split it up. Regarding commonality between the S3A test runner and the new stuff, yes, I did copy and paste S3ATestUtils in, as you will have noticed. Trouble is: I don't know what commonality we really have right now. > Add (parallelized) integration tests to hadoop-azure > > > Key: HADOOP-14553 > URL: https://issues.apache.org/jira/browse/HADOOP-14553 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HADOOP-14553-001.patch, HADOOP-14553-002.patch, > HADOOP-14553-003.patch, HADOOP-14553-004.patch > > > The Azure tests are slow to run as they are serialized; as they are all > called Test*, there's no clear differentiation from unit tests, which Jenkins > can run, and integration tests, which it can't. 
> Move the azure tests {{Test*}} to integration tests {{ITest*}}, parallelize > (which includes having separate paths for every test suite). The code in > hadoop-aws's POM shows what to do.
[jira] [Commented] (HADOOP-14627) Enable new features of ADLS SDK (MSI, Device Code auth)
[ https://issues.apache.org/jira/browse/HADOOP-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076496#comment-16076496 ] Steve Loughran commented on HADOOP-14627: - * A test would still be good, if just to verify that attempting to use the new auth mechanism fails if the configuration is missing any required property. * New {{fs.adl.oauth2.msi.TenantGuid}} should be all lower case, for consistency with (nearly) everything else * Is this property a secret which should be stored in hadoop credentials files & retrieved with Configuration.getPassword()? > Enable new features of ADLS SDK (MSI, Device Code auth) > --- > > Key: HADOOP-14627 > URL: https://issues.apache.org/jira/browse/HADOOP-14627 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/adl > Environment: MSI Change applies only to Hadoop running in an Azure VM >Reporter: Atul Sikaria >Assignee: Atul Sikaria > Attachments: HADOOP-14627-001.patch > > > This change is to upgrade the Hadoop ADLS connector to enable new auth > features exposed by the ADLS Java SDK. > Specifically: > MSI Tokens: MSI (Managed Service Identity) is a way to provide an identity to > an Azure Service. In the case of VMs, they can be used to give an identity to > a VM deployment. This simplifies managing Service Principals, since the creds > don’t have to be managed in core-site files anymore. The way this works is > that during VM deployment, the ARM (Azure Resource Manager) template needs to > be modified to enable MSI. Once deployed, the MSI extension runs a service on > the VM that exposes a token endpoint to http://localhost at a port specified > in the template. The SDK has a new TokenProvider to fetch the token from this > local endpoint. This change would expose that TokenProvider as an auth option. > DeviceCode auth: This enables a token to be obtained from an interactive > login. The user is given a URL and a token to use on the login screen. 
User > can use the token to log in from any device. Once the login is done, the token > that is obtained is in the name of the user who logged in. Note that because > of the interactive login involved, this is not very suitable for job > scenarios, but it can work for ad-hoc scenarios like running “hdfs dfs” commands.
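To make the MSI flow above concrete: the MSI extension serves tokens over plain HTTP on localhost, at a port chosen in the ARM template, and the SDK's TokenProvider simply polls that endpoint. The sketch below only builds such a request to show the shape of the call; the port (50342), the path (/oauth2/token), and the Metadata header are assumptions for illustration, not the documented contract, and in practice the ADLS SDK handles all of this internally.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class MsiTokenRequestSketch {
    // Hypothetical request to a local MSI token endpoint. Port and path are
    // assumed values; the real ones come from the VM's deployment template.
    static HttpRequest tokenRequest(int port) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:" + port + "/oauth2/token"))
                .header("Metadata", "true") // assumption: Azure metadata-style header
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Only print the target URI; actually sending this would require
        // running inside an MSI-enabled Azure VM.
        System.out.println(tokenRequest(50342).uri());
    }
}
```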
[jira] [Resolved] (HADOOP-8740) Build target to generate findbugs html output
[ https://issues.apache.org/jira/browse/HADOOP-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Bokor resolved HADOOP-8740. -- Resolution: Invalid > Build target to generate findbugs html output > - > > Key: HADOOP-8740 > URL: https://issues.apache.org/jira/browse/HADOOP-8740 > Project: Hadoop Common > Issue Type: Improvement > Components: build >Reporter: Eli Collins >Assignee: Andras Bokor > > It would be useful if there was a build target or flag to generate findbugs > output. It would depend on {{mvn compile findbugs:findbugs}} and run > {{$FINDBUGS_HOME/bin/convertXmlToText -html ../path/to/findbugsXml.xml > findbugs.html}} to generate findbugs.html in the target directory.
[jira] [Commented] (HADOOP-13414) Hide Jetty Server version header in HTTP responses
[ https://issues.apache.org/jira/browse/HADOOP-13414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076160#comment-16076160 ] Surendra Singh Lilhore commented on HADOOP-13414: - Thanks [~vinayrpet] for the review and commit. > Hide Jetty Server version header in HTTP responses > -- > > Key: HADOOP-13414 > URL: https://issues.apache.org/jira/browse/HADOOP-13414 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Vinayakumar B >Assignee: Surendra Singh Lilhore > Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2 > > Attachments: Aftrerfix.png, BeforeFix.png, HADOOP-13414-001.patch, > HADOOP-13414-002.patch, HADOOP-13414-branch-2.patch > > > Hide the Jetty Server version in the HTTP response header. Some security > analyzers would flag this as an issue.
[jira] [Commented] (HADOOP-14623) KafkaSink#init should set acks to 1, not 0
[ https://issues.apache.org/jira/browse/HADOOP-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16076154#comment-16076154 ] Hongyuan Li commented on HADOOP-14623: -- I don't think so; setting it to 1 does not mean that it will block. However, I think that Ganglia knows the frequency of data loss, but Kafka does not. What you have said underestimates Kafka; Kafka has more power. Compared to the complete sync of setting acks to -1, setting acks to 1 is a better choice. > KafkaSink#init should set acks to 1, not 0 > - > > Key: HADOOP-14623 > URL: https://issues.apache.org/jira/browse/HADOOP-14623 > Project: Hadoop Common > Issue Type: Bug > Components: common, tools >Affects Versions: 3.0.0-alpha3 >Reporter: Hongyuan Li >Assignee: Hongyuan Li > Attachments: HADOOP-14623-001.patch > > > {{KafkaSink}}#{{init}} should set acks to *1* to make sure the message has > at least been written to the broker. > The current code is listed below: > {code} > > props.put("request.required.acks", "0"); > {code}
[jira] [Commented] (HADOOP-14624) Add GenericTestUtils.DelayAnswer that accepts the slf4j logger API
[ https://issues.apache.org/jira/browse/HADOOP-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075984#comment-16075984 ] Wenxin He commented on HADOOP-14624: The 17 new javac warnings are caused by the new deprecated method {{DelayAnswer(Log)}}. > Add GenericTestUtils.DelayAnswer that accepts the slf4j logger API > - > > Key: HADOOP-14624 > URL: https://issues.apache.org/jira/browse/HADOOP-14624 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Wenxin He >Assignee: Wenxin He > Attachments: HADOOP-14624.001.patch, HADOOP-14624.002.patch > > > Split from HADOOP-14539. > Currently GenericTestUtils.DelayAnswer only accepts the commons-logging > logger API. Since we are migrating the APIs to slf4j, the slf4j logger API > should be accepted as well.
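The deprecation warnings discussed above come from a standard migration pattern: keep the old commons-logging constructor, mark it @Deprecated, and delegate to a new slf4j-based one. The sketch below illustrates that pattern only; both logger interfaces are stubbed as single-method stand-ins so the example is self-contained, and the class names are invented, not the actual GenericTestUtils code.

```java
public class DelayAnswerOverloadSketch {
    interface Log { void info(String msg); }     // stand-in for commons-logging Log
    interface Logger { void info(String msg); }  // stand-in for org.slf4j.Logger

    static class DelayAnswer {
        private final Logger log;

        DelayAnswer(Logger logger) { this.log = logger; } // new preferred API

        // Old API kept for compatibility; every existing call site now
        // compiles with a deprecation warning, matching the 17 reported.
        @Deprecated
        DelayAnswer(Log legacy) { this((Logger) legacy::info); } // adapt old logger
        
        void note(String msg) { log.info(msg); }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        Logger slf4jStyle = out::append;          // capture log output for the demo
        new DelayAnswer(slf4jStyle).note("waiting");
        System.out.println(out);
    }
}
```

Once all call sites move to the new constructor, the deprecated overload (and its warnings) can be deleted in a follow-up.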