[jira] [Updated] (HADOOP-15003) Merge S3A committers into trunk: Yetus patch checker

2017-11-02 Thread Aaron Fabbri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-15003:
--
Status: Patch Available  (was: Open)

> Merge S3A committers into trunk: Yetus patch checker
> 
>
> Key: HADOOP-15003
> URL: https://issues.apache.org/jira/browse/HADOOP-15003
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13786-041.patch, HADOOP-13786-042.patch
>
>
> This is a Yetus-only JIRA created to have Yetus review the 
> HADOOP-13786/HADOOP-14971 patch as a .patch file, since the review PR 
> [https://github.com/apache/hadoop/pull/282] is preventing this from happening in 
> HADOOP-14971.
> Reviews should go into the PR or the other task.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15013) Fix ResourceEstimator findbugs issues

2017-11-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236984#comment-16236984
 ] 

Hudson commented on HADOOP-15013:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13181 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13181/])
HADOOP-15013. Fix ResourceEstimator findbugs issues. (asuresh) (arun suresh: 
rev 53c0fb7efebfac4a79f5cce2dd42cf00411d51e7)
* (edit) 
hadoop-tools/hadoop-resourceestimator/src/main/java/org/apache/hadoop/resourceestimator/translator/impl/BaseLogParser.java
* (edit) 
hadoop-tools/hadoop-resourceestimator/src/main/java/org/apache/hadoop/resourceestimator/service/ResourceEstimatorService.java
* (edit) 
hadoop-tools/hadoop-resourceestimator/src/main/java/org/apache/hadoop/resourceestimator/translator/impl/LogParserUtil.java
* (edit) 
hadoop-tools/hadoop-resourceestimator/src/test/java/org/apache/hadoop/resourceestimator/service/TestResourceEstimatorService.java


> Fix ResourceEstimator findbugs issues
> -
>
> Key: HADOOP-15013
> URL: https://issues.apache.org/jira/browse/HADOOP-15013
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Assignee: Arun Suresh
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7431.001.patch
>
>
> Just see any recent report.






[jira] [Commented] (HADOOP-13514) Upgrade maven surefire plugin to 2.19.1

2017-11-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236972#comment-16236972
 ] 

Allen Wittenauer commented on HADOOP-13514:
---

Latest version of surefire is 2.20.1, released in September.

I've been spending a few weeks looking at our usage on the ASF build machines.  
We typically have anywhere from 8 to 40 zombie JVMs running around after the 
hadoop-common, hadoop-hdfs, and hadoop-mapreduce allegedly complete. These do 
nothing but eat resources. As a result, timeouts are pretty normal, and problems 
that are claimed to be "environmental" are mostly self-inflicted by badly 
written tests that only ever see real stress on the build machines.

I'm fairly convinced that HDFS-12711 can be prevented with SUREFIRE-773, which 
was closed as a dupe of SUREFIRE-524.  At this point, I think we absolutely 
need to make this a priority to get into at least trunk and branch-2 ASAP.  

[It's easy to say... "this wasn't a problem before!"... I'm not so convinced it 
wasn't.  I think we just didn't see it as often.  But new tests are doing new 
things that eat more resources and stay around longer which impacts more runs.  
Additionally, running unit tests under Docker guaranteed that when the 
container died, so did all of these stale JVMs.  ]


> Upgrade maven surefire plugin to 2.19.1
> ---
>
> Key: HADOOP-13514
> URL: https://issues.apache.org/jira/browse/HADOOP-13514
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 2.8.0
>Reporter: Ewan Higgs
>Assignee: Akira Ajisaka
>Priority: Major
> Attachments: HADOOP-13514-addendum.01.patch, 
> HADOOP-13514-testing.001.patch, HADOOP-13514-testing.002.patch, 
> HADOOP-13514-testing.003.patch, HADOOP-13514-testing.004.patch, 
> HADOOP-13514.002.patch, HADOOP-13514.003.patch, HADOOP-13514.004.patch, 
> HADOOP-13514.005.patch, surefire-2.19.patch
>
>
> A lot of people working on Hadoop don't want to run all the tests when they 
> develop; only the bits they're working on. Surefire 2.19 introduced more 
> useful test filters which let us run a subset of the tests that brings the 
> build time down from 'come back tomorrow' to 'grab a coffee'.
> For instance, if I only care about the S3 adaptor, I might run:
> {code}
> mvn test -Dmaven.javadoc.skip=true -Pdist,native -Djava.awt.headless=true 
> \"-Dtest=org.apache.hadoop.fs.*, org.apache.hadoop.hdfs.*, 
> org.apache.hadoop.fs.s3a.*\"
> {code}
> We can work around this by specifying the surefire version on the command 
> line but it would be better, imo, to just update the default surefire used.
> {code}
> mvn test -Dmaven.javadoc.skip=true -Pdist,native -Djava.awt.headless=true 
> \"-Dtest=org.apache.hadoop.fs.*, org.apache.hadoop.hdfs.*, 
> org.apache.hadoop.fs.s3a.*\" -Dmaven-surefire-plugin.version=2.19.1
> {code}






[jira] [Assigned] (HADOOP-10768) Optimize Hadoop RPC encryption performance

2017-11-02 Thread Dapeng Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dapeng Sun reassigned HADOOP-10768:
---

Assignee: Dapeng Sun  (was: Dian Fu)

> Optimize Hadoop RPC encryption performance
> --
>
> Key: HADOOP-10768
> URL: https://issues.apache.org/jira/browse/HADOOP-10768
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: performance, security
>Affects Versions: 3.0.0-alpha1
>Reporter: Yi Liu
>Assignee: Dapeng Sun
>Priority: Major
> Attachments: HADOOP-10768.001.patch, HADOOP-10768.002.patch, 
> HADOOP-10768.003.patch, Optimize Hadoop RPC encryption performance.pdf
>
>
> Hadoop RPC encryption is enabled by setting {{hadoop.rpc.protection}} to 
> "privacy". It uses the SASL {{GSSAPI}} and {{DIGEST-MD5}} mechanisms for 
> secure authentication and data protection. Although {{GSSAPI}} supports 
> AES, it does not use AES-NI by default, so the encryption is slow and 
> can become a bottleneck.
> After discussing with [~atm], [~tucu00] and [~umamaheswararao], we can apply the 
> same optimization as in HDFS-6606: use AES-NI for a more than *20x* speedup.
> On the other hand, RPC messages are small but frequent, and there may be 
> many RPC calls in one connection, so we need to set up a benchmark to measure the 
> real improvement and then make a trade-off. 
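As a rough illustration of why small, frequent messages need their own benchmark, here is a standalone JCE micro-benchmark sketch. This is not Hadoop's RPC code; the 256-byte payload, the message count, and the class name are all assumptions.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical micro-benchmark: per-message cost of AES/CTR on small
// payloads, the regime the description says Hadoop RPC operates in.
public class SmallMessageAesDemo {
    public static void main(String[] args) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(new byte[16], "AES"),
                new IvParameterSpec(new byte[16]));
        byte[] msg = new byte[256];      // assumed small RPC payload
        int n = 100_000;                 // assumed messages per connection
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            cipher.update(msg);          // stream-encrypt one message
        }
        long perMsgNanos = (System.nanoTime() - t0) / n;
        System.out.println("approx ns per 256-byte message: " + perMsgNanos);
    }
}
```

Whether the JVM actually uses AES-NI here depends on the CPU and the JIT intrinsics available, which is exactly why measuring before trading off matters.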






[jira] [Commented] (HADOOP-10768) Optimize Hadoop RPC encryption performance

2017-11-02 Thread Dapeng Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236945#comment-16236945
 ] 

Dapeng Sun commented on HADOOP-10768:
-

After discussing with [~dian.fu], I would like to pick up this JIRA. I will 
upload a new patch when it is finished.

> Optimize Hadoop RPC encryption performance
> --
>
> Key: HADOOP-10768
> URL: https://issues.apache.org/jira/browse/HADOOP-10768
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: performance, security
>Affects Versions: 3.0.0-alpha1
>Reporter: Yi Liu
>Assignee: Dian Fu
>Priority: Major
> Attachments: HADOOP-10768.001.patch, HADOOP-10768.002.patch, 
> HADOOP-10768.003.patch, Optimize Hadoop RPC encryption performance.pdf
>
>
> Hadoop RPC encryption is enabled by setting {{hadoop.rpc.protection}} to 
> "privacy". It uses the SASL {{GSSAPI}} and {{DIGEST-MD5}} mechanisms for 
> secure authentication and data protection. Although {{GSSAPI}} supports 
> AES, it does not use AES-NI by default, so the encryption is slow and 
> can become a bottleneck.
> After discussing with [~atm], [~tucu00] and [~umamaheswararao], we can apply the 
> same optimization as in HDFS-6606: use AES-NI for a more than *20x* speedup.
> On the other hand, RPC messages are small but frequent, and there may be 
> many RPC calls in one connection, so we need to set up a benchmark to measure the 
> real improvement and then make a trade-off. 






[jira] [Commented] (HADOOP-14971) Merge S3A committers into trunk

2017-11-02 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236906#comment-16236906
 ] 

Aaron Fabbri commented on HADOOP-14971:
---

Another random thing I noticed:
{noformat}
  case 443:
ioe = new AWSNoResponseException(message, ase);
break;
{noformat}

What is response code 443?  Have you seen this in the wild?

> Merge S3A committers into trunk
> ---
>
> Key: HADOOP-14971
> URL: https://issues.apache.org/jira/browse/HADOOP-14971
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13786-040.patch, HADOOP-13786-041.patch
>
>
> Merge the HADOOP-13786 committer into trunk. This branch is being set up as a 
> GitHub PR for review there & to keep it out of the mailboxes of the watchers on 
> the main JIRA.






[jira] [Commented] (HADOOP-15013) Fix ResourceEstimator findbugs issues

2017-11-02 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236883#comment-16236883
 ] 

Subru Krishnan commented on HADOOP-15013:
-

Thanks [~asuresh] for the quick turnaround, [~curino] for the review and [~aw] 
for bringing this to our attention.

> Fix ResourceEstimator findbugs issues
> -
>
> Key: HADOOP-15013
> URL: https://issues.apache.org/jira/browse/HADOOP-15013
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Assignee: Arun Suresh
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7431.001.patch
>
>
> Just see any recent report.






[jira] [Updated] (HADOOP-15013) Fix ResourceEstimator findbugs issues

2017-11-02 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated HADOOP-15013:
-
   Resolution: Fixed
Fix Version/s: 3.1.0
   3.0.0
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks [~subru] and [~curino].
Committed to trunk, branch-2, branch-2.9 and branch-3.0 (oh so many branches !!)

> Fix ResourceEstimator findbugs issues
> -
>
> Key: HADOOP-15013
> URL: https://issues.apache.org/jira/browse/HADOOP-15013
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Assignee: Arun Suresh
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7431.001.patch
>
>
> Just see any recent report.






[jira] [Updated] (HADOOP-15013) Fix ResourceEstimator findbugs issues

2017-11-02 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated HADOOP-15013:
-
Summary: Fix ResourceEstimator findbugs issues  (was: Fix Resource 
Estimator findbugs issues)

> Fix ResourceEstimator findbugs issues
> -
>
> Key: HADOOP-15013
> URL: https://issues.apache.org/jira/browse/HADOOP-15013
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Assignee: Arun Suresh
>Priority: Blocker
> Attachments: YARN-7431.001.patch
>
>
> Just see any recent report.






[jira] [Moved] (HADOOP-15013) resource estimator has findbugs problems

2017-11-02 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh moved YARN-7431 to HADOOP-15013:


Affects Version/s: (was: 3.1.0)
   (was: 2.9.0)
   3.1.0
   2.9.0
 Target Version/s: 2.9.0, 3.0.0, 3.1.0  (was: 2.9.0, 3.0.0, 3.1.0)
  Component/s: (was: resourcemanager)
  Key: HADOOP-15013  (was: YARN-7431)
  Project: Hadoop Common  (was: Hadoop YARN)

> resource estimator has findbugs problems
> 
>
> Key: HADOOP-15013
> URL: https://issues.apache.org/jira/browse/HADOOP-15013
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Assignee: Arun Suresh
>Priority: Blocker
> Attachments: YARN-7431.001.patch
>
>
> Just see any recent report.






[jira] [Updated] (HADOOP-15013) Fix Resource Estimator findbugs issues

2017-11-02 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated HADOOP-15013:
-
Summary: Fix Resource Estimator findbugs issues  (was: resource estimator 
has findbugs problems)

> Fix Resource Estimator findbugs issues
> --
>
> Key: HADOOP-15013
> URL: https://issues.apache.org/jira/browse/HADOOP-15013
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Assignee: Arun Suresh
>Priority: Blocker
> Attachments: YARN-7431.001.patch
>
>
> Just see any recent report.






[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions

2017-11-02 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236837#comment-16236837
 ] 

Chris Douglas commented on HADOOP-14600:


Just skimmed the patch, but this line jumped out:
{noformat}
+cleanup:
+  if (ret == NULL) {
+if (path)
+  (*env)->ReleaseStringChars(env, j_path, (const jchar*) path);
{noformat}
Shouldn't {{path}} be released if not null, even if {{ret != NULL}}?

Checkstyle output is gone, but it looks like {{Helper.java}} is not indented 
correctly. Other than that, this looks good.


> LocatedFileStatus constructor forces RawLocalFS to exec a process to get the 
> permissions
> 
>
> Key: HADOOP-14600
> URL: https://issues.apache.org/jira/browse/HADOOP-14600
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.7.3
> Environment: file:// in a dir with many files
>Reporter: Steve Loughran
>Assignee: Ping Liu
>Priority: Major
> Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, 
> HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, 
> TestRawLocalFileSystemContract.java
>
>
> Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls 
> against the local FS, because the {{FileStatus.getPermission()}} call forces 
> {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI 
> values.
> That is: for every other FS it's a field lookup or even a no-op; on the 
> local FS it's a process exec/spawn, with all the costs. This gets expensive 
> if you have many files.






[jira] [Comment Edited] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions

2017-11-02 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236837#comment-16236837
 ] 

Chris Douglas edited comment on HADOOP-14600 at 11/2/17 11:51 PM:
--

Just skimmed the patch, but this line jumped out:
{noformat}
+cleanup:
+  if (ret == NULL) {
+if (path)
+  (*env)->ReleaseStringChars(env, j_path, (const jchar*) path);
{noformat}
Shouldn't {{path}} be released if not null, even if {{ret != NULL}} ?

Checkstyle output is gone, but it looks like {{Helper.java}} is not indented 
correctly. Other than that, this looks good.



was (Author: chris.douglas):
Just skimmed the patch, but this line jumped out:
{{noformat}}
+cleanup:
+  if (ret == NULL) {
+if (path)
+  (*env)->ReleaseStringChars(env, j_path, (const jchar*) path);
{{noformat}}
Shouldn't {{path}} be released if not null, even if {{ret != NULL}} ?

Checkstyle output is gone, but it looks like {{Helper.java}} is not indented 
correctly. Other than that, this looks good.


> LocatedFileStatus constructor forces RawLocalFS to exec a process to get the 
> permissions
> 
>
> Key: HADOOP-14600
> URL: https://issues.apache.org/jira/browse/HADOOP-14600
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 2.7.3
> Environment: file:// in a dir with many files
>Reporter: Steve Loughran
>Assignee: Ping Liu
>Priority: Major
> Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, 
> HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, 
> TestRawLocalFileSystemContract.java
>
>
> Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls 
> against the local FS, because the {{FileStatus.getPermission()}} call forces 
> {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI 
> values.
> That is: for every other FS it's a field lookup or even a no-op; on the 
> local FS it's a process exec/spawn, with all the costs. This gets expensive 
> if you have many files.






[jira] [Commented] (HADOOP-14989) metrics2 JMX cache refresh result in inconsistent Mutable(Stat|Rate) values

2017-11-02 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236743#comment-16236743
 ] 

Erik Krogen commented on HADOOP-14989:
--

Perfect, thank you [~eyang]! That information & the pointers to previous JIRAs 
are very helpful. I will plan further and update...

> metrics2 JMX cache refresh result in inconsistent Mutable(Stat|Rate) values
> ---
>
> Key: HADOOP-14989
> URL: https://issues.apache.org/jira/browse/HADOOP-14989
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.6.5
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Critical
> Attachments: HADOOP-14989.test.patch
>
>
> While doing some digging in the metrics2 system recently, we noticed that the 
> way {{MutableStat}} values are collected (and thus {{MutableRate}}, since it 
> is based on {{MutableStat}}) means that every time the value is 
> snapshotted, all previous information is lost. So every time a JMX cache 
> refresh occurs, it resets the {{MutableStat}}, meaning that all configured 
> metrics sinks do not consider the previous statistics in their emitted 
> values. The same behavior occurs if you configure multiple sink periods.
> {{MutableStat}}, to compute its average value, maintains a total value since 
> last snapshot, as well as operation count since last snapshot. Upon 
> snapshotting, the average is calculated as (total / opCount) and placed into 
> a gauge metric, and total / operation count are cleared. So the average value 
> represents the average since the last snapshot. If we have only a single sink 
> period ever snapshotting, this would result in the expected behavior that the 
> value is the average over the reporting period. However, if multiple sink 
> periods are configured, or if the JMX cache is refreshed, this is another 
> snapshot operation. So, for example, if you have a FileSink configured at a 
> 60 second interval and your JMX cache refreshes itself 1 second before the 
> FileSink period fires, the values emitted to your FileSink only represent 
> averages _over the last one second_.
> A few ways to solve this issue:
> * Make {{MutableRate}} manage its own average refresh, similar to 
> {{MutableQuantiles}}, which has a refresh thread and saves a snapshot of the 
> last quantile values that it will serve up until the next refresh. Given how 
> many {{MutableRate}} metrics there are, a thread per metric is not really 
> feasible, but could be done on e.g. a per-source basis. This has some 
> downsides: if multiple sinks are configured with different periods, what is 
> the right refresh period for the {{MutableRate}}? 
> * Make {{MutableRate}} emit two counters, one for total and one for operation 
> count, rather than an average gauge and an operation count counter. The 
> average could then be calculated downstream from this information. This is 
> cumbersome for operators and not backwards compatible. To improve on both of 
> those downsides, we could have it keep the current behavior but 
> _additionally_ emit the total as a counter. The snapshotted average is 
> probably sufficient in the common case (we've been using it for years), and 
> when more guaranteed accuracy is required, the average could be derived from 
> the total and operation count.
> The two above suggestions will fix this for both JMX and multiple sink 
> periods, but may be overkill. Multiple sink periods are probably not 
> necessary though we should at least document the behavior.
> Open to suggestions & input here.
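The reset-on-snapshot behavior described above can be sketched with a toy accumulator. This is a simplification, not the real {{MutableStat}}; the class names, timings, and values are invented.

```java
// Minimal sketch of a stat that, like MutableStat, returns the average
// since the *previous* snapshot by anyone, then clears its state.
class ResettingStat {
    private double total;
    private long opCount;

    void add(double value) { total += value; opCount++; }

    // Average since the last snapshot, then reset.
    double snapshotAvg() {
        double avg = opCount == 0 ? 0.0 : total / opCount;
        total = 0;
        opCount = 0;
        return avg;
    }
}

public class SnapshotDemo {
    public static void main(String[] args) {
        ResettingStat stat = new ResettingStat();
        for (int i = 0; i < 59; i++) stat.add(100.0); // 59s of slow ops
        double jmxView = stat.snapshotAvg();          // JMX cache refresh
        stat.add(5.0);                                // 1 more second of ops
        double fileSinkView = stat.snapshotAvg();     // FileSink fires
        // The FileSink's "60-second average" reflects only the last second:
        System.out.println(jmxView + " " + fileSinkView); // prints 100.0 5.0
    }
}
```

Here a JMX refresh one second before the FileSink period fires leaves the FileSink with an average over only that last second, which is the failure mode the description walks through.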






[jira] [Comment Edited] (HADOOP-14987) Improve KMSClientProvider log around delegation token checking

2017-11-02 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236717#comment-16236717
 ] 

Xiao Chen edited comment on HADOOP-14987 at 11/2/17 10:35 PM:
--

IIRC technically you can review my patch to make it legit. It's just we cannot 
self +1. 

(It seems there's a checkstyle issue too... I can fix that tonight if you didn't 
beat me to it.)


was (Author: xiaochen):
IIRC technically you can review my patch to make it legit. It's just we cannot 
self +1. 

> Improve KMSClientProvider log around delegation token checking
> --
>
> Key: HADOOP-14987
> URL: https://issues.apache.org/jira/browse/HADOOP-14987
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.7.3
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HADOOP-14987.001.patch, HADOOP-14987.002.patch, 
> HADOOP-14987.003.patch, HADOOP-14987.004.patch, HADOOP-14987.005.patch
>
>
> KMSClientProvider#containsKmsDt uses SecurityUtil.buildTokenService(addr) to 
> build the key to look for KMS-DT from the UGI's token map. The token lookup 
> key here varies depending  on the KMSClientProvider's configuration value for 
> hadoop.security.token.service.use_ip. In certain cases, the token obtained 
> with non-matching hadoop.security.token.service.use_ip setting will not be 
> recognized by KMSClientProvider. This ticket is opened to improve logs for 
> troubleshooting KMS delegation token related issues like this.  
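A hypothetical sketch of the lookup-key mismatch described above. The host, IP, and port are invented, and the two key builders merely stand in for what {{SecurityUtil.buildTokenService}} produces under the two {{hadoop.security.token.service.use_ip}} settings.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenKeyDemo {
    // Stand-ins for SecurityUtil.buildTokenService under use_ip=true/false.
    static String keyUsingIp(String host, String ip, int port) {
        return ip + ":" + port;
    }
    static String keyUsingHost(String host, String ip, int port) {
        return host + ":" + port;
    }

    public static void main(String[] args) {
        Map<String, String> tokenMap = new HashMap<>();
        // Token obtained by a client configured with use_ip = true:
        tokenMap.put(keyUsingIp("kms.example.com", "10.0.0.5", 9600), "KMS-DT");
        // A provider configured with use_ip = false builds a different key:
        String found = tokenMap.get(
                keyUsingHost("kms.example.com", "10.0.0.5", 9600));
        System.out.println(found); // prints null -> token not recognized
    }
}
```

The token exists in the UGI's map but is never found, which is why better logging around the lookup key is the fix this ticket proposes.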






[jira] [Commented] (HADOOP-14987) Improve KMSClientProvider log around delegation token checking

2017-11-02 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236717#comment-16236717
 ] 

Xiao Chen commented on HADOOP-14987:


IIRC technically you can review my patch to make it legit. It's just we cannot 
self +1. 

> Improve KMSClientProvider log around delegation token checking
> --
>
> Key: HADOOP-14987
> URL: https://issues.apache.org/jira/browse/HADOOP-14987
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.7.3
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HADOOP-14987.001.patch, HADOOP-14987.002.patch, 
> HADOOP-14987.003.patch, HADOOP-14987.004.patch, HADOOP-14987.005.patch
>
>
> KMSClientProvider#containsKmsDt uses SecurityUtil.buildTokenService(addr) to 
> build the key to look for KMS-DT from the UGI's token map. The token lookup 
> key here varies depending  on the KMSClientProvider's configuration value for 
> hadoop.security.token.service.use_ip. In certain cases, the token obtained 
> with non-matching hadoop.security.token.service.use_ip setting will not be 
> recognized by KMSClientProvider. This ticket is opened to improve logs for 
> troubleshooting KMS delegation token related issues like this.  






[jira] [Updated] (HADOOP-15003) Merge S3A committers into trunk: Yetus patch checker

2017-11-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15003:

Status: Open  (was: Patch Available)

> Merge S3A committers into trunk: Yetus patch checker
> 
>
> Key: HADOOP-15003
> URL: https://issues.apache.org/jira/browse/HADOOP-15003
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13786-041.patch, HADOOP-13786-042.patch
>
>
> This is a Yetus-only JIRA created to have Yetus review the 
> HADOOP-13786/HADOOP-14971 patch as a .patch file, since the review PR 
> [https://github.com/apache/hadoop/pull/282] is preventing this from happening in 
> HADOOP-14971.
> Reviews should go into the PR or the other task.






[jira] [Updated] (HADOOP-15003) Merge S3A committers into trunk: Yetus patch checker

2017-11-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15003:

Attachment: HADOOP-13786-042.patch

Patch 042, checkstyle

> Merge S3A committers into trunk: Yetus patch checker
> 
>
> Key: HADOOP-15003
> URL: https://issues.apache.org/jira/browse/HADOOP-15003
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13786-041.patch, HADOOP-13786-042.patch
>
>
> This is a Yetus-only JIRA created to have Yetus review the 
> HADOOP-13786/HADOOP-14971 patch as a .patch file, since the review PR 
> [https://github.com/apache/hadoop/pull/282] is preventing this from happening in 
> HADOOP-14971.
> Reviews should go into the PR or the other task.






[jira] [Commented] (HADOOP-15010) hadoop-resourceestimator's assembly buries it

2017-11-02 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236651#comment-16236651
 ] 

Subru Krishnan commented on HADOOP-15010:
-

Resource Estimator currently follows the same structure as SLS 
(https://github.com/apache/hadoop/tree/trunk/hadoop-tools/hadoop-sls) as SLS is 
used quite extensively for internal developer analysis and this is quite 
similar. 

[~aw], setting the target version to 3.1.0 as the cleanup has a transitive 
dependency on HADOOP-9902. Thanks.

> hadoop-resourceestimator's assembly buries it
> -
>
> Key: HADOOP-15010
> URL: https://issues.apache.org/jira/browse/HADOOP-15010
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, tools
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> There's zero reason for this layout:
> {code}
> hadoop-3.1.0-SNAPSHOT/share/hadoop/tools/resourceestimator
>  - bin
>  - conf
>  - data
> {code}
> Buried that far back, it might as well not exist.
> Propose:
> a) HADOOP-15009 to eliminate bin
> b) Move conf file into etc/hadoop
> c) keep data where it's at






[jira] [Updated] (HADOOP-15009) hadoop-resourceestimator's shell scripts are a mess

2017-11-02 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated HADOOP-15009:

Target Version/s: 3.1.0

> hadoop-resourceestimator's shell scripts are a mess
> ---
>
> Key: HADOOP-15009
> URL: https://issues.apache.org/jira/browse/HADOOP-15009
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts, tools
>Affects Versions: 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> #1:
> There's no reason for estimator.sh to exist.  Just make it a subcommand under 
> yarn or whatever.  
> #2:
> In its current form, it's missing a BUNCH of boilerplate that makes certain 
> functionality completely fail.
> #3
> start/stop-estimator.sh is full of copypasta that doesn't actually do 
> anything/work correctly.  Additionally, if estimator.sh doesn't exist, 
> neither does this since yarn --daemon start/stop will do everything as 
> necessary.  






[jira] [Updated] (HADOOP-15010) hadoop-resourceestimator's assembly buries it

2017-11-02 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated HADOOP-15010:

Target Version/s: 3.1.0

> hadoop-resourceestimator's assembly buries it
> -
>
> Key: HADOOP-15010
> URL: https://issues.apache.org/jira/browse/HADOOP-15010
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, tools
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> There's zero reason for this layout:
> {code}
> hadoop-3.1.0-SNAPSHOT/share/hadoop/tools/resourceestimator
>  - bin
>  - conf
>  - data
> {code}
> Buried that far back, it might as well not exist.
> Propose:
> a) HADOOP-15009 to eliminate bin
> b) Move conf file into etc/hadoop
> c) keep data where it's at






[jira] [Updated] (HADOOP-14872) CryptoInputStream should implement unbuffer

2017-11-02 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HADOOP-14872:

Attachment: HADOOP-14872.013.patch

Thanks Xiao and Steve for the patience and great discussions.

Patch 013
* Incorporate Xiao and Steve's comments
* Add {{StreamCapabilitiesPolicy#unbuffer}} to implement the policy for 
unbuffer. A Java 8 static method on the CanUnbuffer interface could be a good 
choice, but we can't use one due to backporting concerns.
* Use a table in FileSystem.md to list capabilities
* Plan to commit StreamCapabilities enhancements in HADOOP-15012 first, then 
check in CryptoInputStream changes here.

> CryptoInputStream should implement unbuffer
> ---
>
> Key: HADOOP-14872
> URL: https://issues.apache.org/jira/browse/HADOOP-14872
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.6.4
>Reporter: John Zhuge
>Assignee: John Zhuge
>Priority: Major
> Attachments: HADOOP-14872.001.patch, HADOOP-14872.002.patch, 
> HADOOP-14872.003.patch, HADOOP-14872.004.patch, HADOOP-14872.005.patch, 
> HADOOP-14872.006.patch, HADOOP-14872.007.patch, HADOOP-14872.008.patch, 
> HADOOP-14872.009.patch, HADOOP-14872.010.patch, HADOOP-14872.011.patch, 
> HADOOP-14872.012.patch, HADOOP-14872.013.patch
>
>
> Discovered in IMPALA-5909.
> Opening an encrypted HDFS file returns a chain of wrapped input streams:
> {noformat}
> HdfsDataInputStream
>   CryptoInputStream
> DFSInputStream
> {noformat}
> If an application such as Impala or HBase calls HdfsDataInputStream#unbuffer, 
> FSDataInputStream#unbuffer will be called:
> {code:java}
> try {
>   ((CanUnbuffer)in).unbuffer();
> } catch (ClassCastException e) {
>   throw new UnsupportedOperationException("this stream does not " +
>   "support unbuffering.");
> }
> {code}
> If the {{in}} class does not implement CanUnbuffer, UOE will be thrown. If 
> the application is not careful, tons of UOEs will show up in logs.
> In comparison, opening an non-encrypted HDFS file returns this chain:
> {noformat}
> HdfsDataInputStream
>   DFSInputStream
> {noformat}
> DFSInputStream implements CanUnbuffer.
> It is good for CryptoInputStream to implement CanUnbuffer for 3 reasons:
> * Release buffer, cache, or any other resource when instructed
> * Able to call its wrapped DFSInputStream unbuffer
> * Avoid the UOE described above. Applications may not handle the UOE very 
> well.
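The delegation argued for in that description can be sketched with simplified stand-ins. These stubs are illustrative only, not the real Hadoop classes; only the CanUnbuffer method signature matches org.apache.hadoop.fs.CanUnbuffer:

```java
// Simplified stand-ins for the stream classes discussed above.
interface CanUnbuffer {
  void unbuffer();
}

class DFSInputStreamStub implements CanUnbuffer {
  boolean unbuffered = false;
  @Override
  public void unbuffer() { unbuffered = true; }
}

// A CryptoInputStream-like wrapper: by implementing CanUnbuffer itself it can
// release its own decryption buffer and forward the call to the inner stream,
// instead of letting the caller hit UnsupportedOperationException.
class CryptoInputStreamStub implements CanUnbuffer {
  private final Object in;
  byte[] decryptBuffer = new byte[8192];

  CryptoInputStreamStub(Object in) { this.in = in; }

  @Override
  public void unbuffer() {
    decryptBuffer = null;                 // release this stream's own resources
    if (in instanceof CanUnbuffer) {      // delegate if the wrapped stream supports it
      ((CanUnbuffer) in).unbuffer();
    }
  }
}

public class UnbufferDemo {
  public static boolean demo() {
    DFSInputStreamStub dfs = new DFSInputStreamStub();
    CryptoInputStreamStub crypto = new CryptoInputStreamStub(dfs);
    crypto.unbuffer();                    // propagates through the wrapper chain
    return dfs.unbuffered && crypto.decryptBuffer == null;
  }

  public static void main(String[] args) {
    System.out.println(demo());           // prints "true"
  }
}
```

The point is the delegation: once the wrapper implements the interface, the call reaches the inner DFSInputStream and no UOE ever surfaces to the application.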






[jira] [Commented] (HADOOP-14987) Improve KMSClientProvider log around delegation token checking

2017-11-02 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236609#comment-16236609
 ] 

Xiaoyu Yao commented on HADOOP-14987:
-

Thanks [~xiaochen] for the update. The change looks good to me. Now that we have 
both touched the patch, we need to find another reviewer to +1 it. 

> Improve KMSClientProvider log around delegation token checking
> --
>
> Key: HADOOP-14987
> URL: https://issues.apache.org/jira/browse/HADOOP-14987
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.7.3
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Attachments: HADOOP-14987.001.patch, HADOOP-14987.002.patch, 
> HADOOP-14987.003.patch, HADOOP-14987.004.patch, HADOOP-14987.005.patch
>
>
> KMSClientProvider#containsKmsDt uses SecurityUtil.buildTokenService(addr) to 
> build the key to look for KMS-DT from the UGI's token map. The token lookup 
> key here varies depending  on the KMSClientProvider's configuration value for 
> hadoop.security.token.service.use_ip. In certain cases, the token obtained 
> with non-matching hadoop.security.token.service.use_ip setting will not be 
> recognized by KMSClientProvider. This ticket is opened to improve logs for 
> troubleshooting KMS delegation token related issues like this.  
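A minimal illustration of the key mismatch the description refers to, assuming a hypothetical buildTokenService stand-in (the real one is SecurityUtil.buildTokenService, and the host/IP values here are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Why a token stored under one service key is not found under the other.
// buildTokenService is a hypothetical stand-in, not the real SecurityUtil code.
public class TokenServiceKeyDemo {
  static String buildTokenService(String host, String resolvedIp, int port,
                                  boolean useIp) {
    // With use_ip=true the key is "ip:port"; otherwise it is "host:port".
    return (useIp ? resolvedIp : host) + ":" + port;
  }

  public static boolean mismatch() {
    Map<String, String> tokenMap = new HashMap<>();
    // Token obtained by a client configured with use_ip=true ...
    tokenMap.put(buildTokenService("kms.example.com", "10.0.0.5", 9600, true),
                 "KMS-DT");
    // ... but looked up by a KMSClientProvider configured with use_ip=false.
    String lookupKey =
        buildTokenService("kms.example.com", "10.0.0.5", 9600, false);
    return tokenMap.get(lookupKey) == null;   // the lookup misses
  }

  public static void main(String[] args) {
    System.out.println(mismatch());           // prints "true": token not found
  }
}
```

Better logging around exactly this lookup is what the ticket proposes.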






[jira] [Commented] (HADOOP-15011) Getting file not found exception while using distcp with s3a

2017-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236593#comment-16236593
 ] 

Steve Loughran commented on HADOOP-15011:
-

This is a consistency issue, but not one you need S3Guard for. It looks more 
like HADOOP-13145; the stack is exactly the same as HADOOP-11487. Closing as a 
duplicate of those.

This was fixed a while back. What version of CDH are you using?

* Hadoop 2.8 and the recent HDP and CDH releases have the higher-performance 
upload
* for config: 
[https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_cloud-data-access/content/using-distcp.html]

Make sure you aren't trying to use --atomic or any of the -p options.

bq. I'm not seeing the throughput of 3gbps 

I'd be surprised if S3 gave you that. Anyway, it's a "maximum per node", not a 
guarantee of actual bandwidth.

Are you trying to write to S3 from a physical cluster, or from inside EC2 itself?

250 GB in 1h30 works out to roughly 47 MB/s, about 380 Mbit/s, well short of 3 
Gbit/s. For a long-haul link, it's conceivable that is simply the available 
bandwidth; from inside EC2, it's pretty poor.

S3 does a lot of throttling for writes to specific buckets and paths. You may 
find you get better performance by cranking back on how aggressive the per-node 
bandwidth is and reducing the number of mappers. Try cutting it in half and 
seeing what happens; then do it again.

bq. With fast upload option, I'm writing the files to S3 using threads. Could 
you please help me in providing some tuning option for this.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_cloud-data-access/content/s3a-fast-upload.html

If you want to benchmark your upload speed, download and run 
https://github.com/steveloughran/cloudup for a bulk upload of local data. It 
isolates all network traffic to the upload, prioritises large files first, and 
shuffles the filenames to reduce throttling at the back end. Your bandwidth per 
node will not exceed that.
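For sanity-checking rates like those discussed here, converting a copied volume and elapsed time into bandwidth is straightforward. A small helper (the class and method names are mine; the figures are the job's own 250 GB in 90 minutes):

```java
import java.util.Locale;

public class ThroughputCheck {
  // Converts a copied volume (GB) and elapsed time (minutes) into MB/s.
  static double mbPerSec(double gigabytes, double minutes) {
    return gigabytes * 1024.0 / (minutes * 60.0);
  }

  public static void main(String[] args) {
    double mbs = mbPerSec(250, 90);   // the job above: 250 GB in 1h30
    System.out.printf(Locale.ROOT, "%.1f MB/s (%.0f Mbit/s)%n", mbs, mbs * 8);
    // prints: 47.4 MB/s (379 Mbit/s)
  }
}
```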

> Getting file not found exception while using distcp with s3a
> 
>
> Key: HADOOP-15011
> URL: https://issues.apache.org/jira/browse/HADOOP-15011
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Reporter: Logesh Rangan
>
> I'm using the distcp option to copy the huge files from Hadoop to S3. 
> Sometimes i'm getting the below error,
> *Command:* (Copying 378 GB data)
> _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g 
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D 
> 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D 
> 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' 
> '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' 
> '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' 
> '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' 
> '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' 
> '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' 
> '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' 
> '-Dfs.s3a.threads.keepalivetime=600' 
> '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy 
> dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest
> _
> 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_000165_0, Status : FAILED
> Error: java.io.FileNotFoundException: No such file or directory: 
> s3a://bucketname/filename
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78)
> at 
> org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_10_0, Status : FAILED
> Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> 
> s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6
>

[jira] [Resolved] (HADOOP-15011) Getting file not found exception while using distcp with s3a

2017-11-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15011.
-
Resolution: Duplicate

> Getting file not found exception while using distcp with s3a
> 
>
> Key: HADOOP-15011
> URL: https://issues.apache.org/jira/browse/HADOOP-15011
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Reporter: Logesh Rangan
>
> I'm using the distcp option to copy the huge files from Hadoop to S3. 
> Sometimes i'm getting the below error,
> *Command:* (Copying 378 GB data)
> _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g 
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D 
> 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D 
> 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' 
> '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' 
> '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' 
> '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' 
> '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' 
> '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' 
> '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' 
> '-Dfs.s3a.threads.keepalivetime=600' 
> '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy 
> dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest
> _
> 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_000165_0, Status : FAILED
> Error: java.io.FileNotFoundException: No such file or directory: 
> s3a://bucketname/filename
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78)
> at 
> org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_10_0, Status : FAILED
> Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> 
> s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6
> at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.IOException: Couldn't run retriable-command: Copying 
> hdfs://nameservice1/filename to s3a://bucketname/filename
> at 
> org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
> at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
> ... 10 more
> Caused by: com.cloudera.com.amazonaws.AmazonClientException: Failed to parse 
> XML document with handler class 
> com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
> at 
> com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:164)
> at 
> com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:299)
> at 
> 

[jira] [Created] (HADOOP-15012) Enhance StreamCapabilities with READAHEAD, DROPBEHIND, and UNBUFFER

2017-11-02 Thread John Zhuge (JIRA)
John Zhuge created HADOOP-15012:
---

 Summary: Enhance StreamCapabilities with READAHEAD, DROPBEHIND, 
and UNBUFFER
 Key: HADOOP-15012
 URL: https://issues.apache.org/jira/browse/HADOOP-15012
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.9.0
Reporter: John Zhuge
Priority: Major


A split from HADOOP-14872 to track changes that enhance the StreamCapabilities 
class with READAHEAD, DROPBEHIND, and UNBUFFER capabilities.

Discussions and code reviews are done in HADOOP-14872.






[jira] [Commented] (HADOOP-14971) Merge S3A committers into trunk

2017-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236555#comment-16236555
 ] 

Steve Loughran commented on HADOOP-14971:
-

Oh, there's a config setting there to turn off the checks; at least there ought 
to be. This has always been a problem with HDFS and minimum free space. 

> Merge S3A committers into trunk
> ---
>
> Key: HADOOP-14971
> URL: https://issues.apache.org/jira/browse/HADOOP-14971
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13786-040.patch, HADOOP-13786-041.patch
>
>
> Merge the HADOOP-13786 committer into trunk. This branch is being set up as a 
> github PR for review there & to keep it out the mailboxes of the watchers on 
> the main JIRA






[jira] [Moved] (HADOOP-15011) Getting file not found exception while using distcp with s3a

2017-11-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran moved HDFS-12753 to HADOOP-15011:


Component/s: (was: fs/s3)
 fs/s3
Key: HADOOP-15011  (was: HDFS-12753)
Project: Hadoop Common  (was: Hadoop HDFS)

> Getting file not found exception while using distcp with s3a
> 
>
> Key: HADOOP-15011
> URL: https://issues.apache.org/jira/browse/HADOOP-15011
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Reporter: Logesh Rangan
>
> I'm using the distcp option to copy the huge files from Hadoop to S3. 
> Sometimes i'm getting the below error,
> *Command:* (Copying 378 GB data)
> _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g 
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D 
> 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D 
> 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' 
> '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' 
> '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' 
> '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' 
> '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' 
> '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' 
> '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' 
> '-Dfs.s3a.threads.keepalivetime=600' 
> '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy 
> dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest
> _
> 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_000165_0, Status : FAILED
> Error: java.io.FileNotFoundException: No such file or directory: 
> s3a://bucketname/filename
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78)
> at 
> org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : 
> attempt_1497120915913_2792335_m_10_0, Status : FAILED
> Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> 
> s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6
> at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252)
> at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.IOException: Couldn't run retriable-command: Copying 
> hdfs://nameservice1/filename to s3a://bucketname/filename
> at 
> org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
> at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280)
> ... 10 more
> Caused by: com.cloudera.com.amazonaws.AmazonClientException: Failed to parse 
> XML document with handler class 
> com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
> at 
> com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:164)
> at 
> 

[jira] [Created] (HADOOP-15010) hadoop-resourceestimator's assembly buries it

2017-11-02 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-15010:
-

 Summary: hadoop-resourceestimator's assembly buries it
 Key: HADOOP-15010
 URL: https://issues.apache.org/jira/browse/HADOOP-15010
 Project: Hadoop Common
  Issue Type: Bug
  Components: build, tools
Affects Versions: 2.9.0, 3.1.0
Reporter: Allen Wittenauer
Priority: Blocker


There's zero reason for this layout:

{code}
hadoop-3.1.0-SNAPSHOT/share/hadoop/tools/resourceestimator
 - bin
 - conf
 - data
{code}

Buried that far back, it might as well not exist.

Propose:

a) HADOOP-15009 to eliminate bin
b) Move conf file into etc/hadoop
c) keep data where it's at






[jira] [Updated] (HADOOP-15009) hadoop-resourceestimator's shell scripts are a mess

2017-11-02 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-15009:
--
Description: 
#1:

There's no reason for estimator.sh to exist.  Just make it a subcommand under 
yarn or whatever.  

#2:

In its current form, it's missing a BUNCH of boilerplate that makes certain 
functionality completely fail.

#3

start/stop-estimator.sh is full of copypasta that doesn't actually do 
anything/work correctly.  Additionally, if estimator.sh doesn't exist, neither 
does this since yarn --daemon start/stop will do everything as necessary.  


  was:
#1:

There's no reason for estimator.sh to exist.  Just make it a subcommand under 
yarn or whatever.  

#2:

It it's current form, it's missing a BUNCH of boilerplate that makes certain 
functionality completely fail.

#3

start/stop-estimator.sh is full of copypasta that doesn't actually do 
anything/work correctly.  Additionally, if estimator.sh doesn't exist, neither 
does this since yarn --daemon start/stop will do everything as necessary.  



> hadoop-resourceestimator's shell scripts are a mess
> ---
>
> Key: HADOOP-15009
> URL: https://issues.apache.org/jira/browse/HADOOP-15009
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts, tools
>Affects Versions: 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> #1:
> There's no reason for estimator.sh to exist.  Just make it a subcommand under 
> yarn or whatever.  
> #2:
> In its current form, it's missing a BUNCH of boilerplate that makes certain 
> functionality completely fail.
> #3
> start/stop-estimator.sh is full of copypasta that doesn't actually do 
> anything/work correctly.  Additionally, if estimator.sh doesn't exist, 
> neither does this since yarn --daemon start/stop will do everything as 
> necessary.  






[jira] [Updated] (HADOOP-15009) hadoop-resourceestimator's shell scripts are a mess

2017-11-02 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-15009:
--
Component/s: tools

> hadoop-resourceestimator's shell scripts are a mess
> ---
>
> Key: HADOOP-15009
> URL: https://issues.apache.org/jira/browse/HADOOP-15009
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts, tools
>Affects Versions: 3.1.0
>Reporter: Allen Wittenauer
>Priority: Critical
>
> #1:
> There's no reason for estimator.sh to exist.  Just make it a subcommand under 
> yarn or whatever.  
> #2:
> In its current form, it's missing a BUNCH of boilerplate that makes certain 
> functionality completely fail.
> #3
> start/stop-estimator.sh is full of copypasta that doesn't actually do 
> anything/work correctly.  Additionally, if estimator.sh doesn't exist, 
> neither does this since yarn --daemon start/stop will do everything as 
> necessary.  






[jira] [Created] (HADOOP-15009) hadoop-resourceestimator's shell scripts are a mess

2017-11-02 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HADOOP-15009:
-

 Summary: hadoop-resourceestimator's shell scripts are a mess
 Key: HADOOP-15009
 URL: https://issues.apache.org/jira/browse/HADOOP-15009
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.1.0
Reporter: Allen Wittenauer
Priority: Critical


#1:

There's no reason for estimator.sh to exist.  Just make it a subcommand under 
yarn or whatever.  

#2:

In its current form, it's missing a BUNCH of boilerplate that makes certain 
functionality completely fail.

#3

start/stop-estimator.sh is full of copypasta that doesn't actually do 
anything/work correctly.  Additionally, if estimator.sh doesn't exist, neither 
does this since yarn --daemon start/stop will do everything as necessary.  







[jira] [Updated] (HADOOP-15009) hadoop-resourceestimator's shell scripts are a mess

2017-11-02 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-15009:
--
Priority: Blocker  (was: Critical)

> hadoop-resourceestimator's shell scripts are a mess
> ---
>
> Key: HADOOP-15009
> URL: https://issues.apache.org/jira/browse/HADOOP-15009
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: scripts, tools
>Affects Versions: 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> #1:
> There's no reason for estimator.sh to exist.  Just make it a subcommand under 
> yarn or whatever.  
> #2:
> In its current form, it's missing a BUNCH of boilerplate that makes certain 
> functionality completely fail.
> #3
> start/stop-estimator.sh is full of copypasta that doesn't actually do 
> anything/work correctly.  Additionally, if estimator.sh doesn't exist, 
> neither does this since yarn --daemon start/stop will do everything as 
> necessary.  






[jira] [Commented] (HADOOP-14576) s3guard DynamoDB resource not found: tables not ACTIVE state after initial connection

2017-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236495#comment-16236495
 ] 

Steve Loughran commented on HADOOP-14576:
-

We should recognise these DDB exceptions and treat them as recoverable, at 
least for a while. Interesting that they are HTTP 400, not 404.
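Treating those exceptions as recoverable for a bounded period amounts to a retry loop with backoff. A rough sketch under assumed names (TransientTableException and the limits are hypothetical, not the actual S3Guard retry policy; the real code would inspect the AWS SDK's ResourceNotFoundException):

```java
import java.util.concurrent.Callable;

public class DdbRetry {
  static class TransientTableException extends RuntimeException { }

  // Retries op while it throws the transient error, sleeping with exponential
  // backoff, and gives up after maxAttempts tries.
  static <T> T withRetries(Callable<T> op, int maxAttempts, long baseSleepMs)
      throws Exception {
    for (int attempt = 1; ; attempt++) {
      try {
        return op.call();
      } catch (TransientTableException e) {
        if (attempt >= maxAttempts) {
          throw e;                                   // bounded: stop retrying
        }
        Thread.sleep(baseSleepMs << (attempt - 1));  // 1x, 2x, 4x, ... base sleep
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated table that fails twice while UPDATING, then recovers.
    String state = withRetries(() -> {
      if (++calls[0] < 3) {
        throw new TransientTableException();
      }
      return "ACTIVE";
    }, 5, 1);
    System.out.println(state + " after " + calls[0] + " attempts");
    // prints: ACTIVE after 3 attempts
  }
}
```

The backoff keeps the client from hammering a table that is mid-repartition, while the attempt cap ensures a genuinely missing table still fails fast.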

> s3guard DynamoDB resource not found: tables not ACTIVE state after initial 
> connection
> -
>
> Key: HADOOP-14576
> URL: https://issues.apache.org/jira/browse/HADOOP-14576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Sean Mackrory
>Priority: Major
>
> We currently only anticipate tables not being in the ACTIVE state when first 
> connecting. It is possible for a table to be in the ACTIVE state and move to 
> an UPDATING state during partitioning events. Attempts to read or write 
> during that time will result in an AmazonServerException getting thrown. We 
> should try to handle that better...






[jira] [Commented] (HADOOP-14576) s3guard DynamoDB resource not found: tables not ACTIVE state after initial connection

2017-11-02 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236463#comment-16236463
 ] 

Aaron Fabbri commented on HADOOP-14576:
---

I've seen similar issues intermittently just running S3A integration tests.  I 
will paste a fresh stack trace next time I hit it.  For now, sharing another 
stack trace our Hive folks provided a while back.

Caused by: org.apache.hadoop.fs.s3a.AWSServiceIOException: get on 
s3a:///1000_unpartitioned_parquet_cdh_ip-10-0-0-158/catalog_sales/.hive-staging_hive_2017-06-21_00-27-07_895_5004400669443203568-24/_tmp.-ext-1/000530_0:
 com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException: Requested 
resource not found (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
ResourceNotFoundException; Request ID: 
L93OFC0JOT2N6FT45BIPRQVP1FVV4KQNSO5AEMVJF66Q9ASUAAJG): Requested resource not 
found (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
ResourceNotFoundException; Request ID: 
L93OFC0JOT2N6FT45BIPRQVP1FVV4KQNSO5AEMVJF66Q9ASUAAJG)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:178)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.get(DynamoDBMetadataStore.java:395)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1775)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerRename(S3AFileSystem.java:776)
at org.apache.hadoop.fs.s3a.S3AFileSystem.rename(S3AFileSystem.java:734)
at 
org.apache.hadoop.hive.ql.util.ParallelDirectoryRenamer$1.call(ParallelDirectoryRenamer.java:105)
at 
org.apache.hadoop.hive.ql.util.ParallelDirectoryRenamer$1.call(ParallelDirectoryRenamer.java:101)
... 4 more
Caused by: com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException: 
Requested resource not found (Service: AmazonDynamoDBv2; Status Code: 400; 
Error Code: ResourceNotFoundException; Request ID: L93OF...snip...)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1258)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)


> s3guard DynamoDB resource not found: tables not ACTIVE state after initial 
> connection
> -
>
> Key: HADOOP-14576
> URL: https://issues.apache.org/jira/browse/HADOOP-14576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Sean Mackrory
>Priority: Major
>
> We currently only anticipate tables not being in the ACTIVE state when first 
> connecting. It is possible for a table to be in the ACTIVE state and move to 
> an UPDATING state during partitioning events. Attempts to read or write 
> during that time will result in an AmazonServerException getting thrown. We 
> should try to handle that better...






[jira] [Updated] (HADOOP-14576) s3guard DynamoDB resource not found: tables not ACTIVE state after initial connection

2017-11-02 Thread Aaron Fabbri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Fabbri updated HADOOP-14576:
--
Summary: s3guard DynamoDB resource not found: tables not ACTIVE state after 
initial connection  (was: DynamoDB tables may leave ACTIVE state after initial 
connection)

> s3guard DynamoDB resource not found: tables not ACTIVE state after initial 
> connection
> -
>
> Key: HADOOP-14576
> URL: https://issues.apache.org/jira/browse/HADOOP-14576
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Sean Mackrory
>Priority: Major
>
> We currently only anticipate tables not being in the ACTIVE state when first 
> connecting. It is possible for a table to be in the ACTIVE state and move to 
> an UPDATING state during partitioning events. Attempts to read or write 
> during that time will result in an AmazonServerException getting thrown. We 
> should try to handle that better...






[jira] [Commented] (HADOOP-13430) Optimize getFileStatus in S3A

2017-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236409#comment-16236409
 ] 

Steve Loughran commented on HADOOP-13430:
-

Interesting thought: we could actually override isDirectory(path) and 
isFile(path) and so fail fast if the conditions are not met.

isDirectory: do a LIST on the path first, then HEAD path + "/".
isFile: HEAD the path and return true, mapping FNFE to false, without 
bothering with the two other checks.
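A minimal sketch of these fail-fast probes, assuming a hypothetical 
ObjectStore interface in place of the real S3 client (the actual 
S3AFileSystem code differs):

```java
import java.util.List;

// Sketch only: ObjectStore is a hypothetical stand-in for the S3 client.
// head() models a HEAD request; list() models a LIST with max-keys >= 1.
public class ProbeSketch {
  interface ObjectStore {
    boolean head(String key);
    List<String> list(String prefix);
  }

  // isDirectory: LIST first (hits for any non-empty directory),
  // then HEAD key + "/" to catch an empty "fake directory" marker.
  static boolean isDirectory(ObjectStore store, String key) {
    if (!store.list(key + "/").isEmpty()) {
      return true;
    }
    return store.head(key + "/");
  }

  // isFile: a single HEAD on the key; "not found" maps to false
  // without either directory probe.
  static boolean isFile(ObjectStore store, String key) {
    return store.head(key);
  }
}
```

The point of the ordering is that a real, non-empty directory is confirmed 
by the first LIST, so the extra HEAD is only paid for empty directories 
and misses.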

> Optimize getFileStatus in S3A
> -
>
> Key: HADOOP-13430
> URL: https://issues.apache.org/jira/browse/HADOOP-13430
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steven K. Wong
>Assignee: Steven K. Wong
>Priority: Minor
> Attachments: HADOOP-13430.001.WIP.patch
>
>
> Currently, S3AFileSystem.getFileStatus(Path f) sends up to 3 requests to S3 
> when pathToKey(f) = key = "foo/bar" is a directory:
> 1. HEAD key=foo/bar \[continue if not found]
> 2. HEAD key=foo/bar/ \[continue if not found]
> 3. LIST prefix=foo/bar/ delimiter=/ max-keys=1
> My experience (and generally true, I reckon) is that almost all directories 
> are nonempty directories without a "fake directory" file (e.g. "foo/bar/"). 
> Under this condition, request #2 is mostly unhelpful; it only slows down 
> getFileStatus. Therefore, I propose swapping the order of requests #2 and #3. 
> The swapped HEAD request will be skipped in practically all cases.
> Furthermore, when key = "foo/bar" is a nonempty directory that contains a 
> "fake directory" file (in addition to actual files), getFileStatus currently 
> returns an S3AFileStatus with isEmptyDirectory=true, which is wrong. Swapping 
> will fix this. The swapped LIST request will use max-keys=2 to determine 
> isEmptyDirectory correctly. (Removing the delimiter from the LIST request 
> should make the logic a little simpler than otherwise.)
> Note that key = "foo/bar/" has the same problem with isEmptyDirectory. To fix 
> it, I propose skipping request #1 when key ends with "/". The price is this 
> will, for an empty directory, replace a HEAD request with a LIST request 
> that's generally more taxing on S3.
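The reordered probe sequence from the description can be sketched as 
follows. Store and listUpTo are hypothetical stand-ins, not the S3A 
internals, and the rarely-needed HEAD of key + "/" fallback is omitted 
here because a LIST of the prefix also returns any fake-directory marker 
object:

```java
import java.util.List;

// Sketch of the proposed order: HEAD key, then LIST prefix with
// max-keys=2 and no delimiter, which also settles isEmptyDirectory.
public class GetFileStatusSketch {
  enum Status { FILE, EMPTY_DIR, NONEMPTY_DIR, NOT_FOUND }

  interface Store {
    boolean headExists(String key);
    List<String> listUpTo(String prefix, int maxKeys);
  }

  static Status getFileStatus(Store s, String key) {
    // Probe 1: HEAD the key itself (skipped when key ends with "/").
    if (!key.endsWith("/") && s.headExists(key)) {
      return Status.FILE;
    }
    String prefix = key.endsWith("/") ? key : key + "/";
    // Probe 2: LIST up to two keys; a lone fake-directory marker means
    // the directory exists but is empty.
    List<String> keys = s.listUpTo(prefix, 2);
    if (keys.isEmpty()) {
      return Status.NOT_FOUND;
    }
    boolean onlyMarker = keys.size() == 1 && keys.get(0).equals(prefix);
    return onlyMarker ? Status.EMPTY_DIR : Status.NONEMPTY_DIR;
  }
}
```

With max-keys=2, a directory holding both a marker and real children is 
correctly reported as non-empty, which is the isEmptyDirectory bug the 
description calls out.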






[jira] [Commented] (HADOOP-14989) metrics2 JMX cache refresh result in inconsistent Mutable(Stat|Rate) values

2017-11-02 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236260#comment-16236260
 ] 

Eric Yang commented on HADOOP-14989:


Hi [~xkrogen] Your observation is correct.  However, {{MetricsSourceAdapter}} 
cannot call {{updateJmxCache}} at the end of {{getMetrics}}.  It would simply 
deadlock, because {{updateJmxCache}} calls {{getMetrics}}.  
{{MetricsSystemImpl}} was not used by JMX in order to avoid a deadlock in the 
timer thread, where {{MetricsSourceAdapter}} holds its own lock while trying 
to grab the {{MetricsSystemImpl}} lock. The locking order isn't consistent 
between the "push" and "pull" parts of {{MetricsSourceAdapter}}, so it can 
deadlock.

Your second suggestion, storing the return value of {{getMetrics}} and using 
it to populate the JMX cache, is the correct logic in a push-vs-pull system.  
We need to be careful when synchronizing the cached values to the MBean, or 
the MBean can fail with null values.  HADOOP-11361 has some background 
information on how the system arrived at its current state.  The 
{{ReentrantLock}} utility in {{java.util.concurrent}} might help reduce the 
deadlock risk between publishing metrics and JMX reads of the cache; this 
might be one way to solve the race condition and produce more accurate data 
for JMX.  HADOOP-12594 was an earlier attempt at removing the deadlock, and 
it may be useful background on how to solve this the proper way.
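As a hedged illustration (not the actual metrics2 code), 
ReentrantLock.tryLock with a timeout lets the cache-refresh path back off 
instead of blocking when it cannot take the second lock, removing the 
deadlock at the cost of occasionally serving a stale value. All names here 
are illustrative, not the real MetricsSourceAdapter/MetricsSystemImpl fields:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: updating a cache under two locks without risking the
// lock-ordering deadlock described above.
public class TryLockSketch {
  private final ReentrantLock adapterLock = new ReentrantLock();
  private final ReentrantLock systemLock = new ReentrantLock();
  private volatile long cachedValue;

  // Returns true if the cache was refreshed; false if the second lock
  // could not be taken in time, so the caller serves the stale value.
  boolean tryRefresh(long newValue) {
    adapterLock.lock();
    try {
      boolean acquired;
      try {
        // Back off instead of blocking forever: no deadlock even if
        // another thread holds systemLock and wants adapterLock.
        acquired = systemLock.tryLock(10, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
      if (!acquired) {
        return false;
      }
      try {
        cachedValue = newValue;
        return true;
      } finally {
        systemLock.unlock();
      }
    } finally {
      adapterLock.unlock();
    }
  }

  long cached() { return cachedValue; }
}
```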

> metrics2 JMX cache refresh result in inconsistent Mutable(Stat|Rate) values
> ---
>
> Key: HADOOP-14989
> URL: https://issues.apache.org/jira/browse/HADOOP-14989
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.6.5
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Critical
> Attachments: HADOOP-14989.test.patch
>
>
> While doing some digging in the metrics2 system recently, we noticed that the 
> way {{MutableStat}} values are collected (and thus {{MutableRate}}, since it 
> is based off of {{MutableStat}}) mean that every time the value is 
> snapshotted, all previous information is lost. So every time a JMX cache 
> refresh occurs, it resets the {{MutableStat}}, meaning that all configured 
> metrics sinks do not consider the previous statistics in their emitted 
> values. The same behavior is true if you configured multiple sink periods.
> {{MutableStat}}, to compute its average value, maintains a total value since 
> last snapshot, as well as operation count since last snapshot. Upon 
> snapshotting, the average is calculated as (total / opCount) and placed into 
> a gauge metric, and total / operation count are cleared. So the average value 
> represents the average since the last snapshot. If we have only a single sink 
> period ever snapshotting, this would result in the expected behavior that the 
> value is the average over the reporting period. However, if multiple sink 
> periods are configured, or if the JMX cache is refreshed, this is another 
> snapshot operation. So, for example, if you have a FileSink configured at a 
> 60 second interval and your JMX cache refreshes itself 1 second before the 
> FileSink period fires, the values emitted to your FileSink only represent 
> averages _over the last one second_.
> A few ways to solve this issue:
> * Make {{MutableRate}} manage its own average refresh, similar to 
> {{MutableQuantiles}}, which has a refresh thread and saves a snapshot of the 
> last quantile values that it will serve up until the next refresh. Given how 
> many {{MutableRate}} metrics there are, a thread per metric is not really 
> feasible, but could be done on e.g. a per-source basis. This has some 
> downsides: if multiple sinks are configured with different periods, what is 
> the right refresh period for the {{MutableRate}}? 
> * Make {{MutableRate}} emit two counters, one for total and one for operation 
> count, rather than an average gauge and an operation count counter. The 
> average could then be calculated downstream from this information. This is 
> cumbersome for operators and not backwards compatible. To improve on both of 
> those downsides, we could have it keep the current behavior but 
> _additionally_ emit the total as a counter. The snapshotted average is 
> probably sufficient in the common case (we've been using it for years), and 
> when more guaranteed accuracy is required, the average could be derived from 
> the total and operation count.
> The two above suggestions will fix this for both JMX and multiple sink 
> periods, but may be overkill. Multiple sink periods are probably not 
> necessary though we should at least document the behavior.
> Open to suggestions & input here.
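The reset-on-snapshot semantics described in the report can be reproduced 
with a toy stat; this is a simplification for illustration, not the real 
{{MutableStat}}:

```java
// Toy model of the snapshot semantics described above: the average is
// (total / opCount) since the *last* snapshot, and snapshotting clears
// both counters. A simplification of MutableStat, not the real class.
public class SnapshotStat {
  private double total;
  private long opCount;

  void add(double value) {
    total += value;
    opCount++;
  }

  // Emits the average since the previous snapshot, then resets, so every
  // snapshotter (each sink period, each JMX cache refresh) consumes the
  // accumulated history.
  double snapshot() {
    double avg = opCount == 0 ? 0.0 : total / opCount;
    total = 0;
    opCount = 0;
    return avg;
  }
}
```

If a JMX cache refresh calls snapshot() one second before a FileSink 
period fires, the first snapshot consumes the whole interval's operations 
and the sink's average reflects only that final second.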





[jira] [Updated] (HADOOP-15000) s3a new getdefaultblocksize be called in getFileStatus which has not been implemented in s3afilesystem yet

2017-11-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15000:

Issue Type: Sub-task  (was: Improvement)
Parent: HADOOP-14831

> s3a new getdefaultblocksize be called in getFileStatus which has not been 
> implemented in s3afilesystem yet
> --
>
> Key: HADOOP-15000
> URL: https://issues.apache.org/jira/browse/HADOOP-15000
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Yonger
>Priority: Minor
>
> The new path-based getDefaultBlockSize(path) is now called in the 
> getFileStatus method: 
> {code:java}
>   return new S3AFileStatus(meta.getContentLength(),
>   dateToLong(meta.getLastModified()),
>   path,
>   getDefaultBlockSize(path),
>   username);
> }
> {code}
> However, S3AFileSystem does not currently override it; we should implement 
> the new method, since the old one is deprecated.
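A sketch of what the missing override amounts to, using a plain map in 
place of a Hadoop {{Configuration}}; the key name mirrors 
{{fs.s3a.block.size}} (default 32 MB), but this is illustrative, not the 
committed fix:

```java
import java.util.Map;

// Sketch only: a stand-in for the S3AFileSystem override, with a map
// playing the role of the Hadoop Configuration.
public class BlockSizeSketch {
  static final String KEY = "fs.s3a.block.size";
  static final long DEFAULT = 32L * 1024 * 1024;  // S3A's documented default

  private final Map<String, Long> conf;

  BlockSizeSketch(Map<String, Long> conf) {
    this.conf = conf;
  }

  // Path-based replacement for the deprecated no-arg method: S3 has no
  // real blocks, so every path reports the configured value, and
  // getFileStatus() sees a consistent block size.
  long getDefaultBlockSize(String path) {
    return conf.getOrDefault(KEY, DEFAULT);
  }
}
```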






[jira] [Updated] (HADOOP-14937) initial part uploads seem to block unnecessarily in S3ABlockOutputStream

2017-11-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-14937:

Issue Type: Sub-task  (was: Bug)
Parent: HADOOP-14831

> initial part uploads seem to block unnecessarily in S3ABlockOutputStream
> 
>
> Key: HADOOP-14937
> URL: https://issues.apache.org/jira/browse/HADOOP-14937
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
>Assignee: Steven Rand
>Priority: Major
> Attachments: yjp_threads.png
>
>
> From looking at a YourKit snapshot of an FsShell process running a {{hadoop 
> fs -put file:///... s3a://...}}, it seems that the first part in the 
> multipart upload doesn't begin to upload until n of the 
> {{s3a-transfer-shared-pool}} threads are able to start uploading, where n is 
> the value of {{fs.s3a.fast.upload.active.blocks}}.
> To hopefully clarify a bit, the series of events that I expected to see with 
> {{fs.s3a.fast.upload.active.blocks}} set to 4 is:
> 1.  An amount of data equal to {{fs.s3a.multipart.size}} is buffered into 
> off-heap memory (I have {{fs.s3a.fast.upload.buffer = bytebuffer}}).
> 2. As soon as that happens, a thread begins to upload that part. Meanwhile, 
> the main thread continues to buffer data into off-heap memory.
> 3. Once another part has been buffered into off-heap memory, a separate 
> thread uploads that part, and so on.
> Whereas what I think the YK snapshot shows happening is:
> 1. An amount of data equal to {{fs.s3a.multipart.size}} * 4 is buffered into 
> off-heap memory.
> 2. Four threads start to upload one part each at the same time.
> I've attached a picture of the "Threads" tab to show what I mean. Basically 
> the times at which the first four {{s3a-transfer-shared-pool}} threads start 
> to upload are roughly the same, whereas I would've expected them to be more 
> staggered.
> I'm actually not sure whether this is the expected behavior or not, so feel 
> free to close if this doesn't come as a surprise to anyone.
> For some context, I've been trying to get a sense for roughly which values of 
> {{fs.s3a.multipart.size}} perform the best at different file sizes. One thing 
> that I found confusing is that a part size of 5 MB seems to outperform a part 
> size of 64 MB up until files that are upwards of about 500 MB in size. This 
> seems odd, since each {{uploadPart}} call is its own HTTP request, and I 
> would've expected the overhead of those to become costly at small part sizes. 
> My suspicion is that with 4 concurrent part uploads and 64 MB blocks, we have 
> to wait until 256 MB are buffered before we can start uploading, while with 5 
> MB blocks we can start uploading as soon as we buffer 20 MB, and that's what 
> gives the smaller parts the advantage for smaller files.
> I'm happy to submit a patch if this is in fact a problem, but wanted to check 
> to make sure I'm not just misunderstanding something.
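A toy model of the bounded-buffering behavior being discussed, assuming the 
active-block limit is enforced with a semaphore (the real 
S3ABlockOutputStream differs): the writer stalls once maxActiveBlocks 
buffers are outstanding, which is why memory use is capped at roughly the 
limit times the part size.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of block-buffered uploads, not the S3A implementation: the
// writer may hold at most maxActiveBlocks buffers at once, and a block
// is handed to the upload pool only when it is full.
public class ActiveBlocksSketch {
  static int uploadAll(int blocks, int maxActiveBlocks) {
    Semaphore permits = new Semaphore(maxActiveBlocks);
    ExecutorService pool = Executors.newFixedThreadPool(maxActiveBlocks);
    AtomicInteger uploaded = new AtomicInteger();
    try {
      for (int i = 0; i < blocks; i++) {
        permits.acquire();             // writer stalls once the limit is hit
        pool.submit(() -> {
          uploaded.incrementAndGet();  // stand-in for the part upload
          permits.release();           // buffer can be reused
        });
      }
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return uploaded.get();
  }
}
```

Under this model nothing forces the first four uploads to start staggered; 
whether the real stream starts uploading a part as soon as it is full is 
exactly the question the report raises.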






[jira] [Commented] (HADOOP-14929) Cleanup usage of decodecomponent and use QueryStringDecoder from netty

2017-11-02 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236149#comment-16236149
 ] 

Bharat Viswanadham commented on HADOOP-14929:
-

Testcase failure is not related to this patch.

> Cleanup usage of decodecomponent and use QueryStringDecoder from netty
> --
>
> Key: HADOOP-14929
> URL: https://issues.apache.org/jira/browse/HADOOP-14929
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Bharat Viswanadham
>Assignee: Bharat Viswanadham
>Priority: Major
> Attachments: HADOOP-14929.00.patch, HADOOP-14929.01.patch, 
> HADOOP-14929.02.patch, HADOOP-14929.03.patch
>
>
> This is from the review of HADOOP-14910
> There is also other place usage of 
> decodeComponent(param(CreateFlagParam.NAME), StandardCharsets.UTF_8);
> In ParameterParser.java Line 147-148:
> String cf = decodeComponent(param(CreateFlagParam.NAME), 
> StandardCharsets.UTF_8);
> Use QueryStringDecoder from Netty here too and clean up decodeComponent; 
> it was only added to work around a Netty issue.






[jira] [Commented] (HADOOP-14998) Make AuthenticationFilter @Public

2017-11-02 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236048#comment-16236048
 ] 

Robert Kanter commented on HADOOP-14998:


{quote}Risk here is that it's getting close to jetty & kerberos, where changes 
over versions can be observable.{quote}
That's a valid thing to be concerned about.  However, we didn't have to change 
anything in {{AuthenticationFilter}} when moving from Jetty 6 to 9 
(HADOOP-10075).  The stability here is probably due to the Java interface, 
{{javax.servlet.Filter}}.  At the least, we could make the 
{{javax.servlet.Filter}} methods public and leave some of the custom methods 
private or unstable.

> Make AuthenticationFilter @Public
> -
>
> Key: HADOOP-14998
> URL: https://issues.apache.org/jira/browse/HADOOP-14998
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Robert Kanter
>Assignee: Bharat Viswanadham
>Priority: Major
>
> {{org.apache.hadoop.security.authentication.server.AuthenticationFilter}} is 
> currently marked as {{\@Private}} and {{\@Unstable}}.  
> {code:java}
> @InterfaceAudience.Private
> @InterfaceStability.Unstable
> public class AuthenticationFilter implements Filter {
> {code}
> However, many other projects (e.g. Oozie, Hive, Solr, HBase, etc) have been 
> using it for quite some time without having any compatibility issues AFAIK.  
> It doesn't seem to have had any breaking changes in quite some time.  On top 
> of that, it implements {{javax.servlet.Filter}}, so it can't change too 
> widely anyway.  {{AuthenticationFilter}} provides a lot of useful code for 
> dealing with tokens, Kerberos, etc, and we should encourage related projects 
> to re-use this code instead of rolling their own.
> I propose we change it to {{\@Public}} and {{\@Evolving}}.






[jira] [Commented] (HADOOP-14161) Failed to rename file in S3A during FileOutputFormat commitTask

2017-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235988#comment-16235988
 ] 

Steve Loughran commented on HADOOP-14161:
-

Right now, if you are using Spark, use https://github.com/rdblue/s3committer/ ; 
this will be in Hadoop 3.1

> Failed to rename file in S3A during FileOutputFormat commitTask
> ---
>
> Key: HADOOP-14161
> URL: https://issues.apache.org/jira/browse/HADOOP-14161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.7.0, 2.7.1, 2.7.2, 2.7.3
> Environment: spark 2.0.2 with mesos
> hadoop 2.7.2
>Reporter: Luke Miner
>Priority: Minor
>
> I'm getting non deterministic rename errors while writing to S3 using spark 
> and hadoop. The proper permissions are set and this only happens 
> occasionally. It can happen on a job that is as simple as reading in json, 
> repartitioning and then writing out. After this failure occurs, the overall 
> job hangs indefinitely.
> {code}
> org.apache.spark.SparkException: Task failed while writing rows
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to commit task
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:275)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:257)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
> at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1348)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
> ... 8 more
> Caused by: java.io.IOException: Failed to rename 
> S3AFileStatus{path=s3a://foo/_temporary/0/_temporary/attempt_201703081855_0018_m_000966_0/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet;
>  isDirectory=false; length=111225342; replication=1; blocksize=33554432; 
> modification_time=1488999342000; access_time=0; owner=; group=; 
> permission=rw-rw-rw-; isSymlink=false} to 
> s3a://foo/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:415)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
> at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
> at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
> at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:211)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:270)
> ... 13 more
> {code}






[jira] [Commented] (HADOOP-14997) Add hadoop-aliyun as dependency of hadoop-cloud-storage

2017-11-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235930#comment-16235930
 ] 

Hudson commented on HADOOP-14997:
-

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13178 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13178/])
HADOOP-14997. Add hadoop-aliyun as dependency of hadoop-cloud-storage. 
(sammi.chen: rev cde56b9cefe1eb2943eef56a6aa7fdfa1b78e909)
* (edit) hadoop-cloud-storage-project/hadoop-cloud-storage/pom.xml


>  Add hadoop-aliyun as dependency of hadoop-cloud-storage
> 
>
> Key: HADOOP-14997
> URL: https://issues.apache.org/jira/browse/HADOOP-14997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-14997.001.patch
>
>
> add {{hadoop-aliyun}} dependency in cloud storage modules






[jira] [Updated] (HADOOP-14989) metrics2 JMX cache refresh result in inconsistent Mutable(Stat|Rate) values

2017-11-02 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HADOOP-14989:
-
Description: 
While doing some digging in the metrics2 system recently, we noticed that the 
way {{MutableStat}} values are collected (and thus {{MutableRate}}, since it is 
based off of {{MutableStat}}) mean that every time the value is snapshotted, 
all previous information is lost. So every time a JMX cache refresh occurs, it 
resets the {{MutableStat}}, meaning that all configured metrics sinks do not 
consider the previous statistics in their emitted values. The same behavior is 
true if you configured multiple sink periods.

{{MutableStat}}, to compute its average value, maintains a total value since 
last snapshot, as well as operation count since last snapshot. Upon 
snapshotting, the average is calculated as (total / opCount) and placed into a 
gauge metric, and total / operation count are cleared. So the average value 
represents the average since the last snapshot. If we have only a single sink 
period ever snapshotting, this would result in the expected behavior that the 
value is the average over the reporting period. However, if multiple sink 
periods are configured, or if the JMX cache is refreshed, this is another 
snapshot operation. So, for example, if you have a FileSink configured at a 60 
second interval and your JMX cache refreshes itself 1 second before the 
FileSink period fires, the values emitted to your FileSink only represent 
averages _over the last one second_.

A few ways to solve this issue:
* Make {{MutableRate}} manage its own average refresh, similar to 
{{MutableQuantiles}}, which has a refresh thread and saves a snapshot of the 
last quantile values that it will serve up until the next refresh. Given how 
many {{MutableRate}} metrics there are, a thread per metric is not really 
feasible, but could be done on e.g. a per-source basis. This has some 
downsides: if multiple sinks are configured with different periods, what is the 
right refresh period for the {{MutableRate}}? 
* Make {{MutableRate}} emit two counters, one for total and one for operation 
count, rather than an average gauge and an operation count counter. The average 
could then be calculated downstream from this information. This is cumbersome 
for operators and not backwards compatible. To improve on both of those 
downsides, we could have it keep the current behavior but _additionally_ emit 
the total as a counter. The snapshotted average is probably sufficient in the 
common case (we've been using it for years), and when more guaranteed accuracy 
is required, the average could be derived from the total and operation count.

The two above suggestions will fix this for both JMX and multiple sink periods, 
but may be overkill. Multiple sink periods are probably not necessary though we 
should at least document the behavior.

Open to suggestions & input here.

  was:
While doing some digging in the metrics2 system recently, we noticed that the 
way {{MutableStat}} values are collected (and thus {{MutableRate}}, since it is 
based off of {{MutableStat}}) mean that each sink configured (including JMX) 
only receives a portion of the average information.

{{MutableStat}}, to compute its average value, maintains a total value since 
last snapshot, as well as operation count since last snapshot. Upon 
snapshotting, the average is calculated as (total / opCount) and placed into a 
gauge metric, and total / operation count are cleared. So the average value 
represents the average since the last snapshot. If only a single sink ever 
snapshots, this would result in the expected behavior that the value is the 
average over the reporting period. However, if multiple sinks are configured, 
or if the JMX cache is refreshed, this is another snapshot operation. So, for 
example, if you have a FileSink configured at a 60 second interval and your JMX 
cache refreshes itself 1 second before the FileSink period fires, the values 
emitted to your FileSink only represent averages _over the last one second_.

A few ways to solve this issue:
* From an operator perspective, ensure only one sink is configured. This is not 
realistic given that the JMX cache exhibits the same behavior.
* Make {{MutableRate}} manage its own average refresh, similar to 
{{MutableQuantiles}}, which has a refresh thread and saves a snapshot of the 
last quantile values that it will serve up until the next refresh. Given how 
many {{MutableRate}} metrics there are, a thread per metric is not really 
feasible, but could be done on e.g. a per-source basis. This has some 
downsides: if multiple sinks are configured with different periods, what is the 
right refresh period for the {{MutableRate}}? 
* Make {{MutableRate}} emit two counters, one for total and one for operation 
count, rather than an average gauge and an operation count counter. The average 

[jira] [Updated] (HADOOP-14989) metrics2 JMX cache refresh result in inconsistent Mutable(Stat|Rate) values

2017-11-02 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HADOOP-14989:
-
Summary: metrics2 JMX cache refresh result in inconsistent 
Mutable(Stat|Rate) values  (was: Multiple metrics2 sinks (incl JMX) result in 
inconsistent Mutable(Stat|Rate) values)

> metrics2 JMX cache refresh result in inconsistent Mutable(Stat|Rate) values
> ---
>
> Key: HADOOP-14989
> URL: https://issues.apache.org/jira/browse/HADOOP-14989
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.6.5
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Critical
> Attachments: HADOOP-14989.test.patch
>
>
> While doing some digging in the metrics2 system recently, we noticed that the 
> way {{MutableStat}} values are collected (and thus {{MutableRate}}, since it 
> is based off of {{MutableStat}}) mean that each sink configured (including 
> JMX) only receives a portion of the average information.
> {{MutableStat}}, to compute its average value, maintains a total value since 
> last snapshot, as well as operation count since last snapshot. Upon 
> snapshotting, the average is calculated as (total / opCount) and placed into 
> a gauge metric, and total / operation count are cleared. So the average value 
> represents the average since the last snapshot. If only a single sink ever 
> snapshots, this would result in the expected behavior that the value is the 
> average over the reporting period. However, if multiple sinks are configured, 
> or if the JMX cache is refreshed, this is another snapshot operation. So, for 
> example, if you have a FileSink configured at a 60 second interval and your 
> JMX cache refreshes itself 1 second before the FileSink period fires, the 
> values emitted to your FileSink only represent averages _over the last one 
> second_.
> A few ways to solve this issue:
> * From an operator perspective, ensure only one sink is configured. This is 
> not realistic given that the JMX cache exhibits the same behavior.
> * Make {{MutableRate}} manage its own average refresh, similar to 
> {{MutableQuantiles}}, which has a refresh thread and saves a snapshot of the 
> last quantile values that it will serve up until the next refresh. Given how 
> many {{MutableRate}} metrics there are, a thread per metric is not really 
> feasible, but could be done on e.g. a per-source basis. This has some 
> downsides: if multiple sinks are configured with different periods, what is 
> the right refresh period for the {{MutableRate}}? 
> * Make {{MutableRate}} emit two counters, one for total and one for operation 
> count, rather than an average gauge and an operation count counter. The 
> average could then be calculated downstream from this information. This is 
> cumbersome for operators and not backwards compatible. To improve on both of 
> those downsides, we could have it keep the current behavior but 
> _additionally_ emit the total as a counter. The snapshotted average is 
> probably sufficient in the common case (we've been using it for years), and 
> when more guaranteed accuracy is required, the average could be derived from 
> the total and operation count.
> Open to suggestions & input here.
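The snapshot-and-reset behaviour described above can be sketched with a toy model. This is an illustrative stand-in, not the real {{org.apache.hadoop.metrics2.lib.MutableStat}}; class and method names here are invented for the sketch.

```java
// Toy model of snapshot-and-reset averaging. Not the real Hadoop class:
// it only demonstrates why two consumers snapshotting the same stat interfere.
class ToyStat {
    private double total = 0;   // sum of values since the last snapshot
    private long opCount = 0;   // operations since the last snapshot

    void add(double value) {
        total += value;
        opCount++;
    }

    // Each snapshot returns the average since the PREVIOUS snapshot and then
    // clears the accumulators, so every extra consumer steals data from the next.
    double snapshotAvg() {
        double avg = (opCount == 0) ? 0 : total / opCount;
        total = 0;
        opCount = 0;
        return avg;
    }
}

public class SnapshotInterference {
    public static void main(String[] args) {
        ToyStat stat = new ToyStat();
        stat.add(10000);                       // e.g. a large block report
        double jmxView = stat.snapshotAvg();   // JMX cache refresh drains it: 10000.0
        stat.add(10);                          // one small follow-up operation
        double sinkView = stat.snapshotAvg();  // the FileSink now sees 10.0,
                                               // not the period average 5005.0
        System.out.println(jmxView + " " + sinkView);
    }
}
```

With a single consumer the snapshot average equals the reporting-period average; a second consumer (such as the JMX cache) draining the accumulators just before the FileSink fires leaves the sink with only the tail of the period, matching the 10 vs. 5005 numbers discussed below.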



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14989) Multiple metrics2 sinks (incl JMX) result in inconsistent Mutable(Stat|Rate) values

2017-11-02 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235883#comment-16235883
 ] 

Erik Krogen edited comment on HADOOP-14989 at 11/2/17 3:14 PM:
---

Assuming nothing else was submitting block reports, then {{val}} with the 
current code would be 10, but it should be 5005 (it is an average so {{= 
10010/2}}). Since it is taking metrics from a minicluster there are also 
some real block reports that skew things; that's why I used a big value and a 
comparison rather than equal. Like I said, hacky. But the test will 
definitively pass if you omit the JMX call and definitely fail if you include 
it. I'll try to put together a real unit test for this.

I am not sure what you mean about JMX mbean calling reset internally. Are you 
talking here about the metrics2 level reset 
({{MetricsSourceAdapter#updateJmxCache()}}) or something at a JVM level? I 
explained how the cache reset is managed at the metrics2 level; let me know if 
there's something about my explanation that was not clear.


was (Author: xkrogen):
Assuming nothing else was submitting block reports, then {{val}} with the 
current code would be 10, but it should be 5005 (it is an average so {{= 
10010/2}}). Since it is taking metrics from a minicluster there are also 
some real block reports that skew things; that's why I used a big value and a 
comparison rather than equal. Like I said, hacky. But the test will 
definitively pass if you omit the JMX call and definitely fail if you include 
it. I'll try to put together a real unit test for this.

I am not sure what you mean about JMX mbean calling reset internally. Are you 
talking here about the metrics2 level reset 
({{MetricsSourceAdapter#updateJmxCache()}} or something at a JVM level? I 
explained how the cache reset is managed at the metrics2 level; let me know if 
there's something about my explanation that was not clear.

> Multiple metrics2 sinks (incl JMX) result in inconsistent Mutable(Stat|Rate) 
> values
> ---
>
> Key: HADOOP-14989
> URL: https://issues.apache.org/jira/browse/HADOOP-14989
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.6.5
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Critical
> Attachments: HADOOP-14989.test.patch
>
>
> While doing some digging in the metrics2 system recently, we noticed that the 
> way {{MutableStat}} values are collected (and thus {{MutableRate}}, since it 
> is based off of {{MutableStat}}) means that each sink configured (including 
> JMX) only receives a portion of the average information.
> {{MutableStat}}, to compute its average value, maintains a total value since 
> last snapshot, as well as operation count since last snapshot. Upon 
> snapshotting, the average is calculated as (total / opCount) and placed into 
> a gauge metric, and total / operation count are cleared. So the average value 
> represents the average since the last snapshot. If only a single sink ever 
> snapshots, this would result in the expected behavior that the value is the 
> average over the reporting period. However, if multiple sinks are configured, 
> or if the JMX cache is refreshed, this is another snapshot operation. So, for 
> example, if you have a FileSink configured at a 60 second interval and your 
> JMX cache refreshes itself 1 second before the FileSink period fires, the 
> values emitted to your FileSink only represent averages _over the last one 
> second_.
> A few ways to solve this issue:
> * From an operator perspective, ensure only one sink is configured. This is 
> not realistic given that the JMX cache exhibits the same behavior.
> * Make {{MutableRate}} manage its own average refresh, similar to 
> {{MutableQuantiles}}, which has a refresh thread and saves a snapshot of the 
> last quantile values that it will serve up until the next refresh. Given how 
> many {{MutableRate}} metrics there are, a thread per metric is not really 
> feasible, but could be done on e.g. a per-source basis. This has some 
> downsides: if multiple sinks are configured with different periods, what is 
> the right refresh period for the {{MutableRate}}? 
> * Make {{MutableRate}} emit two counters, one for total and one for operation 
> count, rather than an average gauge and an operation count counter. The 
> average could then be calculated downstream from this information. This is 
> cumbersome for operators and not backwards compatible. To improve on both of 
> those downsides, we could have it keep the current behavior but 
> _additionally_ emit the total as a counter. The snapshotted average is 
> probably sufficient in the common case (we've been using it for years), and 
> when more guaranteed accuracy is required, the average could be derived from 
> the total and operation count.
> Open to suggestions & input here.

[jira] [Commented] (HADOOP-14989) Multiple metrics2 sinks (incl JMX) result in inconsistent Mutable(Stat|Rate) values

2017-11-02 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235883#comment-16235883
 ] 

Erik Krogen commented on HADOOP-14989:
--

Assuming nothing else was submitting block reports, then {{val}} with the 
current code would be 10, but it should be 5005 (it is an average so {{= 
10010/2}}). Since it is taking metrics from a minicluster there are also 
some real block reports that skew things; that's why I used a big value and a 
comparison rather than equal. Like I said, hacky. But the test will 
definitively pass if you omit the JMX call and definitely fail if you include 
it. I'll try to put together a real unit test for this.

I am not sure what you mean about JMX mbean calling reset internally. Are you 
talking here about the metrics2 level reset 
({{MetricsSourceAdapter#updateJmxCache()}}) or something at a JVM level? I 
explained how the cache reset is managed at the metrics2 level; let me know if 
there's something about my explanation that was not clear.

> Multiple metrics2 sinks (incl JMX) result in inconsistent Mutable(Stat|Rate) 
> values
> ---
>
> Key: HADOOP-14989
> URL: https://issues.apache.org/jira/browse/HADOOP-14989
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.6.5
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Critical
> Attachments: HADOOP-14989.test.patch
>
>
> While doing some digging in the metrics2 system recently, we noticed that the 
> way {{MutableStat}} values are collected (and thus {{MutableRate}}, since it 
> is based off of {{MutableStat}}) means that each sink configured (including 
> JMX) only receives a portion of the average information.
> {{MutableStat}}, to compute its average value, maintains a total value since 
> last snapshot, as well as operation count since last snapshot. Upon 
> snapshotting, the average is calculated as (total / opCount) and placed into 
> a gauge metric, and total / operation count are cleared. So the average value 
> represents the average since the last snapshot. If only a single sink ever 
> snapshots, this would result in the expected behavior that the value is the 
> average over the reporting period. However, if multiple sinks are configured, 
> or if the JMX cache is refreshed, this is another snapshot operation. So, for 
> example, if you have a FileSink configured at a 60 second interval and your 
> JMX cache refreshes itself 1 second before the FileSink period fires, the 
> values emitted to your FileSink only represent averages _over the last one 
> second_.
> A few ways to solve this issue:
> * From an operator perspective, ensure only one sink is configured. This is 
> not realistic given that the JMX cache exhibits the same behavior.
> * Make {{MutableRate}} manage its own average refresh, similar to 
> {{MutableQuantiles}}, which has a refresh thread and saves a snapshot of the 
> last quantile values that it will serve up until the next refresh. Given how 
> many {{MutableRate}} metrics there are, a thread per metric is not really 
> feasible, but could be done on e.g. a per-source basis. This has some 
> downsides: if multiple sinks are configured with different periods, what is 
> the right refresh period for the {{MutableRate}}? 
> * Make {{MutableRate}} emit two counters, one for total and one for operation 
> count, rather than an average gauge and an operation count counter. The 
> average could then be calculated downstream from this information. This is 
> cumbersome for operators and not backwards compatible. To improve on both of 
> those downsides, we could have it keep the current behavior but 
> _additionally_ emit the total as a counter. The snapshotted average is 
> probably sufficient in the common case (we've been using it for years), and 
> when more guaranteed accuracy is required, the average could be derived from 
> the total and operation count.
> Open to suggestions & input here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14161) Failed to rename file in S3A during FileOutputFormat commitTask

2017-11-02 Thread Todor Kolev (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235873#comment-16235873
 ] 

Todor Kolev commented on HADOOP-14161:
--

[~ste...@apache.org] What is the alternative though, write to HDFS and then 
move stuff across to S3? 

> Failed to rename file in S3A during FileOutputFormat commitTask
> ---
>
> Key: HADOOP-14161
> URL: https://issues.apache.org/jira/browse/HADOOP-14161
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.7.0, 2.7.1, 2.7.2, 2.7.3
> Environment: spark 2.0.2 with mesos
> hadoop 2.7.2
>Reporter: Luke Miner
>Priority: Minor
>
> I'm getting non-deterministic rename errors while writing to S3 using Spark 
> and Hadoop. The proper permissions are set, and this only happens 
> occasionally. It can happen on a job as simple as reading in JSON, 
> repartitioning, and then writing out. After this failure occurs, the overall 
> job hangs indefinitely.
> {code}
> org.apache.spark.SparkException: Task failed while writing rows
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to commit task
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:275)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:257)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252)
> at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1348)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258)
> ... 8 more
> Caused by: java.io.IOException: Failed to rename 
> S3AFileStatus{path=s3a://foo/_temporary/0/_temporary/attempt_201703081855_0018_m_000966_0/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet;
>  isDirectory=false; length=111225342; replication=1; blocksize=33554432; 
> modification_time=1488999342000; access_time=0; owner=; group=; 
> permission=rw-rw-rw-; isSymlink=false} to 
> s3a://foo/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:415)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
> at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
> at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
> at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:211)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:270)
> ... 13 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14951) KMSACL implementation is not configurable

2017-11-02 Thread Zsombor Gegesy (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235776#comment-16235776
 ] 

Zsombor Gegesy commented on HADOOP-14951:
-

The test failures shouldn't be related to the changed code.

> KMSACL implementation is not configurable
> -
>
> Key: HADOOP-14951
> URL: https://issues.apache.org/jira/browse/HADOOP-14951
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms
>Reporter: Zsombor Gegesy
>Priority: Major
>  Labels: key-management, kms
> Attachments: 
> 0001-HADOOP-14951-Make-the-KMSACLs-implementation-customi.patch, 
> 0001-Make-the-KMSACLs-implementation-customizable-with-an.patch, 
> HADOOP-14951-3-Make-the-KMSACLs-implementation-customi.patch, 
> HADOOP-14951-4.patch, HADOOP-14951-5.patch
>
>
> Currently, it is not possible to customize KMS's key management, if KMSACLs 
> behaviour is not enough. If an external key management solution is used, that 
> would need a higher level API, where it can decide, if the given operation is 
> allowed, or not.
>  For this to achieve, it would be a solution, to introduce a new interface, 
> which could be implemented by KMSACLs - and also other KMS - and a new 
> configuration point could be added, where the actual interface implementation 
> could be specified.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls

2017-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235561#comment-16235561
 ] 

Steve Loughran commented on HADOOP-13282:
-

+1: as it saves 1 GET + a HEAD of path + "/" + one List, it saves ~$0.009 to 
discover a file doesn't exist. We could switch to it across the internal bits 
of our code which only look for the existence of a file; there, the presence of 
a directory is considered as much a failure as no file.

> S3 blob etags to be made visible in status/getFileChecksum() calls
> --
>
> Key: HADOOP-13282
> URL: https://issues.apache.org/jira/browse/HADOOP-13282
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-13282-001.patch
>
>
> If the etags of blobs were exported via {{getFileChecksum()}}, it'd be 
> possible to probe for a blob being in sync with a local file. Distcp could 
> use this to decide whether to skip a file or not.
> Now, there's a problem there: distcp needs source and dest filesystems to 
> implement the same algorithm. It'd only work out the box if you were copying 
> between S3 instances. There are also quirks with encryption and multipart: 
> [s3 
> docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html].
>  At the very least, it's something which could be used when indexing the FS, 
> to check for changes later.
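The probe described above can be sketched as follows. For a single-part S3 upload the etag is the hex MD5 of the object, so a local file can be compared without a download; multipart etags (the quirk noted above) have the form {{<md5>-<parts>}} and are not comparable this way. The class and method names are invented for illustration, not actual Hadoop or AWS SDK APIs.

```java
// Sketch: decide whether a local file matches a plain (single-part) S3 etag.
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class EtagProbe {
    // Hex MD5 of a local file, the same form as a single-part S3 etag.
    static String hexMd5(Path file) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                md5.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md5.digest()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Multipart etags contain "-", so only plain etags can be compared;
    // the surrounding quotes S3 returns are stripped first.
    static boolean inSync(String etag, Path localFile) throws Exception {
        String clean = etag.replace("\"", "");
        return !clean.contains("-") && clean.equals(hexMd5(localFile));
    }
}
```

A distcp-style caller would still need the source and destination checksums to use the same algorithm, as the description notes.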



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15000) s3a new getdefaultblocksize be called in getFileStatus which has not been implemented in s3afilesystem yet

2017-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235541#comment-16235541
 ] 

Steve Loughran commented on HADOOP-15000:
-

One thing which does need looking at is HADOOP-14943; that should be handling 
the block size too. If you have a patch w/ tests there, it'd be very useful

> s3a new getdefaultblocksize be called in getFileStatus which has not been 
> implemented in s3afilesystem yet
> --
>
> Key: HADOOP-15000
> URL: https://issues.apache.org/jira/browse/HADOOP-15000
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Yonger
>Priority: Minor
>
> The new implementation of getting the block size is now called in the 
> getFileStatus method: 
> {code:java}
>   return new S3AFileStatus(meta.getContentLength(),
>   dateToLong(meta.getLastModified()),
>   path,
>   getDefaultBlockSize(path),
>   username);
> }
> {code}
> but we don't implement it in our S3AFileSystem currently; we also need to 
> implement this new method, as the old one is deprecated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15000) s3a new getdefaultblocksize be called in getFileStatus which has not been implemented in s3afilesystem yet

2017-11-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235535#comment-16235535
 ] 

Steve Loughran commented on HADOOP-15000:
-

It's there: the fallback to getDefaultBlockSize picks up the value

{code}
  public long getDefaultBlockSize() {
    return getConf().getLongBytes(FS_S3A_BLOCK_SIZE, DEFAULT_BLOCKSIZE);
  }
{code}

We don't cache it, because that allows callers to dynamically change the 
blocksize on the live FS instance just by changing the config option.
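That uncached lookup can be sketched as follows. {{Config}} and {{ToyS3AFileSystem}} are minimal stand-ins, not the real Hadoop {{Configuration}} or {{S3AFileSystem}}; only the re-read-on-every-call pattern is the point.

```java
// Sketch: the block size is read from the live configuration on every call,
// so changing the option changes what subsequent calls return.
import java.util.HashMap;
import java.util.Map;

class Config {
    private final Map<String, Long> values = new HashMap<>();
    void setLong(String key, long v) { values.put(key, v); }
    long getLong(String key, long defVal) { return values.getOrDefault(key, defVal); }
}

class ToyS3AFileSystem {
    static final String FS_S3A_BLOCK_SIZE = "fs.s3a.block.size";
    static final long DEFAULT_BLOCKSIZE = 32 * 1024 * 1024;  // 32 MB default
    private final Config conf;
    ToyS3AFileSystem(Config conf) { this.conf = conf; }

    // No caching: every call re-reads the option from the config.
    long getDefaultBlockSize() {
        return conf.getLong(FS_S3A_BLOCK_SIZE, DEFAULT_BLOCKSIZE);
    }
}

public class DynamicBlockSize {
    public static void main(String[] args) {
        Config conf = new Config();
        ToyS3AFileSystem fs = new ToyS3AFileSystem(conf);
        System.out.println(fs.getDefaultBlockSize());   // 33554432 (default)
        conf.setLong(ToyS3AFileSystem.FS_S3A_BLOCK_SIZE, 64L * 1024 * 1024);
        System.out.println(fs.getDefaultBlockSize());   // 67108864, picked up live
    }
}
```

Caching the value at FS-creation time would be marginally cheaper but would lose exactly the dynamic behaviour the comment above describes.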

> s3a new getdefaultblocksize be called in getFileStatus which has not been 
> implemented in s3afilesystem yet
> --
>
> Key: HADOOP-15000
> URL: https://issues.apache.org/jira/browse/HADOOP-15000
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Yonger
>Priority: Minor
>
> The new implementation of getting the block size is now called in the 
> getFileStatus method: 
> {code:java}
>   return new S3AFileStatus(meta.getContentLength(),
>   dateToLong(meta.getLastModified()),
>   path,
>   getDefaultBlockSize(path),
>   username);
> }
> {code}
> but we don't implement it in our S3AFileSystem currently; we also need to 
> implement this new method, as the old one is deprecated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14997) Add hadoop-aliyun as dependency of hadoop-cloud-storage

2017-11-02 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235382#comment-16235382
 ] 

SammiChen edited comment on HADOOP-14997 at 11/2/17 9:14 AM:
-

Sure. Will do it right now.  Thanks [~ajisakaa] for the reminder!


was (Author: sammi):
Sure. Will do it right now. 

>  Add hadoop-aliyun as dependency of hadoop-cloud-storage
> 
>
> Key: HADOOP-14997
> URL: https://issues.apache.org/jira/browse/HADOOP-14997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-14997.001.patch
>
>
> add {{hadoop-aliyun}} dependency in cloud storage modules



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14964) AliyunOSS: backport HADOOP-12756 to branch-2

2017-11-02 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235405#comment-16235405
 ] 

Kai Zheng commented on HADOOP-14964:


Sorry for the late response, Genmao.

Steve, could you help with my following questions? Thanks.
bq. It's too late for the 2.9 branch...
Did you mean the backport to the 2.9 branch is too late? If so, I'd wonder 
why, because I'm not aware of any on-going release based on that branch; is 
the branch frozen right now? Or did you mean it's too late for branch-2, 
since, as you mentioned, we might not have 2.10? Thanks for clarifying or 
confirming.

I think OSS support stands fairly alone, and I roughly believe we could 
backport it to branch-2, branch-2.7, branch-2.8 and branch-2.9, technically. 
I'm not familiar with the release logistics, though. Could you share some 
insights? Thanks again.

> AliyunOSS: backport HADOOP-12756 to branch-2
> 
>
> Key: HADOOP-14964
> URL: https://issues.apache.org/jira/browse/HADOOP-14964
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/oss
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14997) Add hadoop-aliyun as dependency of hadoop-cloud-storage

2017-11-02 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235382#comment-16235382
 ] 

SammiChen commented on HADOOP-14997:


Sure. Will do it right now. 

>  Add hadoop-aliyun as dependency of hadoop-cloud-storage
> 
>
> Key: HADOOP-14997
> URL: https://issues.apache.org/jira/browse/HADOOP-14997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-14997.001.patch
>
>
> add {{hadoop-aliyun}} dependency in cloud storage modules



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14997) Add hadoop-aliyun as dependency of hadoop-cloud-storage

2017-11-02 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235380#comment-16235380
 ] 

Akira Ajisaka edited comment on HADOOP-14997 at 11/2/17 8:24 AM:
-

Hi [~Sammi], now the commits are in origin/apache-3.0 and origin/apache-trunk 
branch. Would you delete the newly created branches and push the commit to 
origin/branch-3.0 and origin/trunk?


was (Author: ajisakaa):
Hi [~Sammi], now the commits are in origin/apache-3.0 and origin/apache-trunk 
branch. Would you remove the newly created branches and push the commit to 
origin/branch-3.0 and origin/trunk?

>  Add hadoop-aliyun as dependency of hadoop-cloud-storage
> 
>
> Key: HADOOP-14997
> URL: https://issues.apache.org/jira/browse/HADOOP-14997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-14997.001.patch
>
>
> add {{hadoop-aliyun}} dependency in cloud storage modules



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14997) Add hadoop-aliyun as dependency of hadoop-cloud-storage

2017-11-02 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235380#comment-16235380
 ] 

Akira Ajisaka commented on HADOOP-14997:


Hi [~Sammi], now the commits are in origin/apache-3.0 and origin/apache-trunk 
branch. Would you remove the newly created branches and push the commit to 
origin/branch-3.0 and origin/trunk?

>  Add hadoop-aliyun as dependency of hadoop-cloud-storage
> 
>
> Key: HADOOP-14997
> URL: https://issues.apache.org/jira/browse/HADOOP-14997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-14997.001.patch
>
>
> add {{hadoop-aliyun}} dependency in cloud storage modules



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14997) Add hadoop-aliyun as dependency of hadoop-cloud-storage

2017-11-02 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HADOOP-14997:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

>  Add hadoop-aliyun as dependency of hadoop-cloud-storage
> 
>
> Key: HADOOP-14997
> URL: https://issues.apache.org/jira/browse/HADOOP-14997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HADOOP-14997.001.patch
>
>
> add {{hadoop-aliyun}} dependency in cloud storage modules



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14997) Add hadoop-aliyun as dependency of hadoop-cloud-storage

2017-11-02 Thread SammiChen (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235257#comment-16235257
 ] 

SammiChen commented on HADOOP-14997:


Committed to trunk and 3.0 branch. Thanks [~uncleGen] for the contribution. 

>  Add hadoop-aliyun as dependency of hadoop-cloud-storage
> 
>
> Key: HADOOP-14997
> URL: https://issues.apache.org/jira/browse/HADOOP-14997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/oss
>Affects Versions: 3.0.0-beta1
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Attachments: HADOOP-14997.001.patch
>
>
> add {{hadoop-aliyun}} dependency in cloud storage modules



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org