[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263924#comment-16263924 ] Chris Douglas commented on HADOOP-14600: lgtm, but if you have cycles to verify the patch, then let's commit it as soon as you +1 it. > LocatedFileStatus constructor forces RawLocalFS to exec a process to get the > permissions > > > Key: HADOOP-14600 > URL: https://issues.apache.org/jira/browse/HADOOP-14600 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.3 > Environment: file:// in a dir with many files >Reporter: Steve Loughran >Assignee: Ping Liu > Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, > HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, > HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, > HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java > > > Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls > against the local FS, because a {{FileStatus.getPermission}} call forces > {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI > values. > That is: for every other FS, what's a field lookup or even a no-op, on the > local FS it's a process exec/spawn, with all the costs. This gets expensive > if you have many files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
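To make the reported cost concrete, here is a minimal, hypothetical timing harness (the class name and argument handling are illustrative, not part of any attached patch). On file://, each {{getPermission()}} call on the deprecated raw-local file status can fork a process to load the real owner/group/permission bits, so listing a large directory turns into one process spawn per file:

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical benchmark: time listStatus() plus a getPermission() call
// per entry against a local directory passed as args[0].
public class LocalFsPermissionCost {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new URI("file:///"), new Configuration());
    long start = System.nanoTime();
    for (FileStatus st : fs.listStatus(new Path(args[0]))) {
      st.getPermission(); // may exec a process per file on the local FS
    }
    System.out.println("listStatus + getPermission: "
        + (System.nanoTime() - start) / 1_000_000 + " ms");
  }
}
{code}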
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263893#comment-16263893 ] Erik Krogen commented on HADOOP-15067: -- Thanks for the ping [~xiaochen] and sorry for the late response; as you seem to have suspected, I was OOO for the holidays today. And thank you for the fix [~mi...@cloudera.com]! LGTM. > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
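For reference, the change the description asks for is a one-liner; a sketch of the fixed block in JvmMetrics, assuming the surrounding code is unchanged:

{code:java}
if (gcTimeMonitor != null) {
  // A gauge fits here because the GC time percentage can go down as well
  // as up, whereas a counter is expected to be monotonically increasing.
  rb.addGauge(GcTimePercentage,
      gcTimeMonitor.getLatestGcData().getGcTimePercentage());
}
{code}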
[jira] [Comment Edited] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263821#comment-16263821 ] Ping Liu edited comment on HADOOP-14600 at 11/23/17 5:01 AM: - [~chris.douglas] Finally, this round is green. That's great! Do you still need me to verify it? If so, I will try to work on it this weekend. was (Author: myapachejira): [~chris.douglas] Finally, this round is green. That's great! Do you still need me to verify it? If so, I need to learn how to use "git apply " :) > LocatedFileStatus constructor forces RawLocalFS to exec a process to get the > permissions > > > Key: HADOOP-14600 > URL: https://issues.apache.org/jira/browse/HADOOP-14600 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.3 > Environment: file:// in a dir with many files >Reporter: Steve Loughran >Assignee: Ping Liu > Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, > HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, > HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, > HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java > > > Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls > against the local FS, because a {{FileStatus.getPermission}} call forces > {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI > values. > That is: for every other FS, what's a field lookup or even a no-op, on the > local FS it's a process exec/spawn, with all the costs. This gets expensive > if you have many files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263821#comment-16263821 ] Ping Liu commented on HADOOP-14600: --- [~chris.douglas] Finally, this round is green. That's great! Do you still need me to verify it? If so, I need to learn how to use "git apply " :) > LocatedFileStatus constructor forces RawLocalFS to exec a process to get the > permissions > > > Key: HADOOP-14600 > URL: https://issues.apache.org/jira/browse/HADOOP-14600 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.3 > Environment: file:// in a dir with many files >Reporter: Steve Loughran >Assignee: Ping Liu > Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, > HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, > HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, > HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java > > > Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls > against the local FS, because a {{FileStatus.getPermission}} call forces > {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI > values. > That is: for every other FS, what's a field lookup or even a no-op, on the > local FS it's a process exec/spawn, with all the costs. This gets expensive > if you have many files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15068) cancelToken and renewToken should use shortUserName consistently
Vihang Karajgaonkar created HADOOP-15068: Summary: cancelToken and renewToken should use shortUserName consistently Key: HADOOP-15068 URL: https://issues.apache.org/jira/browse/HADOOP-15068 Project: Hadoop Common Issue Type: Improvement Components: common Affects Versions: 2.8.2 Reporter: Vihang Karajgaonkar {{AbstractDelegationTokenSecretManager}} is used by many external projects including Hive. This class provides implementations of renewToken and cancelToken, which are used for delegation token management. The methods are semantically inconsistent. Specifically, when you call cancelToken, the string value of the canceller is used to get the Kerberos short name and is then compared with the renewer value of the token to be cancelled. In the case of renewToken, by contrast, the string value that is passed in is compared directly with the renewer value of the token. This inconsistency means that applications need to know about this subtle difference and pass in the short name while renewing the token, while they can pass the full Kerberos username during cancellation. Can we change the renewToken method so that it uses the short name, similar to the cancelToken method? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
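A sketch of the asymmetry described above, assuming a token whose renewer was stored as the Kerberos short name "hive"; the helper class and the principal string are hypothetical, only {{cancelToken}} and {{renewToken}} are real APIs:

{code:java}
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager;

public final class RenewCancelAsymmetry {
  static <T extends AbstractDelegationTokenIdentifier> void illustrate(
      AbstractDelegationTokenSecretManager<T> secretManager,
      Token<T> token) throws Exception {
    String fullPrincipal = "hive/host.example.com@EXAMPLE.COM";
    // renewToken compares the passed-in string verbatim, so the full
    // principal would NOT match a renewer of "hive"; the caller must
    // pre-shorten it to the Kerberos short name.
    secretManager.renewToken(token, "hive");
    // cancelToken reduces the canceller to its Kerberos short name before
    // the comparison, so the full principal matches a renewer of "hive".
    secretManager.cancelToken(token, fullPrincipal);
  }
}
{code}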
[jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263739#comment-16263739 ] Rohith Sharma K S commented on HADOOP-15059: Update: fortunately, I couldn't reproduce the issue I reported in my earlier comment. I was able to install Hadoop-3.0-RC0 + HBase-1.2.6 in secure mode and run it successfully today. I am not sure which fix after Hadoop-3.0-alpha2 resolved this issue. IIRC, the build I used to test this combination was Hadoop-3.0-alpha2/3 + HBase-1.2.4/5! Anyway, this is good news for the ATSv2 folks, who were worried about this. I will keep trying to reproduce it this weekend as well. If any issues are found, I will update here. Until then, please ignore that issue. I would appreciate it if someone else could also validate the behavior. This gives additional confidence that wire compatibility across Hadoop-2 and Hadoop-3 is achieved! > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Priority: Blocker > > I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed > with the following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to a token incompatibility change between 2.9 and 3.0. 
As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0; otherwise all running MR applications will get stuck during/after > the upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
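The failure mode is visible in the last "Caused by": a simplified approximation (not the exact Hadoop source) of the version check in {{Credentials.readTokenStorageStream}} shows why a 2.9 reader rejects a token file written by 3.0 with a newer serialization version byte:

{code:java}
import java.io.DataInputStream;
import java.io.IOException;

public final class TokenStorageVersionCheck {
  // Approximation: a 2.9-era reader accepts only the version it knows,
  // so a file written with a newer version byte fails exactly like the
  // "Unknown version 1 in token storage." error in the log above.
  static void check(DataInputStream in) throws IOException {
    byte[] magic = new byte[4];
    in.readFully(magic); // token-storage magic header
    byte version = in.readByte();
    if (version != 0) {
      throw new IOException("Unknown version " + version
          + " in token storage.");
    }
  }
}
{code}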
[jira] [Resolved] (HADOOP-13478) Aliyun OSS phase I: some preparation and improvements before release
[ https://issues.apache.org/jira/browse/HADOOP-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu resolved HADOOP-13478. Resolution: Done > Aliyun OSS phase I: some preparation and improvements before release > > > Key: HADOOP-13478 > URL: https://issues.apache.org/jira/browse/HADOOP-13478 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: HADOOP-12756 >Reporter: Genmao Yu > Fix For: HADOOP-12756 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263449#comment-16263449 ] Xiao Chen commented on HADOOP-15067: Plan to commit tonight. Erik please feel free to comment if you got a chance. Otherwise: happy thanksgiving! > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263433#comment-16263433 ] Hadoop QA commented on HADOOP-15066:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 2s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 54s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15066 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898940/HADOOP-15066.01.patch |
| Optional Tests | asflicense mvnsite unit shellcheck shelldocs |
| uname | Linux a5146a12d74f 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 738d1a2 |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/13742/artifact/out/whitespace-eol.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13742/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13742/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
> Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly.
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263409#comment-16263409 ] Arpit Agarwal commented on HADOOP-15066: That's a good question. I am not sure why the daemon pid file deletion is attempted twice. If you set up a secure cluster and try to stop the secure DN, the error shows up twice. > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work started] (HADOOP-14898) Create official Docker images for development and testing features
[ https://issues.apache.org/jira/browse/HADOOP-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-14898 started by Elek, Marton. - > Create official Docker images for development and testing features > --- > > Key: HADOOP-14898 > URL: https://issues.apache.org/jira/browse/HADOOP-14898 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton > Attachments: HADOOP-14898.001.tar.gz, HADOOP-14898.002.tar.gz, > HADOOP-14898.003.tgz > > > This is the original mail from the mailing list: > {code} > TL;DR: I propose to create official hadoop images and upload them to the > dockerhub. > GOAL/SCOPE: I would like to improve the existing documentation with easy-to-use > docker based recipes to start hadoop clusters with various configurations. > The images could also be used to test experimental features. For example > ozone could be tested easily with this compose file and configuration: > https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6 > Or even the configuration could be included in the compose file: > https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml > I would like to create separate example compose files for federation, ha, > metrics usage, etc. to make it easier to try out and understand the features. > CONTEXT: There is an existing Jira > https://issues.apache.org/jira/browse/HADOOP-13397 > But it’s about a tool to generate production quality docker images (multiple > types, in a flexible way). If there are no objections, I will create a separate issue > to create simplified docker images for rapid prototyping and investigating > new features, and register the branch to the dockerhub to create the images > automatically. > MY BACKGROUND: I have been working with docker based hadoop/spark clusters for quite a > while and have run them successfully in different environments (kubernetes, > docker-swarm, nomad-based scheduling, etc.) My work is available from here: > https://github.com/flokkr but they could handle more complex use cases (eg. > instrumenting java processes with btrace, or read/reload configuration from > consul). > And IMHO in the official hadoop documentation it’s better to suggest the use of > official apache docker images and not external ones (which could be changed). > {code} > The next list will enumerate the key decision points regarding docker > image creation > A. automated dockerhub build / jenkins build > Docker images could be built on the dockerhub (a branch pattern should be > defined for a github repository and the location of the Docker files) or > could be built on a CI server and pushed. > The second one is more flexible (it's easier to create a matrix build, for > example) > The first one has the advantage that we can get an additional flag on the > dockerhub that the build is automated (and built from the source by the > dockerhub). > The decision is easy as ASF supports the first approach: (see > https://issues.apache.org/jira/browse/INFRA-12781?focusedCommentId=15824096=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15824096) > B. source: binary distribution or source build > The second question is about creating the docker image. One option is to > build the software on the fly during the creation of the docker image; the > other one is to use the binary releases. > I suggest using the second approach as: > 1. In that case the hadoop:2.7.3 image could contain exactly the same hadoop > distribution as the downloadable one > 2. We don't need to add development tools to the image, so the image can be > smaller (which is important, as the goal for this image is getting > started as fast as possible) > 3. The docker definition will be simpler (and easier to maintain) > Usually this approach is used in other projects (I checked Apache Zeppelin > and Apache Nutch) > C. branch usage > Another question is the location of the Docker file. It could be on the > official source-code branches (branch-2, trunk, etc.) or we can create > separate branches for the dockerhub (eg. docker/2.7 docker/2.8 docker/3.0) > With the first approach it's easier to find the docker images, but it's less > flexible. For example, if we had a Dockerfile on the source-code branch it should > be used for every release (for example the Docker file from the tag > release-3.0.0 should be used for the 3.0 hadoop docker image). In that case > the release process is much harder: in case of a Dockerfile error (which > could be tested on dockerhub only after the tagging), a new release would have to > be made after fixing the Dockerfile. > Another problem is that when using tags it's not possible to improve the > Dockerfiles. I can imagine that we would like to improve
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263373#comment-16263373 ] Hadoop QA commented on HADOOP-15067:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 51s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 7s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.ipc.TestRPC |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15067 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898933/HADOOP-15067.01.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux b9605eaef434 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 738d1a2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/13740/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13740/testReport/ |
| Max. process+thread count | 1360 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13740/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263356#comment-16263356 ] Bharat Viswanadham commented on HADOOP-15066: - Thanks [~arpitagarwal] for the review. One question I have: we delete the daemon pid file in hadoop_stop_daemon {code:java} if [[ "${pid}" = "${cur_pid}" ]]; then rm -f "${pidfile}" >/dev/null 2>&1 {code} and again in hadoop_stop_secure_daemon {code:java} if [[ "${daemon_pid}" = "${cur_daemon_pid}" ]]; then rm -f "${daemonpidfile}" >/dev/null 2>&1 {code} We are trying to delete the same file twice; it is not clear why we have the delete logic in two places. > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263355#comment-16263355 ] Hadoop QA commented on HADOOP-15066:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 3s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 8s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 11s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 21s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15066 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898937/HADOOP-15066.00.patch |
| Optional Tests | asflicense mvnsite unit shellcheck shelldocs |
| uname | Linux f59f8d5b2f41 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 738d1a2 |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13741/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13741/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
> Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HADOOP-15066: Attachment: HADOOP-15066.01.patch > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263308#comment-16263308 ] Arpit Agarwal edited comment on HADOOP-15066 at 11/22/17 8:40 PM: -- Thanks for the fix [~bharatviswa]. Couple of comments: # The following can be replaced with elif: {code} else if [[ -f "${pidfile}" ]]; then {code} # We'd also need a fix in hadoop_stop_secure_daemon here. The pid equality checks should be skipped if the pid file no longer exists. {code} cur_daemon_pid=$(cat "$daemonpidfile") cur_priv_pid=$(cat "$privpidfile") if [[ "${daemon_pid}" = "${cur_daemon_pid}" ]]; then rm -f "${daemonpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: daemon pid has changed for ${command}, skip deleting daemon pid file" fi if [[ "${priv_pid}" = "${cur_priv_pid}" ]]; then rm -f "${privpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: priv pid has changed for ${command}, skip deleting priv pid file" fi {code} was (Author: arpitagarwal): Thanks for the fix [~bharatviswa]. Couple of comments: # The following can be replaced with elif: {code} else if [[ -f "${pidfile}" ]]; then {code} # We'd also need a fix in hadoop_stop_secure_daemon here. The pid equality checks should be skipped if the pid file no longer exists. {code} if [[ "${daemon_pid}" = "${cur_daemon_pid}" ]]; then rm -f "${daemonpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: daemon pid has changed for ${command}, skip deleting daemon pid file" fi if [[ "${priv_pid}" = "${cur_priv_pid}" ]]; then rm -f "${privpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: priv pid has changed for ${command}, skip deleting priv pid file" fi {code} > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263308#comment-16263308 ] Arpit Agarwal commented on HADOOP-15066: Thanks for the fix [~bharatviswa]. Couple of comments: # The following can be replaced with elif: {code} else if [[ -f "${pidfile}" ]]; then {code} # We'd also need a fix in hadoop_stop_secure_daemon here. The pid equality checks should be skipped if the pid file no longer exists. {code} if [[ "${daemon_pid}" = "${cur_daemon_pid}" ]]; then rm -f "${daemonpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: daemon pid has changed for ${command}, skip deleting daemon pid file" fi if [[ "${priv_pid}" = "${cur_priv_pid}" ]]; then rm -f "${privpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: priv pid has changed for ${command}, skip deleting priv pid file" fi {code} > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HADOOP-15066: -- Assignee: Bharat Viswanadham > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263302#comment-16263302 ] Andrew Wang commented on HADOOP-15067: -- SGTM, looks like a simple fix. > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HADOOP-15066: Status: Patch Available (was: Open) > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HADOOP-15066: Attachment: HADOOP-15066.00.patch > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14876) Create downstream developer docs from the compatibility guidelines
[ https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HADOOP-14876: -- Fix Version/s: (was: 3.0.1) 3.0.0 > Create downstream developer docs from the compatibility guidelines > -- > > Key: HADOOP-14876 > URL: https://issues.apache.org/jira/browse/HADOOP-14876 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Affects Versions: 3.0.0-beta1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Fix For: 3.0.0, 3.1.0 > > Attachments: Compatibility.pdf, DownstreamDev.pdf, > HADOOP-14876.001.patch, HADOOP-14876.002.patch, HADOOP-14876.003.patch, > HADOOP-14876.004.patch, HADOOP-14876.005.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263285#comment-16263285 ] Xiao Chen edited comment on HADOOP-15067 at 11/22/17 8:22 PM: -- +1 pending jenkins. Thanks Misha. Also thanks [~xkrogen] for the good catch. Do you have any other comments Erik? [~andrew.wang] FYI this would be a useful supportability fix that we'd like to add to 3.0.0, so downstream could use hadoop-3.0.0 package. was (Author: xiaochen): +1 pending jenkins. Thanks Misha. [~andrew.wang] FYI this would be a useful supportability fix that we'd like to add to 3.0.0, so downstream could use hadoop-3.0.0 package. > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263285#comment-16263285 ] Xiao Chen commented on HADOOP-15067: +1 pending jenkins. Thanks Misha. [~andrew.wang] FYI this would be a useful supportability fix that we'd like to add to 3.0.0, so downstream could use hadoop-3.0.0 package. > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14876) Create downstream developer docs from the compatibility guidelines
[ https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263264#comment-16263264 ] Andrew Wang commented on HADOOP-14876: -- I like better docs, please go ahead and backport. Thanks! > Create downstream developer docs from the compatibility guidelines > -- > > Key: HADOOP-14876 > URL: https://issues.apache.org/jira/browse/HADOOP-14876 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Affects Versions: 3.0.0-beta1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Fix For: 3.1.0, 3.0.1 > > Attachments: Compatibility.pdf, DownstreamDev.pdf, > HADOOP-14876.001.patch, HADOOP-14876.002.patch, HADOOP-14876.003.patch, > HADOOP-14876.004.patch, HADOOP-14876.005.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated HADOOP-15067: Status: Patch Available (was: In Progress) > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work started] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-15067 started by Misha Dmitriev. --- > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated HADOOP-15067: Attachment: HADOOP-15067.01.patch > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263248#comment-16263248 ] Hadoop QA commented on HADOOP-13282: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 26s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 0s{color} | {color:orange} root: The patch generated 2 new + 3 unchanged - 0 fixed = 5 total (was 3) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 46s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 34s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 39s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 94m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-13282 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898907/HADOOP-13282-004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 166552a9b586 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d42a336 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle |
[jira] [Commented] (HADOOP-14960) Add GC time percentage monitor/alerter
[ https://issues.apache.org/jira/browse/HADOOP-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263222#comment-16263222 ] Misha Dmitriev commented on HADOOP-14960: - Created https://issues.apache.org/jira/browse/HADOOP-15067, will post a patch momentarily. > Add GC time percentage monitor/alerter > -- > > Key: HADOOP-14960 > URL: https://issues.apache.org/jira/browse/HADOOP-14960 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Fix For: 3.0.0, 2.10.0 > > Attachments: HADOOP-14960.01.patch, HADOOP-14960.02.patch, > HADOOP-14960.03.patch, HADOOP-14960.04.patch > > > Currently class {{org.apache.hadoop.metrics2.source.JvmMetrics}} provides > several metrics related to GC. Unfortunately, all these metrics are not as > useful as they could be, because they don't answer the first and most > important question related to GC and JVM health: what percentage of time my > JVM is paused in GC? This percentage, calculated as the sum of the GC pauses > over some period, like 1 minute, divided by that period - is the most > convenient measure of the GC health because: > - it is just one number, and it's clear that, say, 1..5% is good, but 80..90% > is really bad > - it allows for easy apple-to-apple comparison between runs, even between > different apps > - when this metric reaches some critical value like 70%, it almost always > indicates a "GC death spiral", from which the app can recover only if it > drops some task(s) etc. > The existing "total GC time", "total number of GCs" etc. metrics only give > numbers that can be used to roughly estimate this percentage. Thus it is > suggested to add a new metric to this class, and possibly allow users to > register handlers that will be automatically invoked if this metric reaches > the specified threshold. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
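The percentage described above can be sampled from the standard JMX beans; here is a self-contained illustration of the sum-of-pauses-over-a-window arithmetic (this is not the GcTimeMonitor code, which keeps a sliding window, but the math is the same):
{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPercentageDemo {
  // Total accumulated GC time across all collectors, in milliseconds.
  static long totalGcMillis() {
    long sum = 0;
    for (GarbageCollectorMXBean gc
        : ManagementFactory.getGarbageCollectorMXBeans()) {
      long t = gc.getCollectionTime(); // -1 if the JVM doesn't support it
      if (t > 0) {
        sum += t;
      }
    }
    return sum;
  }

  public static void main(String[] args) throws InterruptedException {
    final long windowMillis = 60_000; // the "some period, like 1 minute"
    long before = totalGcMillis();
    Thread.sleep(windowMillis);
    long gcInWindow = totalGcMillis() - before;
    System.out.printf("GC time percentage: %.1f%%%n",
        100.0 * gcInWindow / windowMillis);
  }
}
{code}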
[jira] [Assigned] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev reassigned HADOOP-15067: --- Assignee: Misha Dmitriev > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
Misha Dmitriev created HADOOP-15067: --- Summary: GC time percentage reported in JvmMetrics should be a gauge, not counter Key: HADOOP-15067 URL: https://issues.apache.org/jira/browse/HADOOP-15067 Project: Hadoop Common Issue Type: Bug Reporter: Misha Dmitriev A new GcTimeMonitor class has been recently added, and the corresponding metrics added in JvmMetrics.java, line 190: {code} if (gcTimeMonitor != null) { rb.addCounter(GcTimePercentage, gcTimeMonitor.getLatestGcData().getGcTimePercentage()); } {code} Since GC time percentage can go up and down, a gauge rather than counter should be used to report it. That is, {{addCounter}} should be replaced with {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13493) Compatibility Docs should clarify the policy for what takes precedence when a conflict is found
[ https://issues.apache.org/jira/browse/HADOOP-13493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263219#comment-16263219 ] Hadoop QA commented on HADOOP-13493: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 26m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 49m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-13493 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898916/HADOOP-13493.002.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 900f2f4b879b 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 785732c | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 341 (vs. ulimit of 5000) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13739/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
> Compatibility Docs should clarify the policy for what takes precedence when a > conflict is found > --- > > Key: HADOOP-13493 > URL: https://issues.apache.org/jira/browse/HADOOP-13493 > Project: Hadoop Common > Issue Type: Task > Components: documentation >Affects Versions: 2.7.2 >Reporter: Robert Kanter >Assignee: Daniel Templeton >Priority: Critical > Attachments: HADOOP-13493.001.patch, HADOOP-13493.002.patch > > > The Compatibility Docs > (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Compatibility.html#Java_API) > list the policies for Private, Public, not annotated, etc Classes and > members, but it doesn't say what happens when there's a conflict. We should > obviously try to avoid this situation, but it would be good to explicitly > state what takes precedence. > As an example, until YARN-3225 made it consistent, {{RefreshNodesRequest}} > looked like this: > {code:java} > @Private > @Stable > public abstract class RefreshNodesRequest { > @Public > @Stable > public static RefreshNodesRequest newInstance() { > RefreshNodesRequest request = > Records.newRecord(RefreshNodesRequest.class); > return request; > } > } > {code} > Note that the class is marked {{\@Private}}, but the method is marked > {{\@Public}}. > In this example, I'd say that the class level should have priority. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HADOOP-15066: --- Description: There is a spurious error when stopping a secure datanode. {code} # hdfs --daemon stop datanode cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: pid has changed for datanode, skip deleting pid file cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: daemon pid has changed for datanode, skip deleting daemon pid file {code} The error appears benign. The service was stopped correctly. was: Looks like there is a spurious error when stopping a secure datanode. {code} # hdfs --daemon stop datanode cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: pid has changed for datanode, skip deleting pid file cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: daemon pid has changed for datanode, skip deleting daemon pid file {code} > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263168#comment-16263168 ] Arpit Agarwal commented on HADOOP-15066: The error is from {{hadoop_stop_daemon}} in _hadoop-functions.sh_. {code} pid=$(cat "$pidfile") kill "${pid}" >/dev/null 2>&1 ... cur_pid=$(cat "$pidfile") ... if [[ "${pid}" = "${cur_pid}" ]]; then rm -f "${pidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: pid has changed for ${cmd}, skip deleting pid file" {code} It looks like jsvc auto-deletes the pid file when the process is killed with SIGTERM. The check for changed pid likely needs to be skipped if the pid file doesn't exist. > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > > Looks like there is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
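A sketch of the fix suggested in the last sentence, against the quoted fragment of {{hadoop_stop_daemon}} (assumption: when jsvc has already removed the pid file, silently skipping the comparison is the right behaviour):
{code}
# after kill "${pid}": only compare pids if the file still exists;
# jsvc deletes it itself on SIGTERM, which is not an error.
if [[ -f "${pidfile}" ]]; then
  cur_pid=$(cat "$pidfile")
  if [[ "${pid}" = "${cur_pid}" ]]; then
    rm -f "${pidfile}" >/dev/null 2>&1
  else
    hadoop_error "WARNING: pid has changed for ${cmd}, skip deleting pid file"
  fi
fi
{code}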
[jira] [Created] (HADOOP-15066) Spurious error stopping secure datanode
Arpit Agarwal created HADOOP-15066: -- Summary: Spurious error stopping secure datanode Key: HADOOP-15066 URL: https://issues.apache.org/jira/browse/HADOOP-15066 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Arpit Agarwal Looks like there is a spurious error when stopping a secure datanode. {code} # hdfs --daemon stop datanode cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: pid has changed for datanode, skip deleting pid file cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: daemon pid has changed for datanode, skip deleting daemon pid file {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263136#comment-16263136 ] Hadoop QA commented on HADOOP-14600: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 56s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 19 new + 227 unchanged - 1 fixed = 246 total (was 228) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 15s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 89m 49s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-14600 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898893/HADOOP-14600.009.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux ece29766f12c 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / de8b6ca | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13737/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13737/testReport/ | | Max. process+thread count | 1570 (vs. ulimit of 5000) | | modules | C: hadoop-common-project/hadoop-common U:
[jira] [Updated] (HADOOP-13493) Compatibility Docs should clarify the policy for what takes precedence when a conflict is found
[ https://issues.apache.org/jira/browse/HADOOP-13493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HADOOP-13493: -- Attachment: HADOOP-13493.002.patch You're right--that patch was useless. :) Try this. > Compatibility Docs should clarify the policy for what takes precedence when a > conflict is found > --- > > Key: HADOOP-13493 > URL: https://issues.apache.org/jira/browse/HADOOP-13493 > Project: Hadoop Common > Issue Type: Task > Components: documentation >Affects Versions: 2.7.2 >Reporter: Robert Kanter >Assignee: Daniel Templeton >Priority: Critical > Attachments: HADOOP-13493.001.patch, HADOOP-13493.002.patch > > > The Compatibility Docs > (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Compatibility.html#Java_API) > list the policies for Private, Public, not annotated, etc Classes and > members, but it doesn't say what happens when there's a conflict. We should > obviously try to avoid this situation, but it would be good to explicitly > state what takes precedence. > As an example, until YARN-3225 made it consistent, {{RefreshNodesRequest}} > looked like this: > {code:java} > @Private > @Stable > public abstract class RefreshNodesRequest { > @Public > @Stable > public static RefreshNodesRequest newInstance() { > RefreshNodesRequest request = > Records.newRecord(RefreshNodesRequest.class); > return request; > } > } > {code} > Note that the class is marked {{\@Private}}, but the method is marked > {{\@Public}}. > In this example, I'd say that the class level should have priority. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Status: Patch Available (was: Open) > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch, > HADOOP-13282-003.patch, HADOOP-13282-004.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out of the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Attachment: HADOOP-13282-004.patch Patch 004: applies to s3a trunk; uses once() to translate the underlying getObjectMetadata call (which is retryraw). Tested: S3 London with default encryption. > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch, > HADOOP-13282-003.patch, HADOOP-13282-004.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out of the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
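As a side note on what an exposed etag buys you: for a plain single-part, unencrypted PUT, the S3 etag is the hex MD5 of the object, so a local file can be probed for sync without a download. A standalone sketch of that check (illustration only; multipart and encrypted objects have different etag forms, as the linked S3 docs note):
{code:java}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class EtagProbeDemo {
  // Hex MD5 of a local file, computed incrementally.
  static String md5Hex(Path file) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) > 0) {
        md5.update(buf, 0, n);
      }
    }
    StringBuilder sb = new StringBuilder();
    for (byte b : md5.digest()) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    // args: <local file> <etag as returned by a HEAD/checksum call>
    String etag = args[1].replace("\"", ""); // S3 quotes the etag value
    System.out.println(md5Hex(Paths.get(args[0])).equals(etag)
        ? "in sync" : "changed");
  }
}
{code}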
[jira] [Updated] (HADOOP-13493) Compatibility Docs should clarify the policy for what takes precedence when a conflict is found
[ https://issues.apache.org/jira/browse/HADOOP-13493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HADOOP-13493: -- Target Version/s: 3.0.0, 3.1.0 (was: 3.1.0) > Compatibility Docs should clarify the policy for what takes precedence when a > conflict is found > --- > > Key: HADOOP-13493 > URL: https://issues.apache.org/jira/browse/HADOOP-13493 > Project: Hadoop Common > Issue Type: Task > Components: documentation >Affects Versions: 2.7.2 >Reporter: Robert Kanter >Assignee: Daniel Templeton >Priority: Critical > Attachments: HADOOP-13493.001.patch > > > The Compatibility Docs > (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Compatibility.html#Java_API) > list the policies for Private, Public, not annotated, etc Classes and > members, but it doesn't say what happens when there's a conflict. We should > obviously try to avoid this situation, but it would be good to explicitly > state what takes precedence. > As an example, until YARN-3225 made it consistent, {{RefreshNodesRequest}} > looked like this: > {code:java} > @Private > @Stable > public abstract class RefreshNodesRequest { > @Public > @Stable > public static RefreshNodesRequest newInstance() { > RefreshNodesRequest request = > Records.newRecord(RefreshNodesRequest.class); > return request; > } > } > {code} > Note that the class is marked {{\@Private}}, but the method is marked > {{\@Public}}. > In this example, I'd say that the class level should have priority. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14876) Create downstream developer docs from the compatibility guidelines
[ https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263036#comment-16263036 ] Daniel Templeton commented on HADOOP-14876: --- [~andrew.wang], can we pull this in for the respin of 3.0.0 RC? > Create downstream developer docs from the compatibility guidelines > -- > > Key: HADOOP-14876 > URL: https://issues.apache.org/jira/browse/HADOOP-14876 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Affects Versions: 3.0.0-beta1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Fix For: 3.1.0, 3.0.1 > > Attachments: Compatibility.pdf, DownstreamDev.pdf, > HADOOP-14876.001.patch, HADOOP-14876.002.patch, HADOOP-14876.003.patch, > HADOOP-14876.004.patch, HADOOP-14876.005.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-14303) Review retry logic on all S3 SDK calls, implement where needed
[ https://issues.apache.org/jira/browse/HADOOP-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-14303. - Resolution: Fixed Fix Version/s: 3.1.0 Fixed in HADOOP-13786, with * a Java 8 lambdas API for invoking S3A operations with retry and error translation * All methods calling the S3 client marked up with their (current) retry logic to make clear what's happening and when you don't need to add retry code around retry code. * metrics & stats to track retries * testing through fault injection * What seems a good initial Policy (S3ARetryPolicy). Always scope for tuning there, especially "what to do about the 400 error code?" For now: treating as retryable on all call types (idempotent/non-idempotent) in the hope it's transient. Fail fast, or at least "fail medium" may be better though. > Review retry logic on all S3 SDK calls, implement where needed > -- > > Key: HADOOP-14303 > URL: https://issues.apache.org/jira/browse/HADOOP-14303 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > > AWS S3, IAM, KMS, DDB etc all throttle callers: the S3A code needs to handle > this without failing, as if it slows down its requests it can recover. > 1. Look at all the places where we are calling S3A via the AWS SDK and make > sure we are retrying with some backoff & jitter policy, ideally something > unified. This must be more systematic than the case-by-case, > problem-by-problem strategy we are implicitly using. > 2. Many of the AWS S3 SDK calls do implement retry (e.g PUT/multipart PUT), > but we need to check the other parts of the process: login, initiate/complete > MPU, ... > Related > HADOOP-13811 Failed to sanitize XML document destined for handler class > HADOOP-13664 S3AInputStream to use a retry policy on read failures > This stuff is all hard to test. A key need is to be able to differentiate > recoverable throttle & network failures from unrecoverable problems like: > auth, network config (e.g bad endpoint), etc. > May be the opportunity to add a faulting subclass of Amazon S3 client which > can be configured in IT Tests to fail at specific points. Ryan Blue's mock S3 > client does this in HADOOP-13786, but it is for 100% mock. I'm thinking of > something with similar fault raising, but in front of the real S3A client -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
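For those who haven't read the HADOOP-13786 patch, the "Java 8 lambdas" point looks roughly like the following. This is an illustrative reimplementation, not the actual S3A {{Invoker}} code; the names and signatures here are invented for the sketch:
{code:java}
import java.io.IOException;
import java.io.InterruptedIOException;

public final class RetryDemo {
  @FunctionalInterface
  public interface Operation<T> {
    T execute() throws IOException;
  }

  // Run the operation; retry with exponential backoff, but only when the
  // call is declared idempotent. attempts must be >= 1.
  public static <T> T retry(String action, boolean idempotent,
      int attempts, long baseDelayMs, Operation<T> op) throws IOException {
    IOException last = null;
    for (int i = 1; i <= attempts; i++) {
      try {
        return op.execute();
      } catch (IOException e) {
        last = e;
        if (!idempotent || i == attempts) {
          break; // never blindly re-run non-idempotent calls
        }
        try {
          Thread.sleep(baseDelayMs << (i - 1)); // exponential backoff
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException(action + " interrupted");
        }
      }
    }
    throw last;
  }
}
{code}
A call site then reads something like {{retry("getFileStatus", true, 3, 100, () -> headObject(key))}} (hypothetical helper), which is what lets the retry and translation decisions live in one place instead of being scattered per call site.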
[jira] [Resolved] (HADOOP-14161) Failed to rename file in S3A during FileOutputFormat commitTask
[ https://issues.apache.org/jira/browse/HADOOP-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-14161. - Resolution: Won't Fix Fix Version/s: 3.1.0 I'm closing this as a WONTFIX because the classic FileOutputFormat committer isn't the right way to work with data in S3. It should work with HADOOP-13345 and the consistent listings there, but performance will still suffer. # Short term (Hadoop 2.9+): use S3Guard for the consistency you need # Longer term: Hadoop 3.1+: use the S3A Committers for the performance you want > Failed to rename file in S3A during FileOutputFormat commitTask > --- > > Key: HADOOP-14161 > URL: https://issues.apache.org/jira/browse/HADOOP-14161 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.7.0, 2.7.1, 2.7.2, 2.7.3 > Environment: spark 2.0.2 with mesos > hadoop 2.7.2 >Reporter: Luke Miner >Priority: Minor > Fix For: 3.1.0 > > > I'm getting non deterministic rename errors while writing to S3 using spark > and hadoop. The proper permissions are set and this only happens > occasionally. It can happen on a job that is as simple as reading in json, > repartitioning and then writing out. After this failure occurs, the overall > job hangs indefinitely. > {code} > org.apache.spark.SparkException: Task failed while writing rows > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Failed to commit task > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:275) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:257) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1348) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258) > ... 
8 more > Caused by: java.io.IOException: Failed to rename > S3AFileStatus{path=s3a://foo/_temporary/0/_temporary/attempt_201703081855_0018_m_000966_0/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet; > isDirectory=false; length=111225342; replication=1; blocksize=33554432; > modification_time=1488999342000; access_time=0; owner=; group=; > permission=rw-rw-rw-; isSymlink=false} to > s3a://foo/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:415) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502) > at > org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50) > at > org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:211) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:270) > ... 13 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe,
[jira] [Resolved] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class
[ https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-13811. - Resolution: Fixed Assignee: Steve Loughran Fix Version/s: 3.1.0 Fixed in HADOOP-13786; client calls are retried when idempotent, of which getFileStatus is one > s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to > sanitize XML document destined for handler class > - > > Key: HADOOP-13811 > URL: https://issues.apache.org/jira/browse/HADOOP-13811 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0, 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > > Occasionally, getFileStatus() fails with a stack trace starting > with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document > destined for handler class}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-14381) S3AUtils.translateException to map 503 response to => throttling failure
[ https://issues.apache.org/jira/browse/HADOOP-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-14381. - Resolution: Fixed Fix Version/s: 3.1.0 Fixed in HADOOP-13786; the inconsistent s3 client now generates throttle events and so can be used to test this. There's also a metric/statistic on the number fielded at the S3A level. The AWS SDK handles a lot of throttling internally, so these values aren't picked up > S3AUtils.translateException to map 503 response to => throttling failure > --- > > Key: HADOOP-14381 > URL: https://issues.apache.org/jira/browse/HADOOP-14381 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran > Fix For: 3.1.0 > > > When AWS S3 returns "503", it means that the overall set of requests on a > part of an S3 bucket exceeds the permitted limit; the client(s) need to > throttle back or wait for some rebalancing to complete. > The aws SDK retries 3 times on a 503, but then throws it up. Our code doesn't > do anything with that other than create a generic {{AWSS3IOException}}. > Proposed > * add a new exception, {{AWSOverloadedException}} > * raise it on a 503 from S3 (& for s3guard, on DDB complaints) > * have it include a link to a wiki page on the topic, as well as the path > * and any other diags > Code talking to S3 may then be able to catch this and choose to react. Some > retry with exponential backoff is the obvious option. Failing that, it > could trigger task reattempts at that part of the query, then a job retry, > which will again fail, *unless the number of tasks run in parallel is > reduced* > As this throttling is across all clients talking to the same part of a > bucket, fixing it is potentially a high level option. We can at least start > by reporting things better -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
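The mapping proposed in the quoted description is mechanically simple; a sketch of the translation step (a plain {{IOException}} is used here since the exception class that finally landed may differ from the proposed {{AWSOverloadedException}}):
{code:java}
import java.io.IOException;

public class ThrottleTranslationDemo {
  static final int SC_SERVICE_UNAVAILABLE = 503;

  // Sketch of an S3AUtils.translateException-style mapping: turn a raw
  // 503 into a dedicated throttle failure that callers can catch and
  // back off on, rather than a generic service error.
  static IOException translate(String operation, String path,
      int statusCode, Exception cause) {
    if (statusCode == SC_SERVICE_UNAVAILABLE) {
      return new IOException("Throttled (HTTP 503) on " + operation
          + " " + path + "; consider backing off with jitter", cause);
    }
    return new IOException(operation + " on " + path + " failed", cause);
  }
}
{code}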
[jira] [Assigned] (HADOOP-14381) S3AUtils.translateException to map 503 response to => throttling failure
[ https://issues.apache.org/jira/browse/HADOOP-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned HADOOP-14381: --- Assignee: Steve Loughran > S3AUtils.translateException to map 503 response to => throttling failure > --- > > Key: HADOOP-14381 > URL: https://issues.apache.org/jira/browse/HADOOP-14381 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > > When AWS S3 returns "503", it means that the overall set of requests on a > part of an S3 bucket exceeds the permitted limit; the client(s) need to > throttle back or wait for some rebalancing to complete. > The aws SDK retries 3 times on a 503, but then throws it up. Our code doesn't > do anything with that other than create a generic {{AWSS3IOException}}. > Proposed > * add a new exception, {{AWSOverloadedException}} > * raise it on a 503 from S3 (& for s3guard, on DDB complaints) > * have it include a link to a wiki page on the topic, as well as the path > * and any other diags > Code talking to S3 may then be able to catch this and choose to react. Some > retry with exponential backoff is the obvious option. Failing that, it > could trigger task reattempts at that part of the query, then a job retry, > which will again fail, *unless the number of tasks run in parallel is > reduced* > As this throttling is across all clients talking to the same part of a > bucket, fixing it is potentially a high level option. We can at least start > by reporting things better -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-13205) S3A to support custom retry policies; failfast on unknown host
[ https://issues.apache.org/jira/browse/HADOOP-13205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-13205. - Resolution: Fixed Fix Version/s: 3.1.0 Fixed in HADOOP-13786 > S3A to support custom retry policies; failfast on unknown host > -- > > Key: HADOOP-13205 > URL: https://issues.apache.org/jira/browse/HADOOP-13205 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > > Noticed today that when connections are down, S3A retries on > UnknownHostExceptions, logging noisily in the process. > # it should be possible to define or customize retry policies for an FS > instance (fail fast, exponential backoff, etc) > # we may want to explicitly have a fail-fast-if-offline retry policy, > catching the common connectivity ones. > Testing will be fun here. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-13664) S3AInputStream to use a retry policy on read failures
[ https://issues.apache.org/jira/browse/HADOOP-13664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-13664. - Resolution: Duplicate Assignee: Steve Loughran Fix Version/s: 3.1.0 Included in HADOOP-13786: attempts to reopen the connection are wrapped with retry logic > S3AInputStream to use a retry policy on read failures > - > > Key: HADOOP-13664 > URL: https://issues.apache.org/jira/browse/HADOOP-13664 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 3.1.0 > > > {{S3AInputStream}} has some retry logic to handle failures on a read: log and > retry. We should move this over to a (possibly hard-coded) RetryPolicy with > some sleep logic, so that longer-than-just-transient read failures can be > handled. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14971) Merge S3A committers into trunk
[ https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14971: Resolution: Duplicate Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) > Merge S3A committers into trunk > --- > > Key: HADOOP-14971 > URL: https://issues.apache.org/jira/browse/HADOOP-14971 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > Attachments: HADOOP-13786-040.patch, HADOOP-13786-041.patch > > > Merge the HADOOP-13786 committer into trunk. This branch is being set up as a > github PR for review there & to keep it out of the mailboxes of the watchers on > the main JIRA -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15003) Merge S3A committers into trunk: Yetus patch checker
[ https://issues.apache.org/jira/browse/HADOOP-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15003: Resolution: Duplicate Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) thanks, committed under the main JIRA, closing this as a duplicate. > Merge S3A committers into trunk: Yetus patch checker > > > Key: HADOOP-15003 > URL: https://issues.apache.org/jira/browse/HADOOP-15003 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > Attachments: HADOOP-13786-041.patch, HADOOP-13786-042.patch, > HADOOP-13786-043.patch, HADOOP-13786-044.patch, HADOOP-13786-045.patch, > HADOOP-13786-046.patch, HADOOP-13786-047.patch, HADOOP-13786-048.patch, > HADOOP-13786-049.patch, HADOOP-13786-050.patch, HADOOP-13786-051.patch, > HADOOP-13786-052.patch, HADOOP-13786-053.patch, HADOOP-15033-testfix-1.diff > > > This is a Yetus only JIRA created to have Yetus review the > HADOOP-13786/HADOOP-14971 patch as a .patch file, as the review PR > [https://github.com/apache/hadoop/pull/282] is stopping this happening in > HADOOP-14971. > Reviews should go into the PR/other task -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13786) Add S3A committer for zero-rename commits to S3 endpoints
[ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13786: Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) This is now committed! Thank you all for your support, insight, testing, reviews! Special mention of: Sanjay Radia, Ryan Blue, Ewan Higgs, Mingliang Liu and extra especially Aaron Fabbri! Not only does this patch add the committer, it adds a (configurable) retry policy to every single call s3a makes of the AWS s3 SDK, with the inconsistent s3 client now configurable to simulate throttling events. Everyone gets to see how their code handles the presence of transient throttle failures. Finally, I now know more about Hadoop & Spark commit protocols than I ever knew I needed to, as well as all those nuances of S3 which matter for that. I'll have to make more use of that knowledge, somehow. > Add S3A committer for zero-rename commits to S3 endpoints > - > > Key: HADOOP-13786 > URL: https://issues.apache.org/jira/browse/HADOOP-13786 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > Attachments: HADOOP-13786-036.patch, HADOOP-13786-037.patch, > HADOOP-13786-038.patch, HADOOP-13786-039.patch, > HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch, > HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, > HADOOP-13786-HADOOP-13345-005.patch, HADOOP-13786-HADOOP-13345-006.patch, > HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch, > HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, > HADOOP-13786-HADOOP-13345-011.patch, HADOOP-13786-HADOOP-13345-012.patch, > HADOOP-13786-HADOOP-13345-013.patch, HADOOP-13786-HADOOP-13345-015.patch, > HADOOP-13786-HADOOP-13345-016.patch, HADOOP-13786-HADOOP-13345-017.patch, > HADOOP-13786-HADOOP-13345-018.patch, HADOOP-13786-HADOOP-13345-019.patch, > HADOOP-13786-HADOOP-13345-020.patch, HADOOP-13786-HADOOP-13345-021.patch, > HADOOP-13786-HADOOP-13345-022.patch, HADOOP-13786-HADOOP-13345-023.patch, > HADOOP-13786-HADOOP-13345-024.patch, HADOOP-13786-HADOOP-13345-025.patch, > HADOOP-13786-HADOOP-13345-026.patch, HADOOP-13786-HADOOP-13345-027.patch, > HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-028.patch, > HADOOP-13786-HADOOP-13345-029.patch, HADOOP-13786-HADOOP-13345-030.patch, > HADOOP-13786-HADOOP-13345-031.patch, HADOOP-13786-HADOOP-13345-032.patch, > HADOOP-13786-HADOOP-13345-033.patch, HADOOP-13786-HADOOP-13345-035.patch, > MAPREDUCE-6823-003.patch, cloud-intergration-test-failure.log, > objectstore.pdf, s3committer-master.zip > > > A goal of this code is "support O(1) commits to S3 repositories in the > presence of failures". Implement it, including whatever is needed to > demonstrate the correctness of the algorithm. (that is, assuming that s3guard > provides a consistent view of the presence/absence of blobs, show that we can > commit directly). > I consider ourselves free to expose the blobstore-ness of the s3 output > streams (ie. not visible until the close()), if we need to use that to allow > us to abort commit operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HADOOP-14600: --- Attachment: HADOOP-14600.009.patch Attaching identical patch, to retry Jenkins... > LocatedFileStatus constructor forces RawLocalFS to exec a process to get the > permissions > > > Key: HADOOP-14600 > URL: https://issues.apache.org/jira/browse/HADOOP-14600 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.3 > Environment: file:// in a dir with many files >Reporter: Steve Loughran >Assignee: Ping Liu > Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, > HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, > HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, > HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java > > > Reported in SPARK-21137. a {{FileSystem.listStatus}} call really crawls > against the local FS, because the {{FileStatus.getPermissions}} call forces > {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI > values. > That is: for every other FS, what's a field lookup or even a no-op, on the > local FS it's a process exec/spawn, with all the costs. This gets expensive > if you have many files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15065) Make mapreduce specific GenericOptionsParser arguments optional
Elek, Marton created HADOOP-15065: - Summary: Make mapreduce specific GenericOptionsParser arguments optional Key: HADOOP-15065 URL: https://issues.apache.org/jira/browse/HADOOP-15065 Project: Hadoop Common Issue Type: Improvement Reporter: Elek, Marton Priority: Minor org.apache.hadoop.util.GenericOptionsParser is widely used to use common arguments in all the command line applications. Some of the common arguments are really generic: {code} -D
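For context, {{GenericOptionsParser}} absorbs generic options such as {{-D key=value}} into the {{Configuration}} and hands everything else back to the application. Typical usage (a minimal sketch; {{my.key}} is a made-up property name):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public class ParserDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Generic flags (-D, -fs, -jt, ...) are consumed into conf;
    // application-specific arguments are returned.
    String[] remaining = new GenericOptionsParser(conf, args)
        .getRemainingArgs();
    System.out.println("parsed config value: " + conf.get("my.key"));
    System.out.println("remaining args: " + remaining.length);
  }
}
{code}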
[jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262809#comment-16262809 ] Jason Lowe commented on HADOOP-15059: - bq. Are we going to keep binary compatibility across hadoop-2.x and hadoop-3.x? Wire compatibility between 2.x clients and 3.x servers is a prerequisite to supporting a rolling upgrade from 2.x to 3.x, but I do not think everyone realizes wire compatibility between a 3.x client and a 2.x server is also very important to many of our users. There are many cases where more than one cluster is involved in a workflow. Requiring that all clusters upgrade from 2.x to 3.x simultaneously is a huge hurdle for adoption, and most users will upgrade them one at a time. As individual clusters upgrade there will be clients/jobs on a newly upgraded 3.x cluster trying to interact with an older 2.x cluster. Back to the issue of launching jobs using an incompatible token format -- here's a couple of options we could consider: 1) YARN nodemanager writes out *two* token credential files, the original 2.x file for backwards compatibility and a new 3.x file. The 3.x UGI code looks for the new file and falls back to the old one if it cannot find it. The 2.x code will simply load the old format from the original filename as it does today. 2) Application submission context contains information on which version of credentials to use for an application. This gets transferred to the container launch context for each container, and the nodemanager writes out the appropriate credentials version based on what was specified in the container launch context. In other words, the nodemanager knows which version of the credentials format the container is expecting to find and writes the token file in that format. > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Priority: Blocker > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... 
using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to a token incompatibility change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0, otherwise all running MR applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
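A rough sketch of what the read side of option 1 could look like; {{Credentials.readTokenStorageFile}} is the real API, but the {{container_tokens_v2}} filename and the helper class are hypothetical illustrations, not a proposed patch:
{code:java}
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;

public final class TokenFileFallback {
  private TokenFileFallback() {
  }

  // Hypothetical: 3.x UGI code looks for a new-format credentials file and
  // falls back to the legacy one; 2.x code keeps reading the legacy name.
  public static Credentials load(File containerDir, Configuration conf)
      throws IOException {
    File newFormat = new File(containerDir, "container_tokens_v2");
    File legacy = new File(containerDir, "container_tokens");
    return Credentials.readTokenStorageFile(
        newFormat.exists() ? newFormat : legacy, conf);
  }
}
{code}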
[jira] [Commented] (HADOOP-15054) upgrade hadoop dependency on commons-codec to 1.11
[ https://issues.apache.org/jira/browse/HADOOP-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262801#comment-16262801 ] Wei-Chiu Chuang commented on HADOOP-15054: -- +1, will commit after Thanksgiving. > upgrade hadoop dependency on commons-codec to 1.11 > -- > > Key: HADOOP-15054 > URL: https://issues.apache.org/jira/browse/HADOOP-15054 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning >Assignee: Bharat Viswanadham > Attachments: HADOOP-15054.00.patch > > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-auth/3.0.0-beta1 > retains the dependency on an old commons-codec version (1.4). > And so does hadoop-common. > Would it be possible to consider an upgrade to 1.11? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15033) Use java.util.zip.CRC32C for Java 9 and above
[ https://issues.apache.org/jira/browse/HADOOP-15033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262782#comment-16262782 ] Dmitry Chuyko commented on HADOOP-15033: The attached HADOOP-15033.004.patch, which is a copy of https://patch-diff.githubusercontent.com/raw/apache/hadoop/pull/291.patch, passes the pre-commit QA checks and shows a ~4x improvement in benchmarks. Could someone please review it? > Use java.util.zip.CRC32C for Java 9 and above > - > > Key: HADOOP-15033 > URL: https://issues.apache.org/jira/browse/HADOOP-15033 > Project: Hadoop Common > Issue Type: Improvement > Components: performance, util >Affects Versions: 3.0.0 >Reporter: Dmitry Chuyko > Attachments: HADOOP-15033.001.patch, HADOOP-15033.001.patch, > HADOOP-15033.002.patch, HADOOP-15033.003.patch, HADOOP-15033.003.patch, > HADOOP-15033.004.patch > > > The java.util.zip.CRC32C implementation is available since Java 9. > https://docs.oracle.com/javase/9/docs/api/java/util/zip/CRC32C.html > Platform-specific assembler intrinsics make it more effective than any pure > Java implementation. > Hadoop is compiled against Java 8, but the class constructor can be accessed > through a method handle on Java 9 to create instances implementing Checksum > at runtime. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
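For readers who have not looked at the patch, a minimal sketch of the technique the description outlines, assuming only that {{java.util.zip.CRC32C}} exists at runtime on Java 9+; the factory class itself is illustrative and is not the code from the attached patch:
{code:java}
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.zip.Checksum;

public final class Crc32cFactory {
  private static final MethodHandle CRC32C_CTOR = lookupCrc32c();

  private static MethodHandle lookupCrc32c() {
    try {
      // Resolved reflectively so the code still compiles and loads on
      // Java 8, where java.util.zip.CRC32C does not exist.
      Class<?> clazz = Class.forName("java.util.zip.CRC32C");
      return MethodHandles.publicLookup()
          .findConstructor(clazz, MethodType.methodType(void.class));
    } catch (ReflectiveOperationException e) {
      return null; // pre-Java 9: caller falls back to a pure-Java CRC32C
    }
  }

  public static Checksum newCrc32c() {
    if (CRC32C_CTOR == null) {
      throw new UnsupportedOperationException("CRC32C needs Java 9+");
    }
    try {
      // CRC32C implements java.util.zip.Checksum, so the cast is safe.
      return (Checksum) CRC32C_CTOR.invoke();
    } catch (Throwable t) {
      throw new IllegalStateException(t);
    }
  }
}
{code}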
[jira] [Commented] (HADOOP-15039) move SemaphoredDelegatingExecutor to hadoop-common
[ https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262756#comment-16262756 ] Genmao Yu commented on HADOOP-15039: [~ste...@apache.org] take a look please. > move SemaphoredDelegatingExecutor to hadoop-common > -- > > Key: HADOOP-15039 > URL: https://issues.apache.org/jira/browse/HADOOP-15039 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/oss, fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Minor > Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, > HADOOP-15039.003.patch > > > Detailed discussions in HADOOP-14999 and HADOOP-15027. > share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}. > cc [~ste...@apache.org] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
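For background, the class being moved bounds how many tasks may be submitted to a delegate executor at once. A minimal sketch of that technique, with hypothetical names (this is not Hadoop's actual {{SemaphoredDelegatingExecutor}} source):
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

// Sketch of a semaphore-bounded delegating executor: submitters block
// once a fixed number of tasks are in flight on the wrapped executor.
public class BoundedExecutor {
  private final ExecutorService delegate;
  private final Semaphore permits;

  public BoundedExecutor(ExecutorService delegate, int maxInFlight) {
    this.delegate = delegate;
    this.permits = new Semaphore(maxInFlight);
  }

  public void execute(Runnable task) throws InterruptedException {
    permits.acquire(); // blocks when maxInFlight tasks are queued/running
    try {
      delegate.execute(() -> {
        try {
          task.run();
        } finally {
          permits.release(); // free the slot when the task finishes
        }
      });
    } catch (RuntimeException e) {
      permits.release(); // submission rejected: return the permit
      throw e;
    }
  }
}
{code}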
[jira] [Updated] (HADOOP-13786) Add S3A committer for zero-rename commits to S3 endpoints
[ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13786: Summary: Add S3A committer for zero-rename commits to S3 endpoints (was: Add S3Guard committer for zero-rename commits to S3 endpoints) > Add S3A committer for zero-rename commits to S3 endpoints > - > > Key: HADOOP-13786 > URL: https://issues.apache.org/jira/browse/HADOOP-13786 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HADOOP-13786-036.patch, HADOOP-13786-037.patch, > HADOOP-13786-038.patch, HADOOP-13786-039.patch, > HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch, > HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, > HADOOP-13786-HADOOP-13345-005.patch, HADOOP-13786-HADOOP-13345-006.patch, > HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch, > HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, > HADOOP-13786-HADOOP-13345-011.patch, HADOOP-13786-HADOOP-13345-012.patch, > HADOOP-13786-HADOOP-13345-013.patch, HADOOP-13786-HADOOP-13345-015.patch, > HADOOP-13786-HADOOP-13345-016.patch, HADOOP-13786-HADOOP-13345-017.patch, > HADOOP-13786-HADOOP-13345-018.patch, HADOOP-13786-HADOOP-13345-019.patch, > HADOOP-13786-HADOOP-13345-020.patch, HADOOP-13786-HADOOP-13345-021.patch, > HADOOP-13786-HADOOP-13345-022.patch, HADOOP-13786-HADOOP-13345-023.patch, > HADOOP-13786-HADOOP-13345-024.patch, HADOOP-13786-HADOOP-13345-025.patch, > HADOOP-13786-HADOOP-13345-026.patch, HADOOP-13786-HADOOP-13345-027.patch, > HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-028.patch, > HADOOP-13786-HADOOP-13345-029.patch, HADOOP-13786-HADOOP-13345-030.patch, > HADOOP-13786-HADOOP-13345-031.patch, HADOOP-13786-HADOOP-13345-032.patch, > HADOOP-13786-HADOOP-13345-033.patch, HADOOP-13786-HADOOP-13345-035.patch, > MAPREDUCE-6823-003.patch, cloud-intergration-test-failure.log, > objectstore.pdf, s3committer-master.zip > > > A goal of this code is "support O(1) commits to S3 repositories in the > presence of failures". Implement it, including whatever is needed to > demonstrate the correctness of the algorithm. (that is, assuming that s3guard > provides a consistent view of the presence/absence of blobs, show that we can > commit directly). > I consider ourselves free to expose the blobstore-ness of the s3 output > streams (ie. not visible until the close()), if we need to use that to allow > us to abort commit operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14898) Create official Docker images for development and testing features
[ https://issues.apache.org/jira/browse/HADOOP-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HADOOP-14898: -- Attachment: HADOOP-14898.003.tgz Third version of the base image. It includes support for Ozone SCM creation (can be turned on with an env variable). > Create official Docker images for development and testing features > --- > > Key: HADOOP-14898 > URL: https://issues.apache.org/jira/browse/HADOOP-14898 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton > Attachments: HADOOP-14898.001.tar.gz, HADOOP-14898.002.tar.gz, > HADOOP-14898.003.tgz > > > This is the original mail from the mailing list: > {code} > TL;DR: I propose to create official hadoop images and upload them to the > dockerhub. > GOAL/SCOPE: I would like to improve the existing documentation with easy-to-use > docker based recipes to start hadoop clusters with various configurations. > The images could also be used to test experimental features. For example > ozone could be tested easily with this compose file and configuration: > https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6 > Or even the configuration could be included in the compose file: > https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml > I would like to create separate example compose files for federation, ha, > metrics usage, etc. to make it easier to try out and understand the features. > CONTEXT: There is an existing Jira > https://issues.apache.org/jira/browse/HADOOP-13397 > But it’s about a tool to generate production quality docker images (multiple > types, in a flexible way). If there are no objections, I will create a separate issue > to create simplified docker images for rapid prototyping and investigating > new features, and register the branch on the dockerhub to create the images > automatically. > MY BACKGROUND: I have been working with docker based hadoop/spark clusters for quite a > while and have run them successfully in different environments (kubernetes, > docker-swarm, nomad-based scheduling, etc.) My work is available from here: > https://github.com/flokkr but it can handle more complex use cases (eg. > instrumenting java processes with btrace, or reading/reloading configuration from > consul). > And IMHO in the official hadoop documentation it’s better to suggest using > official apache docker images and not external ones (which could change). > {code} > The next list enumerates the key decision points regarding docker > image creation: > A. automated dockerhub build / jenkins build > Docker images could be built on the dockerhub (a branch pattern should be > defined for a github repository and the location of the Docker files) or > could be built on a CI server and pushed. > The second one is more flexible (it's easier to create a matrix build, for > example). > The first one has the advantage that we can get an additional flag on the > dockerhub that the build is automated (and built from the source by the > dockerhub). > The decision is easy as ASF supports the first approach: (see > https://issues.apache.org/jira/browse/INFRA-12781?focusedCommentId=15824096=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15824096) > B. source: binary distribution or source build > The second question is about creating the docker image. One option is to > build the software on the fly during the creation of the docker image; the > other one is to use the binary releases.
> I suggest using the second approach because: > 1. In that case the hadoop:2.7.3 image could contain exactly the same hadoop > distribution as the downloadable one > 2. We don't need to add development tools to the image, so the image could be > smaller (which is important, as the goal for this image is getting > started as fast as possible) > 3. The docker definition will be simpler (and easier to maintain) > Usually this approach is used in other projects (I checked Apache Zeppelin > and Apache Nutch) > C. branch usage > The other question is the location of the Docker file. It could be on the > official source-code branches (branch-2, trunk, etc.) or we can create > separate branches for the dockerhub (eg. docker/2.7 docker/2.8 docker/3.0) > With the first approach it's easier to find the docker images, but it's less > flexible. For example, if we had a Dockerfile on the source code branch, it would have > to be used for every release (for example the Docker file from the tag > release-3.0.0 would have to be used for the 3.0 hadoop docker image). In that case > the release process is much harder: in case of a Dockerfile error (which > could be tested on dockerhub only after the tagging), a new release would have to be > made after fixing the Dockerfile. >
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Resolution: Duplicate Status: Resolved (was: Patch Available) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
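For readers following along, a sketch of the corrected branch that the {{// we need update partRemaining here}} comment in the quoted description points at; names follow the quoted snippet, and this is illustrative rather than the committed patch:
{code:java}
@Override
public synchronized void seek(long pos) throws IOException {
  checkNotClosed();
  if (position == pos) {
    return;
  } else if (pos > position && pos < position + partRemaining) {
    long skipped = pos - position;
    AliyunOSSUtils.skipFully(wrappedStream, skipped);
    // Deduct the skipped bytes from the current part as well, so later
    // reads do not believe more bytes remain in the part than really do.
    partRemaining -= skipped;
    position = pos;
  } else {
    reopen(pos);
  }
}
{code}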
[jira] [Commented] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262396#comment-16262396 ] wujinhu commented on HADOOP-15063: -- Thanks for the review. I found it is the same as https://issues.apache.org/jira/browse/HADOOP-14072, so I will close this. > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262356#comment-16262356 ] Steve Loughran commented on HADOOP-15063: - + [~uncleGen] + [~drankye] I'll leave it to the OSS experts to review the production code; but the argument makes sense, and the patch appears to fix it. But I don't know the code well enough to be the reviewer there; let's see what the others say. Test-wise: which endpoint did you run the full module test suite against? Test code comments: * use try-with-resources to automatically close the input stream, even on an assert failure * use assertEquals(56, bytesRead) for an automatic message if the check fails * if the store is eventually consistent, use a different filename for each test. This guarantees that you don't accidentally get the file from a previous test case. * minor layout change: use {{byte[] buf}} as the layout for declaring the variable (i.e. ) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
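To make the review comments concrete, a sketch of the test shape being asked for; the file path, offset, expected byte count and the {{fs}} field are hypothetical stand-ins for the real contract test:
{code:java}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class TestSeekReadSketch {
  private FileSystem fs; // assumed to be set up by the contract test base

  @Test
  public void testReadAfterSeek() throws Exception {
    // Unique name per run, in case the store is eventually consistent.
    Path testFile = new Path("/test/seek-" + System.currentTimeMillis());
    byte[] buf = new byte[1024];
    // try-with-resources closes the stream even if an assert fails.
    try (FSDataInputStream in = fs.open(testFile)) {
      int bytesRead = in.read(101802L, buf, 0, buf.length);
      assertEquals("unexpected byte count", 1024, bytesRead);
    }
  }
}
{code}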
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15063: Issue Type: Sub-task (was: Bug) Parent: HADOOP-13377 > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15053) new Server(out).write call delay occurs
[ https://issues.apache.org/jira/browse/HADOOP-15053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suganya updated HADOOP-15053: - Priority: Major (was: Critical) Description: In createBlockOutputStream, Block details write call takes more time to get acknowledgement. new Sender(out).writeBlock(this.block, . pipeline has three data nodes. was: Hadoop datastream thread runs after 80th packet (5mb). till then datastreamer thread was waiting. Hadoop file write takes more time for the first 5mb data write process. *Thread waits here - code* while (((!this.streamerClosed) && (!this.hasError) && (DFSOutputStream.this.dfsClient.clientRunning) && (DFSOutputStream.this.dataQueue.size() == 0) && ((this.stage != BlockConstructionStage.DATA_STREAMING) || ((this.stage == BlockConstructionStage.DATA_STREAMING) && (now - lastPacket < DFSOutputStream.this.dfsClient.getConf().socketTimeout / 2 || (doSleep)) { long timeout = DFSOutputStream.this.dfsClient.getConf().socketTimeout / 2 - (now - lastPacket); timeout = timeout <= 0L ? 1000L : timeout; timeout = this.stage == BlockConstructionStage.DATA_STREAMING ? timeout : 1000L; *Thread dump:* Thread-9" #32 daemon prio=5 os_prio=0 tid=0x7fcb79401800 nid=0x2c1b in Object.wait() [0x7fcb2c7a2000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:503) - locked <0x0006c6f95fd0> (a java.util.LinkedList) *Debug logs:* - here DataStreamer for seq no 0 started after adding 80th packet in queue 1646 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - Queued packet 80 1646 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - computePacketChunkSize: src=/1/test/file4.txt, chunkSize=516, chunksPerPacket=127, packetSize=65532 1646 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient writeChunk allocating new packet seqno=81, src=/1/test/file4.txt, packetSize=65532, chunksPerPacket=127, bytesCurBlock=5266944 1646 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient writeChunk packet full seqno=81, src=/1/test/file4.txt, bytesCurBlock=5331968, blockSize=134217728, appendChunk=false 2022 [Thread-9] DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: addBlock took 34ms 2022 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.20.19.76:50010 2022 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.20.9.13:50010 2022 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.20.19.70:50010 2022 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 172.20.19.76:50010 2048 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - Send buf size 131072 2090 [DataStreamer for file /1/test/file4.txt block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - DataStreamer block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350 sending packet packet seqno:0 offsetInBlock:0 lastPacketInBlock:false lastByteOffsetInBlock: 65024 2091 [DataStreamer for file /1/test/file4.txt block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - DataStreamer block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350 sending packet packet seqno:1 offsetInBlock:65024 lastPacketInBlock:false lastByteOffsetInBlock: 130048 2091 [DataStreamer for file /1/test/file4.txt block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - 
DataStreamer block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350 sending packet packet seqno:2 offsetInBlock:130048 lastPacketInBlock:false lastByteOffsetInBlock: 195072 2233 [DataStreamer for file /1/test/file4.txt block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - DataStreamer block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350 sending packet packet seqno:3 offsetInBlock:195072 lastPacketInBlock:false lastByteOffsetInBlock: 260096 2333 [ResponseProcessor for block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient seqno: 0 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 6015384 2333 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - Queued packet 81 2334 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - computePacketChunkSize: src=/1/test/file4.txt, chunkSize=516, chunksPerPacket=127, packetSize=65532 2334 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient writeChunk allocating new packet seqno=82, src=/1/test/file4.txt, packetSize=65532, chunksPerPacket=127, bytesCurBlock=5331968 Summary:
[jira] [Commented] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262240#comment-16262240 ] Hadoop QA commented on HADOOP-15063: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HADOOP-15063 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-15063 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898815/HADOOP-15063.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13736/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Description: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-auth/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency. was: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency. > hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 > --- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-auth/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common and hadoop-auth 3.0.0-beta1 expose a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Summary: hadoop-common and hadoop-auth 3.0.0-beta1 expose a dependency on slf4j-log4j12 (was: hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12) > hadoop-common and hadoop-auth 3.0.0-beta1 expose a dependency on slf4j-log4j12 > -- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-auth/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Affects Version/s: 3.0.0-beta1 > hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 > --- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Environment: (was: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency.) > hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 > --- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Description: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency. > hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 > --- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 > Environment: > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
PJ Fanning created HADOOP-15064: --- Summary: hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 Key: HADOOP-15064 URL: https://issues.apache.org/jira/browse/HADOOP-15064 Project: Hadoop Common Issue Type: Bug Environment: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency. Reporter: PJ Fanning -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Status: Patch Available (was: In Progress) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-beta1, 3.0.0-alpha2 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work started] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-15063 started by wujinhu. > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Priority: Major (was: Critical) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Summary: IOException may be thrown when read from Aliyun OSS in some case (was: IOException will be thrown when read from Aliyun OSS) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Attachment: HADOOP-15063.001.patch > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Attachment: (was: HADOOP-15063.001.patch) > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Attachment: HADOOP-15063.001.patch > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262132#comment-16262132 ] wujinhu commented on HADOOP-15063: -- Upload patch file. > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when reading from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15063:
-----------------------------
    Affects Version/s: 3.0.0-beta1

> IOException will be thrown when reading from Aliyun OSS
>
>
>                 Key: HADOOP-15063
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15063
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/oss
>    Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Critical
>
>
> An IOException is thrown in the following case:
> 1. Set the part size to n (102400).
> 2. Assume the current position is 0, so partRemaining = 102400.
> 3. Call seek(pos = 101802). Because pos > position && pos < position + partRemaining, the stream skips pos - position bytes, but partRemaining is left unchanged.
> 4. If we then read more than n - pos bytes (598 here), an IOException is thrown.
> Current code:
> {code:java}
>   @Override
>   public synchronized void seek(long pos) throws IOException {
>     checkNotClosed();
>     if (position == pos) {
>       return;
>     } else if (pos > position && pos < position + partRemaining) {
>       AliyunOSSUtils.skipFully(wrappedStream, pos - position);
>       // we also need to update partRemaining here
>       position = pos;
>     } else {
>       reopen(pos);
>     }
>   }
> {code}
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>         at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>         at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to reproduce:
> 1. Create a file of 10 MB.
> 2. Run:
> {code:java}
>     int seekTimes = 150;
>     for (int i = 0; i < seekTimes; i++) {
>       long pos = size / (seekTimes - i) - 1;
>       LOG.info("begin seeking for pos: " + pos);
>       byte[] buf = new byte[1024];
>       instream.read(pos, buf, 0, 1024);
>     }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15063) IOException will be thrown when reading from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu reassigned HADOOP-15063:
--------------------------------

    Assignee: wujinhu

> IOException will be thrown when reading from Aliyun OSS
>
>
>                 Key: HADOOP-15063
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15063
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/oss
>    Affects Versions: 3.0.0-alpha2
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Critical
>
>
> An IOException is thrown in the following case:
> 1. Set the part size to n (102400).
> 2. Assume the current position is 0, so partRemaining = 102400.
> 3. Call seek(pos = 101802). Because pos > position && pos < position + partRemaining, the stream skips pos - position bytes, but partRemaining is left unchanged.
> 4. If we then read more than n - pos bytes (598 here), an IOException is thrown.
> Current code:
> {code:java}
>   @Override
>   public synchronized void seek(long pos) throws IOException {
>     checkNotClosed();
>     if (position == pos) {
>       return;
>     } else if (pos > position && pos < position + partRemaining) {
>       AliyunOSSUtils.skipFully(wrappedStream, pos - position);
>       // we also need to update partRemaining here
>       position = pos;
>     } else {
>       reopen(pos);
>     }
>   }
> {code}
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>         at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>         at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to reproduce:
> 1. Create a file of 10 MB.
> 2. Run:
> {code:java}
>     int seekTimes = 150;
>     for (int i = 0; i < seekTimes; i++) {
>       long pos = size / (seekTimes - i) - 1;
>       LOG.info("begin seeking for pos: " + pos);
>       byte[] buf = new byte[1024];
>       instream.read(pos, buf, 0, 1024);
>     }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
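The arithmetic in the report can be replayed standalone. The snippet below (hypothetical class name, plain Java, no Hadoop dependencies) walks through the bookkeeping: after the buggy in-part seek, partRemaining still claims 102400 bytes while only 598 actually remain, which is the kind of over-ask that surfaces as the "Failed to read from stream" IOException in the logs:

{code:java}
public class StalePartRemainingDemo {
  public static void main(String[] args) {
    final long partSize = 102400;        // n from the description
    long position = 0;
    long partRemaining = partSize;       // bytes the current part can still serve

    // Buggy in-part seek to 101802: position advances, partRemaining does not.
    long pos = 101802;
    long skipped = pos - position;
    position += skipped;                 // partRemaining -= skipped is missing

    long actuallyLeft = partSize - pos;  // 598 bytes truly remain in the part
    long requested = 1024;               // the reproduce loop reads 1 KB

    System.out.println("position after seek = " + position);      // 101802
    System.out.println("stale partRemaining = " + partRemaining); // 102400
    System.out.println("actually left       = " + actuallyLeft);  // 598
    if (requested <= partRemaining && requested > actuallyLeft) {
      // The real stream hits end-of-part mid-read here, surfacing as
      // "java.io.IOException: Failed to read from stream."
      System.out.println("read(" + requested + ") would fail with IOException");
    }
  }
}
{code}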