[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263924#comment-16263924 ] Chris Douglas commented on HADOOP-14600: lgtm, but if you have cycles to verify the patch, then let's commit it as soon as you +1 it. > LocatedFileStatus constructor forces RawLocalFS to exec a process to get the > permissions > > > Key: HADOOP-14600 > URL: https://issues.apache.org/jira/browse/HADOOP-14600 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.3 > Environment: file:// in a dir with many files >Reporter: Steve Loughran >Assignee: Ping Liu > Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, > HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, > HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, > HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java > > > Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls > against the local FS, because a {{FileStatus.getPermission}} call forces > {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI > values. > That is: for every other FS, what's a field lookup or even a no-op, on the > local FS it's a process exec/spawn, with all the costs. This gets expensive > if you have many files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
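To make the reported cost concrete, here is a minimal, hypothetical timing harness (the class name and argument handling are illustrative, not part of any attached patch). On file://, each {{getPermission()}} call on the deprecated raw-local file status can fork a process to load the real owner/group/permission bits, so listing a large directory turns into one process spawn per file:

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical benchmark: time listStatus() plus a getPermission() call
// per entry against a local directory passed as args[0].
public class LocalFsPermissionCost {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new URI("file:///"), new Configuration());
    long start = System.nanoTime();
    for (FileStatus st : fs.listStatus(new Path(args[0]))) {
      st.getPermission(); // may exec a process per file on the local FS
    }
    System.out.println("listStatus + getPermission: "
        + (System.nanoTime() - start) / 1_000_000 + " ms");
  }
}
{code}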
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263893#comment-16263893 ] Erik Krogen commented on HADOOP-15067: -- Thanks for the ping [~xiaochen] and sorry for the late response; as you seem to have suspected, I was OOO for the holidays today. And thank you for the fix [~mi...@cloudera.com]! LGTM. > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
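For reference, the change the description asks for is a one-liner; a sketch of the fixed block in JvmMetrics, assuming the surrounding code is unchanged:

{code:java}
if (gcTimeMonitor != null) {
  // A gauge fits here because the GC time percentage can go down as well
  // as up, whereas a counter is expected to be monotonically increasing.
  rb.addGauge(GcTimePercentage,
      gcTimeMonitor.getLatestGcData().getGcTimePercentage());
}
{code}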
[jira] [Comment Edited] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263821#comment-16263821 ] Ping Liu edited comment on HADOOP-14600 at 11/23/17 5:01 AM: - [~chris.douglas] Finally, this round is green. That's great! Do you still need me to verify it? If so, I will try to work on it this weekend. was (Author: myapachejira): [~chris.douglas] Finally, this round is green. That's great! Do you still need me to verify it? If so, I need to learn how to use "git apply " :) > LocatedFileStatus constructor forces RawLocalFS to exec a process to get the > permissions > > > Key: HADOOP-14600 > URL: https://issues.apache.org/jira/browse/HADOOP-14600 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.3 > Environment: file:// in a dir with many files >Reporter: Steve Loughran >Assignee: Ping Liu > Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, > HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, > HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, > HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java > > > Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls > against the local FS, because a {{FileStatus.getPermission}} call forces > {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI > values. > That is: for every other FS, what's a field lookup or even a no-op, on the > local FS it's a process exec/spawn, with all the costs. This gets expensive > if you have many files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263821#comment-16263821 ] Ping Liu commented on HADOOP-14600: --- [~chris.douglas] Finally, this round is green. That's great! Do you still need me to verify it? If so, I need to learn how to use "git apply " :) > LocatedFileStatus constructor forces RawLocalFS to exec a process to get the > permissions > > > Key: HADOOP-14600 > URL: https://issues.apache.org/jira/browse/HADOOP-14600 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.3 > Environment: file:// in a dir with many files >Reporter: Steve Loughran >Assignee: Ping Liu > Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, > HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, > HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, > HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java > > > Reported in SPARK-21137. A {{FileSystem.listStatus}} call really crawls > against the local FS, because a {{FileStatus.getPermission}} call forces > {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI > values. > That is: for every other FS, what's a field lookup or even a no-op, on the > local FS it's a process exec/spawn, with all the costs. This gets expensive > if you have many files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15068) cancelToken and renewToken should use shortUserName consistently
Vihang Karajgaonkar created HADOOP-15068: Summary: cancelToken and renewToken should use shortUserName consistently Key: HADOOP-15068 URL: https://issues.apache.org/jira/browse/HADOOP-15068 Project: Hadoop Common Issue Type: Improvement Components: common Affects Versions: 2.8.2 Reporter: Vihang Karajgaonkar {{AbstractDelegationTokenSecretManager}} is used by many external projects including Hive. This class provides implementations of renewToken and cancelToken, which are used for delegation token management. The methods are semantically inconsistent. Specifically, when you call cancelToken, the string value of the canceller is used to get the Kerberos short name and is then compared with the renewer value of the token to be cancelled. In the case of renewToken, by contrast, the string value that is passed in is compared directly with the renewer value of the token. This inconsistency means that applications need to know about this subtle difference and pass in the short name while renewing the token, while they can pass the full Kerberos username during cancellation. Can we change the renewToken method so that it uses the short name, similar to the cancelToken method? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
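A sketch of the asymmetry described above, assuming a token whose renewer was stored as the Kerberos short name "hive"; the helper class and the principal string are hypothetical, only {{cancelToken}} and {{renewToken}} are real APIs:

{code:java}
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager;

public final class RenewCancelAsymmetry {
  static <T extends AbstractDelegationTokenIdentifier> void illustrate(
      AbstractDelegationTokenSecretManager<T> secretManager,
      Token<T> token) throws Exception {
    String fullPrincipal = "hive/host.example.com@EXAMPLE.COM";
    // renewToken compares the passed-in string verbatim, so the full
    // principal would NOT match a renewer of "hive"; the caller must
    // pre-shorten it to the Kerberos short name.
    secretManager.renewToken(token, "hive");
    // cancelToken reduces the canceller to its Kerberos short name before
    // the comparison, so the full principal matches a renewer of "hive".
    secretManager.cancelToken(token, fullPrincipal);
  }
}
{code}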
[jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263739#comment-16263739 ] Rohith Sharma K S commented on HADOOP-15059: Update: fortunately, I couldn't reproduce the issue I reported in my earlier comment. I was able to install Hadoop-3.0-RC0 + HBase-1.2.6 in secure mode and run it successfully today. I am not sure which fix after Hadoop-3.0-alpha2 resolved this issue. IIRC, the build I used to test this combination was Hadoop-3.0-alpha2/3 + HBase-1.2.4/5! Anyway, this is good news for the ATSv2 folks, who were worried about this. I will keep trying to reproduce it this weekend as well. If any issues are found, I will update here. Until then, please ignore that issue. I would appreciate it if someone else could also validate the behavior. This gives additional confidence that wire compatibility across Hadoop-2 and Hadoop-3 is achieved! > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Priority: Blocker > > I tried to deploy a 3.0 cluster with a 2.9 MR tar ball. The MR job failed > with the following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to a token incompatibility change between 2.9 and 3.0. 
As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0; otherwise all running MR applications will get stuck during/after > the upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
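The failure mode is visible in the last "Caused by": a simplified approximation (not the exact Hadoop source) of the version check in {{Credentials.readTokenStorageStream}} shows why a 2.9 reader rejects a token file written by 3.0 with a newer serialization version byte:

{code:java}
import java.io.DataInputStream;
import java.io.IOException;

public final class TokenStorageVersionCheck {
  // Approximation: a 2.9-era reader accepts only the version it knows,
  // so a file written with a newer version byte fails exactly like the
  // "Unknown version 1 in token storage." error in the log above.
  static void check(DataInputStream in) throws IOException {
    byte[] magic = new byte[4];
    in.readFully(magic); // token-storage magic header
    byte version = in.readByte();
    if (version != 0) {
      throw new IOException("Unknown version " + version
          + " in token storage.");
    }
  }
}
{code}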
[jira] [Resolved] (HADOOP-13478) Aliyun OSS phase I: some preparation and improvements before release
[ https://issues.apache.org/jira/browse/HADOOP-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu resolved HADOOP-13478. Resolution: Done > Aliyun OSS phase I: some preparation and improvements before release > > > Key: HADOOP-13478 > URL: https://issues.apache.org/jira/browse/HADOOP-13478 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: HADOOP-12756 >Reporter: Genmao Yu > Fix For: HADOOP-12756 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263449#comment-16263449 ] Xiao Chen commented on HADOOP-15067: Plan to commit tonight. Erik please feel free to comment if you got a chance. Otherwise: happy thanksgiving! > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263433#comment-16263433 ] Hadoop QA commented on HADOOP-15066:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 2s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 54s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15066 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898940/HADOOP-15066.01.patch |
| Optional Tests | asflicense mvnsite unit shellcheck shelldocs |
| uname | Linux a5146a12d74f 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 738d1a2 |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
| whitespace | https://builds.apache.org/job/PreCommit-HADOOP-Build/13742/artifact/out/whitespace-eol.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13742/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13742/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
> Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly.
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263409#comment-16263409 ] Arpit Agarwal commented on HADOOP-15066: That's a good question. I am not sure why the daemon pid file deletion is attempted twice. If you set up a secure cluster and try to stop the secure DN, the error shows up twice. > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work started] (HADOOP-14898) Create official Docker images for development and testing features
[ https://issues.apache.org/jira/browse/HADOOP-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-14898 started by Elek, Marton. - > Create official Docker images for development and testing features > --- > > Key: HADOOP-14898 > URL: https://issues.apache.org/jira/browse/HADOOP-14898 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton > Attachments: HADOOP-14898.001.tar.gz, HADOOP-14898.002.tar.gz, > HADOOP-14898.003.tgz > > > This is the original mail from the mailing list: > {code} > TL;DR: I propose to create official hadoop images and upload them to the > dockerhub. > GOAL/SCOPE: I would like to improve the existing documentation with easy-to-use > docker based recipes to start hadoop clusters with various configurations. > The images could also be used to test experimental features. For example > ozone could be tested easily with this compose file and configuration: > https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6 > Or even the configuration could be included in the compose file: > https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml > I would like to create separate example compose files for federation, ha, > metrics usage, etc. to make it easier to try out and understand the features. > CONTEXT: There is an existing Jira > https://issues.apache.org/jira/browse/HADOOP-13397 > But it’s about a tool to generate production quality docker images (multiple > types, in a flexible way). If there are no objections, I will create a separate issue > to create simplified docker images for rapid prototyping and investigating > new features, and register the branch to the dockerhub to create the images > automatically. > MY BACKGROUND: I have been working with docker based hadoop/spark clusters for quite a > while and have run them successfully in different environments (kubernetes, > docker-swarm, nomad-based scheduling, etc.) My work is available from here: > https://github.com/flokkr but they could handle more complex use cases (eg. > instrumenting java processes with btrace, or read/reload configuration from > consul). > And IMHO in the official hadoop documentation it’s better to suggest the use of > official apache docker images and not external ones (which could be changed). > {code} > The next list will enumerate the key decision points regarding docker > image creation > A. automated dockerhub build / jenkins build > Docker images could be built on the dockerhub (a branch pattern should be > defined for a github repository and the location of the Docker files) or > could be built on a CI server and pushed. > The second one is more flexible (it's easier to create a matrix build, for > example) > The first one has the advantage that we can get an additional flag on the > dockerhub that the build is automated (and built from the source by the > dockerhub). > The decision is easy as ASF supports the first approach: (see > https://issues.apache.org/jira/browse/INFRA-12781?focusedCommentId=15824096=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15824096) > B. source: binary distribution or source build > The second question is about creating the docker image. One option is to > build the software on the fly during the creation of the docker image; the > other one is to use the binary releases. > I suggest using the second approach as: > 1. In that case the hadoop:2.7.3 image could contain exactly the same hadoop > distribution as the downloadable one > 2. We don't need to add development tools to the image, so the image can be > smaller (which is important, as the goal for this image is getting > started as fast as possible) > 3. The docker definition will be simpler (and easier to maintain) > Usually this approach is used in other projects (I checked Apache Zeppelin > and Apache Nutch) > C. branch usage > Another question is the location of the Docker file. It could be on the > official source-code branches (branch-2, trunk, etc.) or we can create > separate branches for the dockerhub (eg. docker/2.7 docker/2.8 docker/3.0) > With the first approach it's easier to find the docker images, but it's less > flexible. For example, if we had a Dockerfile on the source-code branch it should > be used for every release (for example the Docker file from the tag > release-3.0.0 should be used for the 3.0 hadoop docker image). In that case > the release process is much harder: in case of a Dockerfile error (which > could be tested on dockerhub only after the tagging), a new release would have to > be made after fixing the Dockerfile. > Another problem is that when using tags it's not possible to improve the > Dockerfiles. I can imagine that we would like to improve
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263373#comment-16263373 ] Hadoop QA commented on HADOOP-15067:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 51s{color} | {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 7s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.ipc.TestRPC |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15067 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898933/HADOOP-15067.01.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux b9605eaef434 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 738d1a2 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/13740/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13740/testReport/ |
| Max. process+thread count | 1360 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13740/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263356#comment-16263356 ] Bharat Viswanadham commented on HADOOP-15066: - Thanks [~arpitagarwal] for the review. One question I have: we delete the daemon pid file in hadoop_stop_daemon {code:java} if [[ "${pid}" = "${cur_pid}" ]]; then rm -f "${pidfile}" >/dev/null 2>&1 {code} and again in hadoop_stop_secure_daemon {code:java} if [[ "${daemon_pid}" = "${cur_daemon_pid}" ]]; then rm -f "${daemonpidfile}" >/dev/null 2>&1 {code} We are trying to delete the same file twice; it is not clear why we have the delete logic in two places. > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263355#comment-16263355 ] Hadoop QA commented on HADOOP-15066:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 3s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 8s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 11s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 21s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HADOOP-15066 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898937/HADOOP-15066.00.patch |
| Optional Tests | asflicense mvnsite unit shellcheck shelldocs |
| uname | Linux f59f8d5b2f41 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 738d1a2 |
| maven | version: Apache Maven 3.3.9 |
| shellcheck | v0.4.6 |
| Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13741/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 5000) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13741/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
> Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HADOOP-15066: Attachment: HADOOP-15066.01.patch > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch, HADOOP-15066.01.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263308#comment-16263308 ] Arpit Agarwal edited comment on HADOOP-15066 at 11/22/17 8:40 PM: -- Thanks for the fix [~bharatviswa]. Couple of comments: # The following can be replaced with elif: {code} else if [[ -f "${pidfile}" ]]; then {code} # We'd also need a fix in hadoop_stop_secure_daemon here. The pid equality checks should be skipped if the pid file no longer exists. {code} cur_daemon_pid=$(cat "$daemonpidfile") cur_priv_pid=$(cat "$privpidfile") if [[ "${daemon_pid}" = "${cur_daemon_pid}" ]]; then rm -f "${daemonpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: daemon pid has changed for ${command}, skip deleting daemon pid file" fi if [[ "${priv_pid}" = "${cur_priv_pid}" ]]; then rm -f "${privpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: priv pid has changed for ${command}, skip deleting priv pid file" fi {code} was (Author: arpitagarwal): Thanks for the fix [~bharatviswa]. Couple of comments: # The following can be replaced with elif: {code} else if [[ -f "${pidfile}" ]]; then {code} # We'd also need a fix in hadoop_stop_secure_daemon here. The pid equality checks should be skipped if the pid file no longer exists. {code} if [[ "${daemon_pid}" = "${cur_daemon_pid}" ]]; then rm -f "${daemonpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: daemon pid has changed for ${command}, skip deleting daemon pid file" fi if [[ "${priv_pid}" = "${cur_priv_pid}" ]]; then rm -f "${privpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: priv pid has changed for ${command}, skip deleting priv pid file" fi {code} > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263308#comment-16263308 ] Arpit Agarwal commented on HADOOP-15066: Thanks for the fix [~bharatviswa]. Couple of comments: # The following can be replaced with elif: {code} else if [[ -f "${pidfile}" ]]; then {code} # We'd also need a fix in hadoop_stop_secure_daemon here. The pid equality checks should be skipped if the pid file no longer exists. {code} if [[ "${daemon_pid}" = "${cur_daemon_pid}" ]]; then rm -f "${daemonpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: daemon pid has changed for ${command}, skip deleting daemon pid file" fi if [[ "${priv_pid}" = "${cur_priv_pid}" ]]; then rm -f "${privpidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: priv pid has changed for ${command}, skip deleting priv pid file" fi {code} > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HADOOP-15066: -- Assignee: Bharat Viswanadham > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Bharat Viswanadham > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263302#comment-16263302 ] Andrew Wang commented on HADOOP-15067: -- SGTM, looks like a simple fix. > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HADOOP-15066: Status: Patch Available (was: Open) > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham updated HADOOP-15066: Attachment: HADOOP-15066.00.patch > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > Attachments: HADOOP-15066.00.patch > > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14876) Create downstream developer docs from the compatibility guidelines
[ https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HADOOP-14876: -- Fix Version/s: (was: 3.0.1) 3.0.0 > Create downstream developer docs from the compatibility guidelines > -- > > Key: HADOOP-14876 > URL: https://issues.apache.org/jira/browse/HADOOP-14876 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Affects Versions: 3.0.0-beta1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Fix For: 3.0.0, 3.1.0 > > Attachments: Compatibility.pdf, DownstreamDev.pdf, > HADOOP-14876.001.patch, HADOOP-14876.002.patch, HADOOP-14876.003.patch, > HADOOP-14876.004.patch, HADOOP-14876.005.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263285#comment-16263285 ] Xiao Chen edited comment on HADOOP-15067 at 11/22/17 8:22 PM: -- +1 pending jenkins. Thanks Misha. Also thanks [~xkrogen] for the good catch. Do you have any other comments Erik? [~andrew.wang] FYI this would be a useful supportability fix that we'd like to add to 3.0.0, so downstream could use hadoop-3.0.0 package. was (Author: xiaochen): +1 pending jenkins. Thanks Misha. [~andrew.wang] FYI this would be a useful supportability fix that we'd like to add to 3.0.0, so downstream could use hadoop-3.0.0 package. > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263285#comment-16263285 ] Xiao Chen commented on HADOOP-15067: +1 pending jenkins. Thanks Misha. [~andrew.wang] FYI this would be a useful supportability fix that we'd like to add to 3.0.0, so downstream could use hadoop-3.0.0 package. > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14876) Create downstream developer docs from the compatibility guidelines
[ https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263264#comment-16263264 ] Andrew Wang commented on HADOOP-14876: -- I like better docs, please go ahead and backport. Thanks! > Create downstream developer docs from the compatibility guidelines > -- > > Key: HADOOP-14876 > URL: https://issues.apache.org/jira/browse/HADOOP-14876 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Affects Versions: 3.0.0-beta1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Fix For: 3.1.0, 3.0.1 > > Attachments: Compatibility.pdf, DownstreamDev.pdf, > HADOOP-14876.001.patch, HADOOP-14876.002.patch, HADOOP-14876.003.patch, > HADOOP-14876.004.patch, HADOOP-14876.005.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated HADOOP-15067: Status: Patch Available (was: In Progress) > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work started] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-15067 started by Misha Dmitriev. --- > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev updated HADOOP-15067: Attachment: HADOOP-15067.01.patch > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HADOOP-15067.01.patch > > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263248#comment-16263248 ] Hadoop QA commented on HADOOP-13282: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 26s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 0s{color} | {color:orange} root: The patch generated 2 new + 3 unchanged - 0 fixed = 5 total (was 3) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 46s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 34s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 39s{color} | {color:green} hadoop-aws in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 94m 28s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-13282 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898907/HADOOP-13282-004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 166552a9b586 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d42a336 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle |
[jira] [Commented] (HADOOP-14960) Add GC time percentage monitor/alerter
[ https://issues.apache.org/jira/browse/HADOOP-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263222#comment-16263222 ] Misha Dmitriev commented on HADOOP-14960: - Created https://issues.apache.org/jira/browse/HADOOP-15067, will post a patch momentarily. > Add GC time percentage monitor/alerter > -- > > Key: HADOOP-14960 > URL: https://issues.apache.org/jira/browse/HADOOP-14960 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Fix For: 3.0.0, 2.10.0 > > Attachments: HADOOP-14960.01.patch, HADOOP-14960.02.patch, > HADOOP-14960.03.patch, HADOOP-14960.04.patch > > > Currently class {{org.apache.hadoop.metrics2.source.JvmMetrics}} provides > several metrics related to GC. Unfortunately, all these metrics are not as > useful as they could be, because they don't answer the first and most > important question related to GC and JVM health: what percentage of time my > JVM is paused in GC? This percentage, calculated as the sum of the GC pauses > over some period, like 1 minute, divided by that period - is the most > convenient measure of the GC health because: > - it is just one number, and it's clear that, say, 1..5% is good, but 80..90% > is really bad > - it allows for easy apple-to-apple comparison between runs, even between > different apps > - when this metric reaches some critical value like 70%, it almost always > indicates a "GC death spiral", from which the app can recover only if it > drops some task(s) etc. > The existing "total GC time", "total number of GCs" etc. metrics only give > numbers that can be used to roughly estimate this percentage. Thus it is > suggested to add a new metric to this class, and possibly allow users to > register handlers that will be automatically invoked if this metric reaches > the specified threshold. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
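The percentage described above can be sampled from the standard JMX beans; here is a self-contained illustration of the sum-of-pauses-over-a-window arithmetic (this is not the GcTimeMonitor code, which keeps a sliding window, but the math is the same):
{code:java}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPercentageDemo {
  // Total accumulated GC time across all collectors, in milliseconds.
  static long totalGcMillis() {
    long sum = 0;
    for (GarbageCollectorMXBean gc
        : ManagementFactory.getGarbageCollectorMXBeans()) {
      long t = gc.getCollectionTime(); // -1 if the JVM doesn't support it
      if (t > 0) {
        sum += t;
      }
    }
    return sum;
  }

  public static void main(String[] args) throws InterruptedException {
    final long windowMillis = 60_000; // the "some period, like 1 minute"
    long before = totalGcMillis();
    Thread.sleep(windowMillis);
    long gcInWindow = totalGcMillis() - before;
    System.out.printf("GC time percentage: %.1f%%%n",
        100.0 * gcInWindow / windowMillis);
  }
}
{code}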
[jira] [Assigned] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
[ https://issues.apache.org/jira/browse/HADOOP-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Misha Dmitriev reassigned HADOOP-15067: --- Assignee: Misha Dmitriev > GC time percentage reported in JvmMetrics should be a gauge, not counter > > > Key: HADOOP-15067 > URL: https://issues.apache.org/jira/browse/HADOOP-15067 > Project: Hadoop Common > Issue Type: Bug >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > > A new GcTimeMonitor class has been recently added, and the corresponding > metrics added in JvmMetrics.java, line 190: > {code} > if (gcTimeMonitor != null) { > rb.addCounter(GcTimePercentage, > gcTimeMonitor.getLatestGcData().getGcTimePercentage()); > } > {code} > Since GC time percentage can go up and down, a gauge rather than counter > should be used to report it. That is, {{addCounter}} should be replaced with > {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15067) GC time percentage reported in JvmMetrics should be a gauge, not counter
Misha Dmitriev created HADOOP-15067: --- Summary: GC time percentage reported in JvmMetrics should be a gauge, not counter Key: HADOOP-15067 URL: https://issues.apache.org/jira/browse/HADOOP-15067 Project: Hadoop Common Issue Type: Bug Reporter: Misha Dmitriev A new GcTimeMonitor class has been recently added, and the corresponding metrics added in JvmMetrics.java, line 190: {code} if (gcTimeMonitor != null) { rb.addCounter(GcTimePercentage, gcTimeMonitor.getLatestGcData().getGcTimePercentage()); } {code} Since GC time percentage can go up and down, a gauge rather than counter should be used to report it. That is, {{addCounter}} should be replaced with {{addGauge}} above. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13493) Compatibility Docs should clarify the policy for what takes precedence when a conflict is found
[ https://issues.apache.org/jira/browse/HADOOP-13493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263219#comment-16263219 ] Hadoop QA commented on HADOOP-13493: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 47s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 26m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 5s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 49m 9s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-13493 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898916/HADOOP-13493.002.patch | | Optional Tests | asflicense mvnsite | | uname | Linux 900f2f4b879b 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 785732c | | maven | version: Apache Maven 3.3.9 | | Max. process+thread count | 341 (vs. ulimit of 5000) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13739/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. 
> Compatibility Docs should clarify the policy for what takes precedence when a > conflict is found > --- > > Key: HADOOP-13493 > URL: https://issues.apache.org/jira/browse/HADOOP-13493 > Project: Hadoop Common > Issue Type: Task > Components: documentation >Affects Versions: 2.7.2 >Reporter: Robert Kanter >Assignee: Daniel Templeton >Priority: Critical > Attachments: HADOOP-13493.001.patch, HADOOP-13493.002.patch > > > The Compatibility Docs > (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Compatibility.html#Java_API) > list the policies for Private, Public, not annotated, etc Classes and > members, but it doesn't say what happens when there's a conflict. We should > obviously try to avoid this situation, but it would be good to explicitly > state what takes precedence. > As an example, until YARN-3225 made it consistent, {{RefreshNodesRequest}} > looked like this: > {code:java} > @Private > @Stable > public abstract class RefreshNodesRequest { > @Public > @Stable > public static RefreshNodesRequest newInstance() { > RefreshNodesRequest request = > Records.newRecord(RefreshNodesRequest.class); > return request; > } > } > {code} > Note that the class is marked {{\@Private}}, but the method is marked > {{\@Public}}. > In this example, I'd say that the class level should have priority. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HADOOP-15066: --- Description: There is a spurious error when stopping a secure datanode. {code} # hdfs --daemon stop datanode cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: pid has changed for datanode, skip deleting pid file cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: daemon pid has changed for datanode, skip deleting daemon pid file {code} The error appears benign. The service was stopped correctly. was: Looks like there is a spurious error when stopping a secure datanode. {code} # hdfs --daemon stop datanode cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: pid has changed for datanode, skip deleting pid file cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: daemon pid has changed for datanode, skip deleting daemon pid file {code} > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > > There is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} > The error appears benign. The service was stopped correctly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15066) Spurious error stopping secure datanode
[ https://issues.apache.org/jira/browse/HADOOP-15066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263168#comment-16263168 ] Arpit Agarwal commented on HADOOP-15066: The error is from {{hadoop_stop_daemon}} in _hadoop-functions.sh_. {code} pid=$(cat "$pidfile") kill "${pid}" >/dev/null 2>&1 ... cur_pid=$(cat "$pidfile") ... if [[ "${pid}" = "${cur_pid}" ]]; then rm -f "${pidfile}" >/dev/null 2>&1 else hadoop_error "WARNING: pid has changed for ${cmd}, skip deleting pid file" {code} It looks like jsvc auto-deletes the pid file when the process is killed with SIGTERM. The check for changed pid likely needs to be skipped if the pid file doesn't exist. > Spurious error stopping secure datanode > --- > > Key: HADOOP-15066 > URL: https://issues.apache.org/jira/browse/HADOOP-15066 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal > > Looks like there is a spurious error when stopping a secure datanode. > {code} > # hdfs --daemon stop datanode > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: pid has changed for datanode, skip deleting pid file > cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or > directory > WARNING: daemon pid has changed for datanode, skip deleting daemon pid file > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
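A sketch of the fix suggested in the last sentence, against the quoted fragment of {{hadoop_stop_daemon}} (assumption: when jsvc has already removed the pid file, silently skipping the comparison is the right behaviour):
{code}
# after kill "${pid}": only compare pids if the file still exists;
# jsvc deletes it itself on SIGTERM, which is not an error.
if [[ -f "${pidfile}" ]]; then
  cur_pid=$(cat "$pidfile")
  if [[ "${pid}" = "${cur_pid}" ]]; then
    rm -f "${pidfile}" >/dev/null 2>&1
  else
    hadoop_error "WARNING: pid has changed for ${cmd}, skip deleting pid file"
  fi
fi
{code}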
[jira] [Created] (HADOOP-15066) Spurious error stopping secure datanode
Arpit Agarwal created HADOOP-15066: -- Summary: Spurious error stopping secure datanode Key: HADOOP-15066 URL: https://issues.apache.org/jira/browse/HADOOP-15066 Project: Hadoop Common Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Arpit Agarwal Looks like there is a spurious error when stopping a secure datanode. {code} # hdfs --daemon stop datanode cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: pid has changed for datanode, skip deleting pid file cat: /var/run/hadoop/hdfs//hadoop-hdfs-root-datanode.pid: No such file or directory WARNING: daemon pid has changed for datanode, skip deleting daemon pid file {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263136#comment-16263136 ] Hadoop QA commented on HADOOP-14600: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 34s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 56s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 19 new + 227 unchanged - 1 fixed = 246 total (was 228) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 15s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 89m 49s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HADOOP-14600 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898893/HADOOP-14600.009.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux ece29766f12c 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / de8b6ca | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/13737/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/13737/testReport/ | | Max. process+thread count | 1570 (vs. ulimit of 5000) | | modules | C: hadoop-common-project/hadoop-common U:
[jira] [Updated] (HADOOP-13493) Compatibility Docs should clarify the policy for what takes precedence when a conflict is found
[ https://issues.apache.org/jira/browse/HADOOP-13493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HADOOP-13493: -- Attachment: HADOOP-13493.002.patch You're right--that patch was useless. :) Try this. > Compatibility Docs should clarify the policy for what takes precedence when a > conflict is found > --- > > Key: HADOOP-13493 > URL: https://issues.apache.org/jira/browse/HADOOP-13493 > Project: Hadoop Common > Issue Type: Task > Components: documentation >Affects Versions: 2.7.2 >Reporter: Robert Kanter >Assignee: Daniel Templeton >Priority: Critical > Attachments: HADOOP-13493.001.patch, HADOOP-13493.002.patch > > > The Compatibility Docs > (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Compatibility.html#Java_API) > list the policies for Private, Public, not annotated, etc Classes and > members, but it doesn't say what happens when there's a conflict. We should > obviously try to avoid this situation, but it would be good to explicitly > state what takes precedence. > As an example, until YARN-3225 made it consistent, {{RefreshNodesRequest}} > looked like this: > {code:java} > @Private > @Stable > public abstract class RefreshNodesRequest { > @Public > @Stable > public static RefreshNodesRequest newInstance() { > RefreshNodesRequest request = > Records.newRecord(RefreshNodesRequest.class); > return request; > } > } > {code} > Note that the class is marked {{\@Private}}, but the method is marked > {{\@Public}}. > In this example, I'd say that the class level should have priority. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Status: Patch Available (was: Open) > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch, > HADOOP-13282-003.patch, HADOOP-13282-004.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out of the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Attachment: HADOOP-13282-004.patch Patch 004: applies to s3a trunk; uses once() to translate the underlying getObjectMetadata call (which is retryraw). Tested: S3 London with default encryption. > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch, > HADOOP-13282-003.patch, HADOOP-13282-004.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out of the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
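As a side note on what an exposed etag buys you: for a plain single-part, unencrypted PUT, the S3 etag is the hex MD5 of the object, so a local file can be probed for sync without a download. A standalone sketch of that check (illustration only; multipart and encrypted objects have different etag forms, as the linked S3 docs note):
{code:java}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class EtagProbeDemo {
  // Hex MD5 of a local file, computed incrementally.
  static String md5Hex(Path file) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) > 0) {
        md5.update(buf, 0, n);
      }
    }
    StringBuilder sb = new StringBuilder();
    for (byte b : md5.digest()) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    // args: <local file> <etag as returned by a HEAD/checksum call>
    String etag = args[1].replace("\"", ""); // S3 quotes the etag value
    System.out.println(md5Hex(Paths.get(args[0])).equals(etag)
        ? "in sync" : "changed");
  }
}
{code}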
[jira] [Updated] (HADOOP-13493) Compatibility Docs should clarify the policy for what takes precedence when a conflict is found
[ https://issues.apache.org/jira/browse/HADOOP-13493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HADOOP-13493: -- Target Version/s: 3.0.0, 3.1.0 (was: 3.1.0) > Compatibility Docs should clarify the policy for what takes precedence when a > conflict is found > --- > > Key: HADOOP-13493 > URL: https://issues.apache.org/jira/browse/HADOOP-13493 > Project: Hadoop Common > Issue Type: Task > Components: documentation >Affects Versions: 2.7.2 >Reporter: Robert Kanter >Assignee: Daniel Templeton >Priority: Critical > Attachments: HADOOP-13493.001.patch > > > The Compatibility Docs > (https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/Compatibility.html#Java_API) > list the policies for Private, Public, not annotated, etc Classes and > members, but it doesn't say what happens when there's a conflict. We should > obviously try to avoid this situation, but it would be good to explicitly > state what takes precedence. > As an example, until YARN-3225 made it consistent, {{RefreshNodesRequest}} > looked like this: > {code:java} > @Private > @Stable > public abstract class RefreshNodesRequest { > @Public > @Stable > public static RefreshNodesRequest newInstance() { > RefreshNodesRequest request = > Records.newRecord(RefreshNodesRequest.class); > return request; > } > } > {code} > Note that the class is marked {{\@Private}}, but the method is marked > {{\@Public}}. > In this example, I'd say that the class level should have priority. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14876) Create downstream developer docs from the compatibility guidelines
[ https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263036#comment-16263036 ] Daniel Templeton commented on HADOOP-14876: --- [~andrew.wang], can we pull this in for the respin of 3.0.0 RC? > Create downstream developer docs from the compatibility guidelines > -- > > Key: HADOOP-14876 > URL: https://issues.apache.org/jira/browse/HADOOP-14876 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Affects Versions: 3.0.0-beta1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Fix For: 3.1.0, 3.0.1 > > Attachments: Compatibility.pdf, DownstreamDev.pdf, > HADOOP-14876.001.patch, HADOOP-14876.002.patch, HADOOP-14876.003.patch, > HADOOP-14876.004.patch, HADOOP-14876.005.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-14303) Review retry logic on all S3 SDK calls, implement where needed
[ https://issues.apache.org/jira/browse/HADOOP-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-14303. - Resolution: Fixed Fix Version/s: 3.1.0 Fixed in HADOOP-13786, with * a Java 8 lambdas API for invoking S3A operations with retry and error translation * All methods calling the S3 client marked up with their (current) retry logic to make clear what's happening and when you don't need to add retry code around retry code. * metrics & stats to track retries * testing through fault injection * What seems a good initial Policy (S3ARetryPolicy). Always scope for tuning there, especially "what to do about the 400 error code?" For now: treating as retryable on all call types (idempotent/non-idempotent) in the hope it's transient. Fail fast, or at least "fail medium" may be better though. > Review retry logic on all S3 SDK calls, implement where needed > -- > > Key: HADOOP-14303 > URL: https://issues.apache.org/jira/browse/HADOOP-14303 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > > AWS S3, IAM, KMS, DDB etc all throttle callers: the S3A code needs to handle > this without failing, as if it slows down its requests it can recover. > 1. Look at all the places where we are calling S3A via the AWS SDK and make > sure we are retrying with some backoff & jitter policy, ideally something > unified. This must be more systematic than the case-by-case, > problem-by-problem strategy we are implicitly using. > 2. Many of the AWS S3 SDK calls do implement retry (e.g PUT/multipart PUT), > but we need to check the other parts of the process: login, initiate/complete > MPU, ... > Related > HADOOP-13811 Failed to sanitize XML document destined for handler class > HADOOP-13664 S3AInputStream to use a retry policy on read failures > This stuff is all hard to test. A key need is to be able to differentiate > recoverable throttle & network failures from unrecoverable problems like: > auth, network config (e.g bad endpoint), etc. > May be the opportunity to add a faulting subclass of Amazon S3 client which > can be configured in IT Tests to fail at specific points. Ryan Blue's mock S3 > client does this in HADOOP-13786, but it is for 100% mock. I'm thinking of > something with similar fault raising, but in front of the real S3A client -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
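For those who haven't read the HADOOP-13786 patch, the "Java 8 lambdas" point looks roughly like the following. This is an illustrative reimplementation, not the actual S3A {{Invoker}} code; the names and signatures here are invented for the sketch:
{code:java}
import java.io.IOException;
import java.io.InterruptedIOException;

public final class RetryDemo {
  @FunctionalInterface
  public interface Operation<T> {
    T execute() throws IOException;
  }

  // Run the operation; retry with exponential backoff, but only when the
  // call is declared idempotent. attempts must be >= 1.
  public static <T> T retry(String action, boolean idempotent,
      int attempts, long baseDelayMs, Operation<T> op) throws IOException {
    IOException last = null;
    for (int i = 1; i <= attempts; i++) {
      try {
        return op.execute();
      } catch (IOException e) {
        last = e;
        if (!idempotent || i == attempts) {
          break; // never blindly re-run non-idempotent calls
        }
        try {
          Thread.sleep(baseDelayMs << (i - 1)); // exponential backoff
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException(action + " interrupted");
        }
      }
    }
    throw last;
  }
}
{code}
A call site then reads something like {{retry("getFileStatus", true, 3, 100, () -> headObject(key))}} (hypothetical helper), which is what lets the retry and translation decisions live in one place instead of being scattered per call site.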
[jira] [Resolved] (HADOOP-14161) Failed to rename file in S3A during FileOutputFormat commitTask
[ https://issues.apache.org/jira/browse/HADOOP-14161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-14161. - Resolution: Won't Fix Fix Version/s: 3.1.0 I'm closing this as a WONTFIX because the classic FileOutputFormat committer isn't the right way to work with data in S3. It should work with HADOOP-13345 and the consistent listings there, but performance will still suffer. # Short term (Hadoop 2.9+): use S3Guard for the consistency you need # Longer term: Hadoop 3.1+: use the S3A Committers for the performance you want > Failed to rename file in S3A during FileOutputFormat commitTask > --- > > Key: HADOOP-14161 > URL: https://issues.apache.org/jira/browse/HADOOP-14161 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.7.0, 2.7.1, 2.7.2, 2.7.3 > Environment: spark 2.0.2 with mesos > hadoop 2.7.2 >Reporter: Luke Miner >Priority: Minor > Fix For: 3.1.0 > > > I'm getting non deterministic rename errors while writing to S3 using spark > and hadoop. The proper permissions are set and this only happens > occasionally. It can happen on a job that is as simple as reading in json, > repartitioning and then writing out. After this failure occurs, the overall > job hangs indefinitely. > {code} > org.apache.spark.SparkException: Task failed while writing rows > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: Failed to commit task > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:275) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:257) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:252) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1348) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:258) > ... 
8 more > Caused by: java.io.IOException: Failed to rename > S3AFileStatus{path=s3a://foo/_temporary/0/_temporary/attempt_201703081855_0018_m_000966_0/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet; > isDirectory=false; length=111225342; replication=1; blocksize=33554432; > modification_time=1488999342000; access_time=0; owner=; group=; > permission=rw-rw-rw-; isSymlink=false} to > s3a://foo/part-r-00966-615ed714-58c1-4b89-be56-e47966737c75.snappy.parquet > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:415) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502) > at > org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50) > at > org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:211) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:270) > ... 13 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe,
[jira] [Resolved] (HADOOP-13811) s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to sanitize XML document destined for handler class
[ https://issues.apache.org/jira/browse/HADOOP-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-13811. - Resolution: Fixed Assignee: Steve Loughran Fix Version/s: 3.1.0 Fixed in HADOOP-13786; client calls are retried when idempotent, of which getFileStatus is one > s3a: getFileStatus fails with com.amazonaws.AmazonClientException: Failed to > sanitize XML document destined for handler class > - > > Key: HADOOP-13811 > URL: https://issues.apache.org/jira/browse/HADOOP-13811 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0, 2.7.3 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > > Occasionally, getFileStatus() fails with a stack trace starting > with {{com.amazonaws.AmazonClientException: Failed to sanitize XML document > destined for handler class}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-14381) S3AUtils.translateException to map 503 response to => throttling failure
[ https://issues.apache.org/jira/browse/HADOOP-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-14381. - Resolution: Fixed Fix Version/s: 3.1.0 Fixed in HADOOP-13786; the inconsistent s3 client now generates throttle events and so can be used to test this. There's also a metric/statistic on the number fielded at the S3A level. The AWS SDK handles a lot of throttling internally, so these values aren't picked up > S3AUtils.translateException to map 503 response to => throttling failure > --- > > Key: HADOOP-14381 > URL: https://issues.apache.org/jira/browse/HADOOP-14381 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran > Fix For: 3.1.0 > > > When AWS S3 returns "503", it means that the overall set of requests on a > part of an S3 bucket exceeds the permitted limit; the client(s) need to > throttle back or wait for some rebalancing to complete. > The aws SDK retries 3 times on a 503, but then throws it up. Our code doesn't > do anything with that other than create a generic {{AWSS3IOException}}. > Proposed > * add a new exception, {{AWSOverloadedException}} > * raise it on a 503 from S3 (& for s3guard, on DDB complaints) > * have it include a link to a wiki page on the topic, as well as the path > * and any other diags > Code talking to S3 may then be able to catch this and choose to react. Some > retry with exponential backoff is the obvious option. Failing that, it > could trigger task reattempts at that part of the query, then a job retry, > which will again fail, *unless the number of tasks run in parallel is > reduced* > As this throttling is across all clients talking to the same part of a > bucket, fixing it is potentially a high level option. We can at least start > by reporting things better -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
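The mapping proposed in the quoted description is mechanically simple; a sketch of the translation step (a plain {{IOException}} is used here since the exception class that finally landed may differ from the proposed {{AWSOverloadedException}}):
{code:java}
import java.io.IOException;

public class ThrottleTranslationDemo {
  static final int SC_SERVICE_UNAVAILABLE = 503;

  // Sketch of an S3AUtils.translateException-style mapping: turn a raw
  // 503 into a dedicated throttle failure that callers can catch and
  // back off on, rather than a generic service error.
  static IOException translate(String operation, String path,
      int statusCode, Exception cause) {
    if (statusCode == SC_SERVICE_UNAVAILABLE) {
      return new IOException("Throttled (HTTP 503) on " + operation
          + " " + path + "; consider backing off with jitter", cause);
    }
    return new IOException(operation + " on " + path + " failed", cause);
  }
}
{code}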
[jira] [Assigned] (HADOOP-14381) S3AUtils.translateException to map 503 response to => throttling failure
[ https://issues.apache.org/jira/browse/HADOOP-14381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reassigned HADOOP-14381: --- Assignee: Steve Loughran > S3AUtils.translateException to map 503 response to => throttling failure > --- > > Key: HADOOP-14381 > URL: https://issues.apache.org/jira/browse/HADOOP-14381 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > > When AWS S3 returns "503", it means that the overall set of requests on a > part of an S3 bucket exceeds the permitted limit; the client(s) need to > throttle back or wait for some rebalancing to complete. > The aws SDK retries 3 times on a 503, but then throws it up. Our code doesn't > do anything with that other than create a generic {{AWSS3IOException}}. > Proposed > * add a new exception, {{AWSOverloadedException}} > * raise it on a 503 from S3 (& for s3guard, on DDB complaints) > * have it include a link to a wiki page on the topic, as well as the path > * and any other diags > Code talking to S3 may then be able to catch this and choose to react. Some > retry with exponential backoff is the obvious option. Failing that, it > could trigger task reattempts at that part of the query, then a job retry, > which will again fail, *unless the number of tasks run in parallel is > reduced* > As this throttling is across all clients talking to the same part of a > bucket, fixing it is potentially a high level option. We can at least start > by reporting things better -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-13205) S3A to support custom retry policies; failfast on unknown host
[ https://issues.apache.org/jira/browse/HADOOP-13205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-13205. - Resolution: Fixed Fix Version/s: 3.1.0 Fixed in HADOOP-13786 > S3A to support custom retry policies; failfast on unknown host > -- > > Key: HADOOP-13205 > URL: https://issues.apache.org/jira/browse/HADOOP-13205 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > > Noticed today that when connections are down, S3A retries on > UnknownHostExceptions, logging noisily in the process. > # it should be possible to define or customize retry policies for an FS > instance (fail fast, exponential backoff, etc) > # we may want to explicitly have a fail-fast-if-offline retry policy, > catching the common connectivity ones. > Testing will be fun here. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-13664) S3AInputStream to use a retry policy on read failures
[ https://issues.apache.org/jira/browse/HADOOP-13664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-13664. - Resolution: Duplicate Assignee: Steve Loughran Fix Version/s: 3.1.0 Included in HADOOP-13786: attempts to reopen the connection are wrapped with retry logic > S3AInputStream to use a retry policy on read failures > - > > Key: HADOOP-13664 > URL: https://issues.apache.org/jira/browse/HADOOP-13664 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Fix For: 3.1.0 > > > {{S3AInputStream}} has some retry logic to handle failures on a read: log and > retry. We should move this over to a (possibly hard-coded) RetryPolicy with > some sleep logic, so that longer-than-just-transient read failures can be > handled. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14971) Merge S3A committers into trunk
[ https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-14971: Resolution: Duplicate Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) > Merge S3A committers into trunk > --- > > Key: HADOOP-14971 > URL: https://issues.apache.org/jira/browse/HADOOP-14971 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > Attachments: HADOOP-13786-040.patch, HADOOP-13786-041.patch > > > Merge the HADOOP-13786 committer into trunk. This branch is being set up as a > github PR for review there & to keep it out of the mailboxes of the watchers on > the main JIRA -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15003) Merge S3A committers into trunk: Yetus patch checker
[ https://issues.apache.org/jira/browse/HADOOP-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15003: Resolution: Duplicate Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) thanks, committed under the main JIRA, closing this as a duplicate. > Merge S3A committers into trunk: Yetus patch checker > > > Key: HADOOP-15003 > URL: https://issues.apache.org/jira/browse/HADOOP-15003 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > Attachments: HADOOP-13786-041.patch, HADOOP-13786-042.patch, > HADOOP-13786-043.patch, HADOOP-13786-044.patch, HADOOP-13786-045.patch, > HADOOP-13786-046.patch, HADOOP-13786-047.patch, HADOOP-13786-048.patch, > HADOOP-13786-049.patch, HADOOP-13786-050.patch, HADOOP-13786-051.patch, > HADOOP-13786-052.patch, HADOOP-13786-053.patch, HADOOP-15033-testfix-1.diff > > > This is a Yetus only JIRA created to have Yetus review the > HADOOP-13786/HADOOP-14971 patch as a .patch file, as the review PR > [https://github.com/apache/hadoop/pull/282] is stopping this happening in > HADOOP-14971. > Reviews should go into the PR/other task -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13786) Add S3A committer for zero-rename commits to S3 endpoints
[ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13786: Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) This is now committed! Thank you all for your support, insight, testing, reviews! Special mention of: Sanjay Radia, Ryan Blue, Ewan Higgs, Mingliang Liu and extra especially Aaron Fabbri! Not only does this patch add the committer, it adds a (configurable) retry policy to every single call s3a makes of the AWS s3 SDK, with the inconsistent s3 client now configurable to simulate throttling events. Everyone gets to see how their code handles the presence of transient throttle failures. Finally, I now know more about Hadoop & Spark commit protocols than I ever knew I needed to, as well as all those nuances of S3 which matter for that. I'll have to make more use of that knowledge, somehow. > Add S3A committer for zero-rename commits to S3 endpoints > - > > Key: HADOOP-13786 > URL: https://issues.apache.org/jira/browse/HADOOP-13786 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 3.1.0 > > Attachments: HADOOP-13786-036.patch, HADOOP-13786-037.patch, > HADOOP-13786-038.patch, HADOOP-13786-039.patch, > HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch, > HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, > HADOOP-13786-HADOOP-13345-005.patch, HADOOP-13786-HADOOP-13345-006.patch, > HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch, > HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, > HADOOP-13786-HADOOP-13345-011.patch, HADOOP-13786-HADOOP-13345-012.patch, > HADOOP-13786-HADOOP-13345-013.patch, HADOOP-13786-HADOOP-13345-015.patch, > HADOOP-13786-HADOOP-13345-016.patch, HADOOP-13786-HADOOP-13345-017.patch, > HADOOP-13786-HADOOP-13345-018.patch, HADOOP-13786-HADOOP-13345-019.patch, > HADOOP-13786-HADOOP-13345-020.patch, HADOOP-13786-HADOOP-13345-021.patch, > HADOOP-13786-HADOOP-13345-022.patch, HADOOP-13786-HADOOP-13345-023.patch, > HADOOP-13786-HADOOP-13345-024.patch, HADOOP-13786-HADOOP-13345-025.patch, > HADOOP-13786-HADOOP-13345-026.patch, HADOOP-13786-HADOOP-13345-027.patch, > HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-028.patch, > HADOOP-13786-HADOOP-13345-029.patch, HADOOP-13786-HADOOP-13345-030.patch, > HADOOP-13786-HADOOP-13345-031.patch, HADOOP-13786-HADOOP-13345-032.patch, > HADOOP-13786-HADOOP-13345-033.patch, HADOOP-13786-HADOOP-13345-035.patch, > MAPREDUCE-6823-003.patch, cloud-intergration-test-failure.log, > objectstore.pdf, s3committer-master.zip > > > A goal of this code is "support O(1) commits to S3 repositories in the > presence of failures". Implement it, including whatever is needed to > demonstrate the correctness of the algorithm. (that is, assuming that s3guard > provides a consistent view of the presence/absence of blobs, show that we can > commit directly). > I consider ourselves free to expose the blobstore-ness of the s3 output > streams (ie. not visible until the close()), if we need to use that to allow > us to abort commit operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14600) LocatedFileStatus constructor forces RawLocalFS to exec a process to get the permissions
[ https://issues.apache.org/jira/browse/HADOOP-14600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated HADOOP-14600: --- Attachment: HADOOP-14600.009.patch Attaching identical patch, to retry Jenkins... > LocatedFileStatus constructor forces RawLocalFS to exec a process to get the > permissions > > > Key: HADOOP-14600 > URL: https://issues.apache.org/jira/browse/HADOOP-14600 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.7.3 > Environment: file:// in a dir with many files >Reporter: Steve Loughran >Assignee: Ping Liu > Attachments: HADOOP-14600.001.patch, HADOOP-14600.002.patch, > HADOOP-14600.003.patch, HADOOP-14600.004.patch, HADOOP-14600.005.patch, > HADOOP-14600.006.patch, HADOOP-14600.007.patch, HADOOP-14600.008.patch, > HADOOP-14600.009.patch, TestRawLocalFileSystemContract.java > > > Reported in SPARK-21137. a {{FileSystem.listStatus}} call really crawls > against the local FS, because the {{FileStatus.getPermissions}} call forces > {{DeprecatedRawLocalFileStatus}} to spawn a process to read the real UGI > values. > That is: for every other FS, what's a field lookup or even a no-op, on the > local FS it's a process exec/spawn, with all the costs. This gets expensive > if you have many files. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15065) Make mapreduce specific GenericOptionsParser arguments optional
Elek, Marton created HADOOP-15065: - Summary: Make mapreduce specific GenericOptionsParser arguments optional Key: HADOOP-15065 URL: https://issues.apache.org/jira/browse/HADOOP-15065 Project: Hadoop Common Issue Type: Improvement Reporter: Elek, Marton Priority: Minor org.apache.hadoop.util.GenericOptionsParser is widely used to use common arguments in all the command line applications. Some of the common arguments are really generic: {code} -D
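For context, {{GenericOptionsParser}} absorbs generic options such as {{-D key=value}} into the {{Configuration}} and hands everything else back to the application. Typical usage (a minimal sketch; {{my.key}} is a made-up property name):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public class ParserDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Generic flags (-D, -fs, -jt, ...) are consumed into conf;
    // application-specific arguments are returned.
    String[] remaining = new GenericOptionsParser(conf, args)
        .getRemainingArgs();
    System.out.println("parsed config value: " + conf.get("my.key"));
    System.out.println("remaining args: " + remaining.length);
  }
}
{code}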
[jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262809#comment-16262809 ] Jason Lowe commented on HADOOP-15059: - bq. Are we going to keep binary compatibility across hadoop-2.x and hadoop-3.x? Wire compatibility between 2.x clients and 3.x servers is a prerequisite to supporting a rolling upgrade from 2.x to 3.x, but I do not think everyone realizes wire compatibility between a 3.x client and a 2.x server is also very important to many of our users. There are many cases where more than one cluster is involved in a workflow. Requiring that all clusters upgrade from 2.x to 3.x simultaneously is a huge hurdle for adoption, and most users will upgrade them one at a time. As individual clusters upgrade there will be clients/jobs on a newly upgraded 3.x cluster trying to interact with an older 2.x cluster. Back to the issue of launching jobs using an incompatible token format -- here's a couple of options we could consider: 1) YARN nodemanager writes out *two* token credential files, the original 2.x file for backwards compatibility and a new 3.x file. The 3.x UGI code looks for the new file and falls back to the old one if it cannot find it. The 2.x code will simply load the old format from the original filename as it does today. 2) Application submission context contains information on which version of credentials to use for an application. This gets transferred to the container launch context for each container, and the nodemanager writes out the appropriate credentials version based on what was specified in the container launch context. In other words, the nodemanager knows which version of the credentials format the container is expecting to find and writes the token file in that format. > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Priority: Blocker > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... 
using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to a token incompatibility change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0, otherwise all running MR applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
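A rough sketch of what the read side of option 1 could look like; {{Credentials.readTokenStorageFile}} is the real API, but the {{container_tokens_v2}} filename and the helper class are hypothetical illustrations, not a proposed patch:
{code:java}
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;

public final class TokenFileFallback {
  private TokenFileFallback() {
  }

  // Hypothetical: 3.x UGI code looks for a new-format credentials file and
  // falls back to the legacy one; 2.x code keeps reading the legacy name.
  public static Credentials load(File containerDir, Configuration conf)
      throws IOException {
    File newFormat = new File(containerDir, "container_tokens_v2");
    File legacy = new File(containerDir, "container_tokens");
    return Credentials.readTokenStorageFile(
        newFormat.exists() ? newFormat : legacy, conf);
  }
}
{code}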
[jira] [Commented] (HADOOP-15054) upgrade hadoop dependency on commons-codec to 1.11
[ https://issues.apache.org/jira/browse/HADOOP-15054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262801#comment-16262801 ] Wei-Chiu Chuang commented on HADOOP-15054: -- +1, will commit after Thanksgiving. > upgrade hadoop dependency on commons-codec to 1.11 > -- > > Key: HADOOP-15054 > URL: https://issues.apache.org/jira/browse/HADOOP-15054 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning >Assignee: Bharat Viswanadham > Attachments: HADOOP-15054.00.patch > > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-auth/3.0.0-beta1 > retains the dependency on an old commons-codec version (1.4). > And so does hadoop-common. > Would it be possible to consider an upgrade to 1.11? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15033) Use java.util.zip.CRC32C for Java 9 and above
[ https://issues.apache.org/jira/browse/HADOOP-15033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262782#comment-16262782 ] Dmitry Chuyko commented on HADOOP-15033: The attached HADOOP-15033.004.patch, which is a copy of https://patch-diff.githubusercontent.com/raw/apache/hadoop/pull/291.patch, passes the pre-commit QA checks and shows a ~4x improvement in benchmarks. Could someone please review it? > Use java.util.zip.CRC32C for Java 9 and above > - > > Key: HADOOP-15033 > URL: https://issues.apache.org/jira/browse/HADOOP-15033 > Project: Hadoop Common > Issue Type: Improvement > Components: performance, util >Affects Versions: 3.0.0 >Reporter: Dmitry Chuyko > Attachments: HADOOP-15033.001.patch, HADOOP-15033.001.patch, > HADOOP-15033.002.patch, HADOOP-15033.003.patch, HADOOP-15033.003.patch, > HADOOP-15033.004.patch > > > The java.util.zip.CRC32C implementation is available since Java 9. > https://docs.oracle.com/javase/9/docs/api/java/util/zip/CRC32C.html > Platform-specific assembler intrinsics make it more effective than any pure > Java implementation. > Hadoop is compiled against Java 8, but the class constructor can be accessed > through a method handle on Java 9 to create instances implementing Checksum > at runtime. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
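For readers who have not looked at the patch, a minimal sketch of the technique the description outlines, assuming only that {{java.util.zip.CRC32C}} exists at runtime on Java 9+; the factory class itself is illustrative and is not the code from the attached patch:
{code:java}
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.zip.Checksum;

public final class Crc32cFactory {
  private static final MethodHandle CRC32C_CTOR = lookupCrc32c();

  private static MethodHandle lookupCrc32c() {
    try {
      // Resolved reflectively so the code still compiles and loads on
      // Java 8, where java.util.zip.CRC32C does not exist.
      Class<?> clazz = Class.forName("java.util.zip.CRC32C");
      return MethodHandles.publicLookup()
          .findConstructor(clazz, MethodType.methodType(void.class));
    } catch (ReflectiveOperationException e) {
      return null; // pre-Java 9: caller falls back to a pure-Java CRC32C
    }
  }

  public static Checksum newCrc32c() {
    if (CRC32C_CTOR == null) {
      throw new UnsupportedOperationException("CRC32C needs Java 9+");
    }
    try {
      // CRC32C implements java.util.zip.Checksum, so the cast is safe.
      return (Checksum) CRC32C_CTOR.invoke();
    } catch (Throwable t) {
      throw new IllegalStateException(t);
    }
  }
}
{code}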
[jira] [Commented] (HADOOP-15039) move SemaphoredDelegatingExecutor to hadoop-common
[ https://issues.apache.org/jira/browse/HADOOP-15039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262756#comment-16262756 ] Genmao Yu commented on HADOOP-15039: [~ste...@apache.org] take a look please. > move SemaphoredDelegatingExecutor to hadoop-common > -- > > Key: HADOOP-15039 > URL: https://issues.apache.org/jira/browse/HADOOP-15039 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/oss, fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Genmao Yu >Assignee: Genmao Yu >Priority: Minor > Attachments: HADOOP-15039.001.patch, HADOOP-15039.002.patch, > HADOOP-15039.003.patch > > > Detailed discussions in HADOOP-14999 and HADOOP-15027. > share {{SemaphoredDelegatingExecutor}} and move it to {{hadoop-common}}. > cc [~ste...@apache.org] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
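For background, the class being moved bounds how many tasks may be submitted to a delegate executor at once. A minimal sketch of that technique, with hypothetical names (this is not Hadoop's actual {{SemaphoredDelegatingExecutor}} source):
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

// Sketch of a semaphore-bounded delegating executor: submitters block
// once a fixed number of tasks are in flight on the wrapped executor.
public class BoundedExecutor {
  private final ExecutorService delegate;
  private final Semaphore permits;

  public BoundedExecutor(ExecutorService delegate, int maxInFlight) {
    this.delegate = delegate;
    this.permits = new Semaphore(maxInFlight);
  }

  public void execute(Runnable task) throws InterruptedException {
    permits.acquire(); // blocks when maxInFlight tasks are queued/running
    try {
      delegate.execute(() -> {
        try {
          task.run();
        } finally {
          permits.release(); // free the slot when the task finishes
        }
      });
    } catch (RuntimeException e) {
      permits.release(); // submission rejected: return the permit
      throw e;
    }
  }
}
{code}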
[jira] [Updated] (HADOOP-13786) Add S3A committer for zero-rename commits to S3 endpoints
[ https://issues.apache.org/jira/browse/HADOOP-13786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13786: Summary: Add S3A committer for zero-rename commits to S3 endpoints (was: Add S3Guard committer for zero-rename commits to S3 endpoints) > Add S3A committer for zero-rename commits to S3 endpoints > - > > Key: HADOOP-13786 > URL: https://issues.apache.org/jira/browse/HADOOP-13786 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 >Affects Versions: 3.0.0-beta1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: HADOOP-13786-036.patch, HADOOP-13786-037.patch, > HADOOP-13786-038.patch, HADOOP-13786-039.patch, > HADOOP-13786-HADOOP-13345-001.patch, HADOOP-13786-HADOOP-13345-002.patch, > HADOOP-13786-HADOOP-13345-003.patch, HADOOP-13786-HADOOP-13345-004.patch, > HADOOP-13786-HADOOP-13345-005.patch, HADOOP-13786-HADOOP-13345-006.patch, > HADOOP-13786-HADOOP-13345-006.patch, HADOOP-13786-HADOOP-13345-007.patch, > HADOOP-13786-HADOOP-13345-009.patch, HADOOP-13786-HADOOP-13345-010.patch, > HADOOP-13786-HADOOP-13345-011.patch, HADOOP-13786-HADOOP-13345-012.patch, > HADOOP-13786-HADOOP-13345-013.patch, HADOOP-13786-HADOOP-13345-015.patch, > HADOOP-13786-HADOOP-13345-016.patch, HADOOP-13786-HADOOP-13345-017.patch, > HADOOP-13786-HADOOP-13345-018.patch, HADOOP-13786-HADOOP-13345-019.patch, > HADOOP-13786-HADOOP-13345-020.patch, HADOOP-13786-HADOOP-13345-021.patch, > HADOOP-13786-HADOOP-13345-022.patch, HADOOP-13786-HADOOP-13345-023.patch, > HADOOP-13786-HADOOP-13345-024.patch, HADOOP-13786-HADOOP-13345-025.patch, > HADOOP-13786-HADOOP-13345-026.patch, HADOOP-13786-HADOOP-13345-027.patch, > HADOOP-13786-HADOOP-13345-028.patch, HADOOP-13786-HADOOP-13345-028.patch, > HADOOP-13786-HADOOP-13345-029.patch, HADOOP-13786-HADOOP-13345-030.patch, > HADOOP-13786-HADOOP-13345-031.patch, HADOOP-13786-HADOOP-13345-032.patch, > HADOOP-13786-HADOOP-13345-033.patch, HADOOP-13786-HADOOP-13345-035.patch, > MAPREDUCE-6823-003.patch, cloud-intergration-test-failure.log, > objectstore.pdf, s3committer-master.zip > > > A goal of this code is "support O(1) commits to S3 repositories in the > presence of failures". Implement it, including whatever is needed to > demonstrate the correctness of the algorithm. (that is, assuming that s3guard > provides a consistent view of the presence/absence of blobs, show that we can > commit directly). > I consider ourselves free to expose the blobstore-ness of the s3 output > streams (ie. not visible until the close()), if we need to use that to allow > us to abort commit operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14898) Create official Docker images for development and testing features
[ https://issues.apache.org/jira/browse/HADOOP-14898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HADOOP-14898: -- Attachment: HADOOP-14898.003.tgz Third version of the base image. It includes support for Ozone SCM creation (can be turned on with an env variable). > Create official Docker images for development and testing features > --- > > Key: HADOOP-14898 > URL: https://issues.apache.org/jira/browse/HADOOP-14898 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Elek, Marton >Assignee: Elek, Marton > Attachments: HADOOP-14898.001.tar.gz, HADOOP-14898.002.tar.gz, > HADOOP-14898.003.tgz > > > This is the original mail from the mailing list: > {code} > TL;DR: I propose to create official hadoop images and upload them to the > dockerhub. > GOAL/SCOPE: I would like to improve the existing documentation with easy-to-use > docker based recipes to start hadoop clusters with various configurations. > The images could also be used to test experimental features. For example > ozone could be tested easily with this compose file and configuration: > https://gist.github.com/elek/1676a97b98f4ba561c9f51fce2ab2ea6 > Or even the configuration could be included in the compose file: > https://github.com/elek/hadoop/blob/docker-2.8.0/example/docker-compose.yaml > I would like to create separate example compose files for federation, ha, > metrics usage, etc. to make it easier to try out and understand the features. > CONTEXT: There is an existing Jira > https://issues.apache.org/jira/browse/HADOOP-13397 > But it’s about a tool to generate production quality docker images (multiple > types, in a flexible way). If there are no objections, I will create a separate issue > to create simplified docker images for rapid prototyping and investigating > new features, and register the branch on the dockerhub to create the images > automatically. > MY BACKGROUND: I have been working with docker based hadoop/spark clusters for quite a > while and have run them successfully in different environments (kubernetes, > docker-swarm, nomad-based scheduling, etc.) My work is available from here: > https://github.com/flokkr but it can handle more complex use cases (eg. > instrumenting java processes with btrace, or reading/reloading configuration from > consul). > And IMHO in the official hadoop documentation it’s better to suggest using > official apache docker images and not external ones (which could change). > {code} > The next list enumerates the key decision points regarding docker > image creation: > A. automated dockerhub build / jenkins build > Docker images could be built on the dockerhub (a branch pattern should be > defined for a github repository and the location of the Docker files) or > could be built on a CI server and pushed. > The second one is more flexible (it's easier to create a matrix build, for > example). > The first one has the advantage that we can get an additional flag on the > dockerhub that the build is automated (and built from the source by the > dockerhub). > The decision is easy as ASF supports the first approach: (see > https://issues.apache.org/jira/browse/INFRA-12781?focusedCommentId=15824096=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15824096) > B. source: binary distribution or source build > The second question is about creating the docker image. One option is to > build the software on the fly during the creation of the docker image; the > other one is to use the binary releases.
> I suggest using the second approach because: > 1. In that case the hadoop:2.7.3 image could contain exactly the same hadoop > distribution as the downloadable one > 2. We don't need to add development tools to the image, so the image could be > smaller (which is important, as the goal for this image is getting > started as fast as possible) > 3. The docker definition will be simpler (and easier to maintain) > Usually this approach is used in other projects (I checked Apache Zeppelin > and Apache Nutch) > C. branch usage > The other question is the location of the Docker file. It could be on the > official source-code branches (branch-2, trunk, etc.) or we can create > separate branches for the dockerhub (eg. docker/2.7 docker/2.8 docker/3.0) > With the first approach it's easier to find the docker images, but it's less > flexible. For example, if we had a Dockerfile on the source code branch, it would have > to be used for every release (for example the Docker file from the tag > release-3.0.0 would have to be used for the 3.0 hadoop docker image). In that case > the release process is much harder: in case of a Dockerfile error (which > could be tested on dockerhub only after the tagging), a new release would have to be > made after fixing the Dockerfile. >
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Resolution: Duplicate Status: Resolved (was: Patch Available) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
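For readers following along, a sketch of the corrected branch that the {{// we need update partRemaining here}} comment in the quoted description points at; names follow the quoted snippet, and this is illustrative rather than the committed patch:
{code:java}
@Override
public synchronized void seek(long pos) throws IOException {
  checkNotClosed();
  if (position == pos) {
    return;
  } else if (pos > position && pos < position + partRemaining) {
    long skipped = pos - position;
    AliyunOSSUtils.skipFully(wrappedStream, skipped);
    // Deduct the skipped bytes from the current part as well, so later
    // reads do not believe more bytes remain in the part than really do.
    partRemaining -= skipped;
    position = pos;
  } else {
    reopen(pos);
  }
}
{code}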
[jira] [Commented] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262396#comment-16262396 ] wujinhu commented on HADOOP-15063: -- Thanks for the review. I found it is the same as https://issues.apache.org/jira/browse/HADOOP-14072, so I will close this. > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262356#comment-16262356 ] Steve Loughran commented on HADOOP-15063: - + [~uncleGen] + [~drankye] I'll leave it to the OSS experts to review the production code; but the argument makes sense, and the patch appears to fix it. But I don't know the code well enough to be the reviewer there; let's see what the others say. Test-wise: which endpoint did you run the full module test suite against? Test code comments: * use try-with-resources to automatically close the input stream, even on an assert failure * use assertEquals(56, bytesRead) for an automatic message if the check fails * if the store is eventually consistent, use a different filename for each test. This guarantees that you don't accidentally get the file from a previous test case. * minor layout change: use {{byte[] buf}} as the layout for declaring the variable (i.e. ) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
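To make the review comments concrete, a sketch of the test shape being asked for; the file path, offset, expected byte count and the {{fs}} field are hypothetical stand-ins for the real contract test:
{code:java}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class TestSeekReadSketch {
  private FileSystem fs; // assumed to be set up by the contract test base

  @Test
  public void testReadAfterSeek() throws Exception {
    // Unique name per run, in case the store is eventually consistent.
    Path testFile = new Path("/test/seek-" + System.currentTimeMillis());
    byte[] buf = new byte[1024];
    // try-with-resources closes the stream even if an assert fails.
    try (FSDataInputStream in = fs.open(testFile)) {
      int bytesRead = in.read(101802L, buf, 0, buf.length);
      assertEquals("unexpected byte count", 1024, bytesRead);
    }
  }
}
{code}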
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-15063: Issue Type: Sub-task (was: Bug) Parent: HADOOP-13377 > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15053) new Server(out).write call delay occurs
[ https://issues.apache.org/jira/browse/HADOOP-15053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suganya updated HADOOP-15053: - Priority: Major (was: Critical) Description: In createBlockOutputStream, Block details write call takes more time to get acknowledgement. new Sender(out).writeBlock(this.block, . pipeline has three data nodes. was: Hadoop datastream thread runs after 80th packet (5mb). till then datastreamer thread was waiting. Hadoop file write takes more time for the first 5mb data write process. *Thread waits here - code* while (((!this.streamerClosed) && (!this.hasError) && (DFSOutputStream.this.dfsClient.clientRunning) && (DFSOutputStream.this.dataQueue.size() == 0) && ((this.stage != BlockConstructionStage.DATA_STREAMING) || ((this.stage == BlockConstructionStage.DATA_STREAMING) && (now - lastPacket < DFSOutputStream.this.dfsClient.getConf().socketTimeout / 2 || (doSleep)) { long timeout = DFSOutputStream.this.dfsClient.getConf().socketTimeout / 2 - (now - lastPacket); timeout = timeout <= 0L ? 1000L : timeout; timeout = this.stage == BlockConstructionStage.DATA_STREAMING ? timeout : 1000L; *Thread dump:* Thread-9" #32 daemon prio=5 os_prio=0 tid=0x7fcb79401800 nid=0x2c1b in Object.wait() [0x7fcb2c7a2000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:503) - locked <0x0006c6f95fd0> (a java.util.LinkedList) *Debug logs:* - here DataStreamer for seq no 0 started after adding 80th packet in queue 1646 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - Queued packet 80 1646 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - computePacketChunkSize: src=/1/test/file4.txt, chunkSize=516, chunksPerPacket=127, packetSize=65532 1646 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient writeChunk allocating new packet seqno=81, src=/1/test/file4.txt, packetSize=65532, chunksPerPacket=127, bytesCurBlock=5266944 1646 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient writeChunk packet full seqno=81, src=/1/test/file4.txt, bytesCurBlock=5331968, blockSize=134217728, appendChunk=false 2022 [Thread-9] DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: addBlock took 34ms 2022 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.20.19.76:50010 2022 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.20.9.13:50010 2022 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.20.19.70:50010 2022 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 172.20.19.76:50010 2048 [Thread-9] DEBUG org.apache.hadoop.hdfs.DFSClient - Send buf size 131072 2090 [DataStreamer for file /1/test/file4.txt block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - DataStreamer block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350 sending packet packet seqno:0 offsetInBlock:0 lastPacketInBlock:false lastByteOffsetInBlock: 65024 2091 [DataStreamer for file /1/test/file4.txt block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - DataStreamer block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350 sending packet packet seqno:1 offsetInBlock:65024 lastPacketInBlock:false lastByteOffsetInBlock: 130048 2091 [DataStreamer for file /1/test/file4.txt block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - 
DataStreamer block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350 sending packet packet seqno:2 offsetInBlock:130048 lastPacketInBlock:false lastByteOffsetInBlock: 195072 2233 [DataStreamer for file /1/test/file4.txt block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - DataStreamer block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350 sending packet packet seqno:3 offsetInBlock:195072 lastPacketInBlock:false lastByteOffsetInBlock: 260096 2333 [ResponseProcessor for block BP-2107533656-172.20.14.104-1483595560691:blk_1074141603_401350] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient seqno: 0 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 6015384 2333 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - Queued packet 81 2334 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - computePacketChunkSize: src=/1/test/file4.txt, chunkSize=516, chunksPerPacket=127, packetSize=65532 2334 [main] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient writeChunk allocating new packet seqno=82, src=/1/test/file4.txt, packetSize=65532, chunksPerPacket=127, bytesCurBlock=5331968 Summary:
[jira] [Commented] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262240#comment-16262240 ] Hadoop QA commented on HADOOP-15063: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HADOOP-15063 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-15063 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898815/HADOOP-15063.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/13736/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Description: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-auth/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency. was: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency. > hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 > --- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-auth/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common and hadoop-auth 3.0.0-beta1 expose a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Summary: hadoop-common and hadoop-auth 3.0.0-beta1 expose a dependency on slf4j-log4j12 (was: hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12) > hadoop-common and hadoop-auth 3.0.0-beta1 expose a dependency on slf4j-log4j12 > -- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-auth/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Affects Version/s: 3.0.0-beta1 > hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 > --- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Environment: (was: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency.) > hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 > --- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
[ https://issues.apache.org/jira/browse/HADOOP-15064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated HADOOP-15064: Description: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency. > hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 > --- > > Key: HADOOP-15064 > URL: https://issues.apache.org/jira/browse/HADOOP-15064 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-beta1 > Environment: > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. >Reporter: PJ Fanning > > https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 > One of the ideas of SLF4J is that you should depend on the API jar and it is > up to users of your lib to add a dependency to their preferred SLF4J > implementation. You can only have one such implementation jar on your > classpath. > If the hadoop build uses log4j in its tests, then this can be made a test > dependency and not a general compile or runtime dependency. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-15064) hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12
PJ Fanning created HADOOP-15064: --- Summary: hadoop-common 3.0.0-beta1 exposes a dependency on slf4j-log4j12 Key: HADOOP-15064 URL: https://issues.apache.org/jira/browse/HADOOP-15064 Project: Hadoop Common Issue Type: Bug Environment: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.0.0-beta1 One of the ideas of SLF4J is that you should depend on the API jar and it is up to users of your lib to add a dependency to their preferred SLF4J implementation. You can only have one such implementation jar on your classpath. If the hadoop build uses log4j in its tests, then this can be made a test dependency and not a general compile or runtime dependency. Reporter: PJ Fanning -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Status: Patch Available (was: In Progress) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-beta1, 3.0.0-alpha2 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work started] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-15063 started by wujinhu. > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Priority: Major (was: Critical) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException may be thrown when read from Aliyun OSS in some case
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Summary: IOException may be thrown when read from Aliyun OSS in some case (was: IOException will be thrown when read from Aliyun OSS) > IOException may be thrown when read from Aliyun OSS in some case > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Attachment: HADOOP-15063.001.patch > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Attachment: (was: HADOOP-15063.001.patch) > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wujinhu updated HADOOP-15063: - Attachment: HADOOP-15063.001.patch > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15063) IOException will be thrown when read from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262132#comment-16262132 ] wujinhu commented on HADOOP-15063: -- Upload patch file. > IOException will be thrown when read from Aliyun OSS > > > Key: HADOOP-15063 > URL: https://issues.apache.org/jira/browse/HADOOP-15063 > Project: Hadoop Common > Issue Type: Bug > Components: fs/oss >Affects Versions: 3.0.0-alpha2, 3.0.0-beta1 >Reporter: wujinhu >Assignee: wujinhu >Priority: Critical > Attachments: HADOOP-15063.001.patch > > > IOException will be thrown in this case > 1. set part size = n(102400) > 2. assume current position = 0, then partRemaining = 102400 > 3. we call seek(pos = 101802), with pos > position && pos < position + > partRemaining, so it will skip pos - position bytes, but partRemaining > remains the same > 4. if we read bytes more than n - pos, it will throw IOException. > Current code: > {code:java} > @Override > public synchronized void seek(long pos) throws IOException { > checkNotClosed(); > if (position == pos) { > return; > } else if (pos > position && pos < position + partRemaining) { > AliyunOSSUtils.skipFully(wrappedStream, pos - position); > // we need update partRemaining here > position = pos; > } else { > reopen(pos); > } > } > {code} > Logs: > java.io.IOException: Failed to read from stream. Remaining:101802 > at > org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182) > at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75) > at > org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92) > How to re-produce: > 1. create a file with 10MB size > 2. > {code:java} > int seekTimes = 150; > for (int i = 0; i < seekTimes; i++) { > long pos = size / (seekTimes - i) - 1; > LOG.info("begin seeking for pos: " + pos); > byte []buf = new byte[1024]; > instream.read(pos, buf, 0, 1024); > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15063) IOException will be thrown when reading from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu updated HADOOP-15063:
-----------------------------
    Affects Version/s: 3.0.0-beta1

> IOException will be thrown when reading from Aliyun OSS
>
>
>                 Key: HADOOP-15063
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15063
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/oss
>    Affects Versions: 3.0.0-alpha2, 3.0.0-beta1
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Critical
>
>
> An IOException is thrown in the following case:
> 1. Set the part size to n (102400).
> 2. Assume the current position is 0, so partRemaining = 102400.
> 3. Call seek(pos = 101802). Because pos > position && pos < position + partRemaining, the stream skips pos - position bytes, but partRemaining is left unchanged.
> 4. If we then read more than n - pos bytes (598 here), an IOException is thrown.
> Current code:
> {code:java}
>   @Override
>   public synchronized void seek(long pos) throws IOException {
>     checkNotClosed();
>     if (position == pos) {
>       return;
>     } else if (pos > position && pos < position + partRemaining) {
>       AliyunOSSUtils.skipFully(wrappedStream, pos - position);
>       // we also need to update partRemaining here
>       position = pos;
>     } else {
>       reopen(pos);
>     }
>   }
> {code}
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>         at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>         at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to reproduce:
> 1. Create a file of 10 MB.
> 2. Run:
> {code:java}
>     int seekTimes = 150;
>     for (int i = 0; i < seekTimes; i++) {
>       long pos = size / (seekTimes - i) - 1;
>       LOG.info("begin seeking for pos: " + pos);
>       byte[] buf = new byte[1024];
>       instream.read(pos, buf, 0, 1024);
>     }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15063) IOException will be thrown when reading from Aliyun OSS
[ https://issues.apache.org/jira/browse/HADOOP-15063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wujinhu reassigned HADOOP-15063:
--------------------------------

    Assignee: wujinhu

> IOException will be thrown when reading from Aliyun OSS
>
>
>                 Key: HADOOP-15063
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15063
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/oss
>    Affects Versions: 3.0.0-alpha2
>            Reporter: wujinhu
>            Assignee: wujinhu
>            Priority: Critical
>
>
> An IOException is thrown in the following case:
> 1. Set the part size to n (102400).
> 2. Assume the current position is 0, so partRemaining = 102400.
> 3. Call seek(pos = 101802). Because pos > position && pos < position + partRemaining, the stream skips pos - position bytes, but partRemaining is left unchanged.
> 4. If we then read more than n - pos bytes (598 here), an IOException is thrown.
> Current code:
> {code:java}
>   @Override
>   public synchronized void seek(long pos) throws IOException {
>     checkNotClosed();
>     if (position == pos) {
>       return;
>     } else if (pos > position && pos < position + partRemaining) {
>       AliyunOSSUtils.skipFully(wrappedStream, pos - position);
>       // we also need to update partRemaining here
>       position = pos;
>     } else {
>       reopen(pos);
>     }
>   }
> {code}
> Logs:
> java.io.IOException: Failed to read from stream. Remaining:101802
>         at org.apache.hadoop.fs.aliyun.oss.AliyunOSSInputStream.read(AliyunOSSInputStream.java:182)
>         at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:75)
>         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> How to reproduce:
> 1. Create a file of 10 MB.
> 2. Run:
> {code:java}
>     int seekTimes = 150;
>     for (int i = 0; i < seekTimes; i++) {
>       long pos = size / (seekTimes - i) - 1;
>       LOG.info("begin seeking for pos: " + pos);
>       byte[] buf = new byte[1024];
>       instream.read(pos, buf, 0, 1024);
>     }
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
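The arithmetic in the report can be replayed standalone. The snippet below (hypothetical class name, plain Java, no Hadoop dependencies) walks through the bookkeeping: after the buggy in-part seek, partRemaining still claims 102400 bytes while only 598 actually remain, which is the kind of over-ask that surfaces as the "Failed to read from stream" IOException in the logs:

{code:java}
public class StalePartRemainingDemo {
  public static void main(String[] args) {
    final long partSize = 102400;        // n from the description
    long position = 0;
    long partRemaining = partSize;       // bytes the current part can still serve

    // Buggy in-part seek to 101802: position advances, partRemaining does not.
    long pos = 101802;
    long skipped = pos - position;
    position += skipped;                 // partRemaining -= skipped is missing

    long actuallyLeft = partSize - pos;  // 598 bytes truly remain in the part
    long requested = 1024;               // the reproduce loop reads 1 KB

    System.out.println("position after seek = " + position);      // 101802
    System.out.println("stale partRemaining = " + partRemaining); // 102400
    System.out.println("actually left       = " + actuallyLeft);  // 598
    if (requested <= partRemaining && requested > actuallyLeft) {
      // The real stream hits end-of-part mid-read here, surfacing as
      // "java.io.IOException: Failed to read from stream."
      System.out.println("read(" + requested + ") would fail with IOException");
    }
  }
}
{code}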