[jira] [Created] (HDFS-14383) Compute datanode load based on StoragePolicy
Karthik Palanisamy created HDFS-14383:
-
Summary: Compute datanode load based on StoragePolicy
Key: HDFS-14383
URL: https://issues.apache.org/jira/browse/HDFS-14383
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs, namenode
Affects Versions: 3.1.2, 2.7.3
Reporter: Karthik Palanisamy
Assignee: Karthik Palanisamy

The datanode load check logic needs to change because the existing computation does not consider the StoragePolicy. DatanodeManager#getInServiceXceiverAverage:
{code}
public double getInServiceXceiverAverage() {
  double avgLoad = 0;
  final int nodes = getNumDatanodesInService();
  if (nodes != 0) {
    final int xceivers = heartbeatManager.getInServiceXceiverCount();
    avgLoad = (double) xceivers / nodes;
  }
  return avgLoad;
}
{code}
For example: with 10 nodes (HOT) averaging 50 xceivers and 90 nodes (COLD) averaging 10 xceivers, the threshold calculated by the NN is 28 (((500 + 900)/100)*2), which means those 10 nodes (the whole HOT tier) become unavailable while the COLD tier nodes are barely in use. Turning this check off helps to mitigate the issue; however, dfs.namenode.replication.considerLoad helps to "balance" the load across the DNs, so turning it off can lead to situations where specific DNs are overloaded.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
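The tier imbalance described in the example above can be sketched numerically. This is an illustrative calculation only (the class and method names are invented, not Hadoop code), assuming the default load-comparison factor of 2:

```java
// Illustrative sketch (not Hadoop code): why a single global xceiver average
// penalizes a small, busy storage tier. Numbers match the example above.
public class XceiverLoadSketch {
    static double avg(int totalXceivers, int nodes) {
        return nodes == 0 ? 0 : (double) totalXceivers / nodes;
    }

    public static void main(String[] args) {
        int hotNodes = 10, hotXceivers = 10 * 50;   // HOT tier: 10 nodes, avg 50 each
        int coldNodes = 90, coldXceivers = 90 * 10; // COLD tier: 90 nodes, avg 10 each

        // Current logic: one global average across all tiers, doubled.
        double threshold = 2 * avg(hotXceivers + coldXceivers, hotNodes + coldNodes);

        // Every HOT node (avg 50) exceeds the threshold (28); no COLD node does.
        System.out.println(threshold + " "
            + avg(hotXceivers, hotNodes) + " "
            + avg(coldXceivers, coldNodes));
    }
}
```

Computing the average per StoragePolicy tier instead (50 for HOT, 10 for COLD) would give each tier a threshold reflecting its own load, so the HOT tier would not be excluded by the lightly loaded COLD tier.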
[jira] [Resolved] (HDDS-747) Update MiniOzoneCluster to work with security protocol from SCM
[ https://issues.apache.org/jira/browse/HDDS-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoyu Yao resolved HDDS-747.
-
Resolution: Invalid

This won't work because different components require separate Kerberos logins with different principals in the same JVM. We will look into [https://www.testcontainers.org/] to test secure docker compose in the next release.

> Update MiniOzoneCluster to work with security protocol from SCM
> ---
>
> Key: HDDS-747
> URL: https://issues.apache.org/jira/browse/HDDS-747
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Ajay Kumar
> Priority: Major
> Labels: ozone-security
>
> HDDS-103 introduces a new security protocol in SCM. MiniOzoneCluster should
> be updated to utilize it once its implementation is completed.
Re: [DISCUSS] Docker build process
Hi Jonathan,

Thank you for your input. There are 15,300 matches for the Google query "dockerfile-maven-plugin site:github.com", and 377 matches when the query is restricted to Apache-hosted projects. I see that many projects opt to use a profile to avoid building docker images all the time, while others keep the process inline. People have the right to opt out of compiling as an effective root user by passing the -DskipDocker flag; hence, the effective-root-user requirement is not permanent. People did not change their viewpoints over the course of this email thread, and I understand that no one likes disruptive changes. I don't expect that calling a vote on this issue would change the outcome; sufficient facts have been presented from both points of view in this thread. I sense enough pushback from the community on a mandatory inline process, and I am flexible enough to make the change to a profile-based process. I would rather respect the community decision than feel guilty for implementing a half-baked release process. Let's digest the presented facts for the rest of the day. I am ok with not calling the vote unless others think a voting procedure is required.

Regards,
Eric

From: Jonathan Eagles
Date: Tuesday, March 19, 2019 at 11:48 AM
To: Eric Yang
Cc: "Elek, Marton" , Hadoop Common , "yarn-...@hadoop.apache.org" , Hdfs-dev , Eric Badger , Eric Payne , Jim Brennan
Subject: Re: [DISCUSS] Docker build process

This email discussion thread is the result of failing to reach consensus in the JIRA. If you participate in this discussion thread, please recognize that a considerable effort has been made by contributors on this JIRA. On the other hand, contributors to this JIRA need to listen carefully to the comments in this discussion thread since they represent the thoughts and voices of the open source community that will a) benefit from and b) bear the burden of this feature. Failing to listen to these voices is failing to deliver a feature in its best form.
My thoughts- As shown from my comments on YARN-7129, I have particular concerns that resonate with other posters on this thread.
https://issues.apache.org/jira/browse/YARN-7129?focusedCommentId=16790842&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16790842

- Docker images don't evolve at the same rate as Hadoop (tends to favor a separate release cycle, perhaps a separate project)
- Docker images could have many flavors, and favoring one flavor (say ubuntu, or windows) over another takes away from Apache Hadoop's platform-neutral stance (providing a single "one image fits all" image is optimistic).
- Introduces release processes that could limit the community's ability to produce releases at a regular rate. (Effective root user permissions needed to create the image limit who can release; extra Docker-image-only releases)
- In addition, I worry this sends a complicated message to our consumers and will stagnate release adoption.

> I will make adjustment accordingly unless 7 more people comes out and say
> otherwise.

I'm sorry if this is a bit of humor which is lost on me. However, Apache Hadoop has a set of bylaws that dictate the community's process on decision making.
https://hadoop.apache.org/bylaws.html

Best Regards,
jeagles
Re: [DISCUSS] Docker build process
Hi Arpit,

On Docker Hub, Hadoop images are tagged with version strings that look like docker-hadoop-runner-latest or jdk11. It is hard to tell whether the jdk11 image is Hadoop 3 or Hadoop 2 because there is no consistency in tag format usage. This is my reasoning against tagging however one's heart desires: flexible naming causes confusion over the long run.

There is a good article on performing a Maven release with the M2 Release Plugin in Jenkins: https://dzone.com/articles/running-maven-release-plugin Jenkins performs the Maven release, tags the source code with the version number, automatically uploads artifacts to Nexus, then resets the version number to the next SNAPSHOT. If the dockerfile plugin is used, it can upload the artifact to Docker Hub as part of the release.

The proposed adjustment is to put the docker build in a Maven profile. A user who wants to build it will need to add the -Pdocker flag to trigger the build.

Regards,
Eric

On 3/19/19, 12:48 PM, "Arpit Agarwal" wrote:

Hi Eric,

> Dockerfile is most likely to change to apply the security fix.

I am not sure this is always. Marton's point about revising docker images independent of Hadoop versions is valid.

> When maven release is automated through Jenkins, this is a breeze
> of clicking a button. Jenkins even increment the target version
> automatically with option to edit.

I did not understand this suggestion. Could you please explain in simpler terms or share a link to the description?

> I will make adjustment accordingly unless 7 more people comes
> out and say otherwise.

What adjustment is this?

Thanks,
Arpit

> On Mar 19, 2019, at 10:19 AM, Eric Yang wrote:
>
> Hi Marton,
>
> Thank you for your input. I agree with most of what you said with a few exceptions. Security fix should result in a different version of the image instead of replace of a certain version. Dockerfile is most likely to change to apply the security fix. If it did not change, the source has instability over time, and result in non-buildable code over time.
> When maven release is automated through Jenkins, this is a breeze of clicking a button. Jenkins even increment the target version automatically with option to edit. It makes release manager's job easier than Homer Simpson's job.
>
> If versioning is done correctly, older branches can have the same docker subproject, and Hadoop 2.7.8 can be released for older Hadoop branches. We don't generate timeline paradox to allow changing the history of Hadoop 2.7.1. That release has passed and let it stay that way.
>
> There are mounting evidence that Hadoop community wants docker profile for developer image. Precommit build will not catch some build errors because more codes are allowed to slip through using profile build process. I will make adjustment accordingly unless 7 more people comes out and say otherwise.
>
> Regards,
> Eric
>
> On 3/19/19, 1:18 AM, "Elek, Marton" wrote:
>
>    Thank you Eric to describe the problem.
>
>    I have multiple small comments, trying to separate them.
>
>    I. separated vs in-build container image creation
>
>    > The disadvantages are:
>    >
>    > 1. Require developer to have access to docker.
>    > 2. Default build takes longer.
>
>    These are not the only disadvantages (IMHO) as I wrote it in the previous thread and the issue [1]
>
>    Using in-build container image creation doesn't enable:
>
>    1. to modify the image later (eg. apply security fixes to the container itself or apply improvements for the startup scripts)
>    2. create images for older releases (eg. hadoop 2.7.1)
>
>    I think there are two kind of images:
>
>    a) images for released artifacts
>    b) developer images
>
>    I would prefer to manage a) with separated branch repositories but b) with (optional!) in-build process.
>
>    II. Agree with Steve. I think it's better to make it optional as most of the time it's not required. I think it's better to support the default dev build with the default settings (=just enough to start)
>
>    III. Maven best practices
>
>    (https://dzone.com/articles/maven-profile-best-practices)
>
>    I think this is a good article. But this is not against profiles but creating multiple versions from the same artifact with the same name (eg. jdk8/jdk11). In Hadoop, profiles are used to introduce optional steps. I think it's fine as the maven lifecycle/phase model is very static (compare it with the tree based approach in Gradle).
>
>    Marton
>
>    [1]: https://issues.apache.org/jira/browse/HADOOP-16091
>
>    On 3/13/19 11:24 PM, Eric Yang
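For readers following the thread: a profile-based docker build like the one proposed above could look roughly as follows in a module's pom.xml. This is a hedged sketch only; the plugin choice (Spotify's dockerfile-maven-plugin, mentioned earlier in the thread), its version, and the repository/tag values are assumptions, not the actual patch.

```xml
<!-- Hypothetical sketch of an optional docker profile; coordinates and
     configuration are assumptions, not the actual YARN-7129 change. -->
<profiles>
  <profile>
    <!-- Activated only when the user passes -Pdocker on the command line. -->
    <id>docker</id>
    <build>
      <plugins>
        <plugin>
          <groupId>com.spotify</groupId>
          <artifactId>dockerfile-maven-plugin</artifactId>
          <version>1.4.10</version>
          <executions>
            <execution>
              <id>build-image</id>
              <goals>
                <goal>build</goal>
              </goals>
            </execution>
          </executions>
          <configuration>
            <repository>apache/hadoop</repository>
            <tag>${project.version}</tag>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

With this shape, a plain `mvn install` never touches docker (no effective-root requirement in the default path), while `mvn install -Pdocker` builds and tags the image as part of the same build.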
[jira] [Created] (HDDS-1312) Add more unit tests to verify BlockOutputStream functionalities
Shashikant Banerjee created HDDS-1312: - Summary: Add more unit tests to verify BlockOutputStream functionalities Key: HDDS-1312 URL: https://issues.apache.org/jira/browse/HDDS-1312 Project: Hadoop Distributed Data Store Issue Type: Improvement Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 This jira aims to add more unit test coverage for BlockOutputStream functionalities.
Re: [DISCUSS] Docker build process
Hi Eric, > Dockerfile is most likely to change to apply the security fix. I am not sure this is always. Marton’s point about revising docker images independent of Hadoop versions is valid. > When maven release is automated through Jenkins, this is a breeze > of clicking a button. Jenkins even increment the target version > automatically with option to edit. I did not understand this suggestion. Could you please explain in simpler terms or share a link to the description? > I will make adjustment accordingly unless 7 more people comes > out and say otherwise. What adjustment is this? Thanks, Arpit > On Mar 19, 2019, at 10:19 AM, Eric Yang wrote: > > Hi Marton, > > Thank you for your input. I agree with most of what you said with a few > exceptions. Security fix should result in a different version of the image > instead of replace of a certain version. Dockerfile is most likely to change > to apply the security fix. If it did not change, the source has instability > over time, and result in non-buildable code over time. When maven release is > automated through Jenkins, this is a breeze of clicking a button. Jenkins > even increment the target version automatically with option to edit. It > makes release manager's job easier than Homer Simpson's job. > > If versioning is done correctly, older branches can have the same docker > subproject, and Hadoop 2.7.8 can be released for older Hadoop branches. We > don't generate timeline paradox to allow changing the history of Hadoop > 2.7.1. That release has passed and let it stay that way. > > There are mounting evidence that Hadoop community wants docker profile for > developer image. Precommit build will not catch some build errors because > more codes are allowed to slip through using profile build process. I will > make adjustment accordingly unless 7 more people comes out and say otherwise. > > Regards, > Eric > > On 3/19/19, 1:18 AM, "Elek, Marton" wrote: > > > >Thank you Eric to describe the problem. 
> >I have multiple small comments, trying to separate them. > >I. separated vs in-build container image creation > >> The disadvantages are: >> >> 1. Require developer to have access to docker. >> 2. Default build takes longer. > > >These are not the only disadvantages (IMHO) as I wrote it in in the >previous thread and the issue [1] > >Using in-build container image creation doesn't enable: > >1. to modify the image later (eg. apply security fixes to the container >itself or apply improvements for the startup scripts) >2. create images for older releases (eg. hadoop 2.7.1) > >I think there are two kind of images: > >a) images for released artifacts >b) developer images > >I would prefer to manage a) with separated branch repositories but b) >with (optional!) in-build process. > >II. Agree with Steve. I think it's better to make it optional as most of >the time it's not required. I think it's better to support the default >dev build with the default settings (=just enough to start) > >III. Maven best practices > >(https://dzone.com/articles/maven-profile-best-practices) > >I think this is a good article. But this is not against profiles but >creating multiple versions from the same artifact with the same name >(eg. jdk8/jdk11). In Hadoop, profiles are used to introduce optional >steps. I think it's fine as the maven lifecycle/phase model is very >static (compare it with the tree based approach in Gradle). > >Marton > >[1]: https://issues.apache.org/jira/browse/HADOOP-16091 > >On 3/13/19 11:24 PM, Eric Yang wrote: >> Hi Hadoop developers, >> >> In the recent months, there were various discussions on creating docker >> build process for Hadoop. There was convergence to make docker build >> process inline in the mailing list last month when Ozone team is planning >> new repository for Hadoop/ozone docker images. New feature has started to >> add docker image build process inline in Hadoop build. >> A few lessons learnt from making docker build inline in YARN-7129. 
The >> build environment must have docker to have a successful docker build. >> BUILD.txt stated for easy build environment use Docker. There is logic in >> place to ensure that absence of docker does not trigger docker build. The >> inline process tries to be as non-disruptive as possible to existing >> development environment with one exception. If docker’s presence is >> detected, but user does not have rights to run docker. This will cause the >> build to fail. >> >> Now, some developers are pushing back on inline docker build process because >> existing environment did not make docker build process mandatory. However, >> there are benefits to use inline docker build process. The listed benefits >> are: >> >> 1. Source code tag, maven repository artifacts and docker hub artifacts can >> all be
Re: [DISCUSS] Docker build process
This email discussion thread is the result of failing to reach consensus in the JIRA. If you participate in this discussion thread, please recognize that a considerable effort has been made by contributors on this JIRA. On the other hand, contributors to this JIRA need to listen carefully to the comments in this discussion thread since they represent the thoughts and voices of the open source community that will a) benefit from and b) bear the burden of this feature. Failing to listen to these voices is failing to deliver a feature in its best form.

My thoughts- As shown from my comments on YARN-7129, I have particular concerns that resonate with other posters on this thread.
https://issues.apache.org/jira/browse/YARN-7129?focusedCommentId=16790842&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16790842

- Docker images don't evolve at the same rate as Hadoop (tends to favor a separate release cycle, perhaps a separate project)
- Docker images could have many flavors, and favoring one flavor (say ubuntu, or windows) over another takes away from Apache Hadoop's platform-neutral stance (providing a single "one image fits all" image is optimistic).
- Introduces release processes that could limit the community's ability to produce releases at a regular rate. (Effective root user permissions needed to create the image limit who can release; extra Docker-image-only releases)
- In addition, I worry this sends a complicated message to our consumers and will stagnate release adoption.

> I will make adjustment accordingly unless 7 more people comes out and say
> otherwise.

I'm sorry if this is a bit of humor which is lost on me. However, Apache Hadoop has a set of bylaws that dictate the community's process on decision making.
https://hadoop.apache.org/bylaws.html

Best Regards,
jeagles
[jira] [Created] (HDDS-1311) Make Install Snapshot option configurable
Hanisha Koneru created HDDS-1311: Summary: Make Install Snapshot option configurable Key: HDDS-1311 URL: https://issues.apache.org/jira/browse/HDDS-1311 Project: Hadoop Distributed Data Store Issue Type: New Feature Reporter: Hanisha Koneru Assignee: Hanisha Koneru This Jira aims to make the install snapshot command from leader to follower configurable. By default, install snapshot should be enabled.
Re: [DISCUSS] Docker build process
I agree with Steve and Marton. I am ok with having the docker build as an option, but I don't want it to be the default. Jim On Tue, Mar 19, 2019 at 12:19 PM Eric Yang wrote: > Hi Marton, > > Thank you for your input. I agree with most of what you said with a few > exceptions. Security fix should result in a different version of the image > instead of replace of a certain version. Dockerfile is most likely to > change to apply the security fix. If it did not change, the source has > instability over time, and result in non-buildable code over time. When > maven release is automated through Jenkins, this is a breeze of clicking a > button. Jenkins even increment the target version automatically with > option to edit. It makes release manager's job easier than Homer Simpson's > job. > > If versioning is done correctly, older branches can have the same docker > subproject, and Hadoop 2.7.8 can be released for older Hadoop branches. We > don't generate timeline paradox to allow changing the history of Hadoop > 2.7.1. That release has passed and let it stay that way. > > There are mounting evidence that Hadoop community wants docker profile for > developer image. Precommit build will not catch some build errors because > more codes are allowed to slip through using profile build process. I will > make adjustment accordingly unless 7 more people comes out and say > otherwise. > > Regards, > Eric > > On 3/19/19, 1:18 AM, "Elek, Marton" wrote: > > > > Thank you Eric to describe the problem. > > I have multiple small comments, trying to separate them. > > I. separated vs in-build container image creation > > > The disadvantages are: > > > > 1. Require developer to have access to docker. > > 2. Default build takes longer. > > > These are not the only disadvantages (IMHO) as I wrote it in in the > previous thread and the issue [1] > > Using in-build container image creation doesn't enable: > > 1. to modify the image later (eg. 
apply security fixes to the container > itself or apply improvements for the startup scripts) > 2. create images for older releases (eg. hadoop 2.7.1) > > I think there are two kind of images: > > a) images for released artifacts > b) developer images > > I would prefer to manage a) with separated branch repositories but b) > with (optional!) in-build process. > > II. Agree with Steve. I think it's better to make it optional as most > of > the time it's not required. I think it's better to support the default > dev build with the default settings (=just enough to start) > > III. Maven best practices > > (https://dzone.com/articles/maven-profile-best-practices) > > I think this is a good article. But this is not against profiles but > creating multiple versions from the same artifact with the same name > (eg. jdk8/jdk11). In Hadoop, profiles are used to introduce optional > steps. I think it's fine as the maven lifecycle/phase model is very > static (compare it with the tree based approach in Gradle). > > Marton > > [1]: https://issues.apache.org/jira/browse/HADOOP-16091 > > On 3/13/19 11:24 PM, Eric Yang wrote: > > Hi Hadoop developers, > > > > In the recent months, there were various discussions on creating > docker build process for Hadoop. There was convergence to make docker > build process inline in the mailing list last month when Ozone team is > planning new repository for Hadoop/ozone docker images. New feature has > started to add docker image build process inline in Hadoop build. > > A few lessons learnt from making docker build inline in YARN-7129. > The build environment must have docker to have a successful docker build. > BUILD.txt stated for easy build environment use Docker. There is logic in > place to ensure that absence of docker does not trigger docker build. The > inline process tries to be as non-disruptive as possible to existing > development environment with one exception. 
If docker’s presence is > detected, but user does not have rights to run docker. This will cause the > build to fail. > > > > Now, some developers are pushing back on inline docker build process > because existing environment did not make docker build process mandatory. > However, there are benefits to use inline docker build process. The listed > benefits are: > > > > 1. Source code tag, maven repository artifacts and docker hub > artifacts can all be produced in one build. > > 2. Less manual labor to tag different source branches. > > 3. Reduce intermediate build caches that may exist in multi-stage > builds. > > 4. Release engineers and developers do not need to search a maze of > build flags to acquire artifacts. > > > > The disadvantages are: > > > > 1. Require developer to have access to docker. > > 2. Default build takes longer. > > > > There is workaround for above
Re: [DISCUSS] Docker build process
Hi Marton, Thank you for your input. I agree with most of what you said with a few exceptions. Security fix should result in a different version of the image instead of replace of a certain version. Dockerfile is most likely to change to apply the security fix. If it did not change, the source has instability over time, and result in non-buildable code over time. When maven release is automated through Jenkins, this is a breeze of clicking a button. Jenkins even increment the target version automatically with option to edit. It makes release manager's job easier than Homer Simpson's job. If versioning is done correctly, older branches can have the same docker subproject, and Hadoop 2.7.8 can be released for older Hadoop branches. We don't generate timeline paradox to allow changing the history of Hadoop 2.7.1. That release has passed and let it stay that way. There are mounting evidence that Hadoop community wants docker profile for developer image. Precommit build will not catch some build errors because more codes are allowed to slip through using profile build process. I will make adjustment accordingly unless 7 more people comes out and say otherwise. Regards, Eric On 3/19/19, 1:18 AM, "Elek, Marton" wrote: Thank you Eric to describe the problem. I have multiple small comments, trying to separate them. I. separated vs in-build container image creation > The disadvantages are: > > 1. Require developer to have access to docker. > 2. Default build takes longer. These are not the only disadvantages (IMHO) as I wrote it in in the previous thread and the issue [1] Using in-build container image creation doesn't enable: 1. to modify the image later (eg. apply security fixes to the container itself or apply improvements for the startup scripts) 2. create images for older releases (eg. 
hadoop 2.7.1) I think there are two kind of images: a) images for released artifacts b) developer images I would prefer to manage a) with separated branch repositories but b) with (optional!) in-build process. II. Agree with Steve. I think it's better to make it optional as most of the time it's not required. I think it's better to support the default dev build with the default settings (=just enough to start) III. Maven best practices (https://dzone.com/articles/maven-profile-best-practices) I think this is a good article. But this is not against profiles but creating multiple versions from the same artifact with the same name (eg. jdk8/jdk11). In Hadoop, profiles are used to introduce optional steps. I think it's fine as the maven lifecycle/phase model is very static (compare it with the tree based approach in Gradle). Marton [1]: https://issues.apache.org/jira/browse/HADOOP-16091 On 3/13/19 11:24 PM, Eric Yang wrote: > Hi Hadoop developers, > > In the recent months, there were various discussions on creating docker build process for Hadoop. There was convergence to make docker build process inline in the mailing list last month when Ozone team is planning new repository for Hadoop/ozone docker images. New feature has started to add docker image build process inline in Hadoop build. > A few lessons learnt from making docker build inline in YARN-7129. The build environment must have docker to have a successful docker build. BUILD.txt stated for easy build environment use Docker. There is logic in place to ensure that absence of docker does not trigger docker build. The inline process tries to be as non-disruptive as possible to existing development environment with one exception. If docker’s presence is detected, but user does not have rights to run docker. This will cause the build to fail. > > Now, some developers are pushing back on inline docker build process because existing environment did not make docker build process mandatory. 
However, there are benefits to use inline docker build process. The listed benefits are: > > 1. Source code tag, maven repository artifacts and docker hub artifacts can all be produced in one build. > 2. Less manual labor to tag different source branches. > 3. Reduce intermediate build caches that may exist in multi-stage builds. > 4. Release engineers and developers do not need to search a maze of build flags to acquire artifacts. > > The disadvantages are: > > 1. Require developer to have access to docker. > 2. Default build takes longer. > > There is workaround for above disadvantages by using -DskipDocker flag to avoid docker build completely or -pl !modulename to bypass subprojects. > Hadoop development did not follow Maven best practice because a full Hadoop build requires a number of profile and configuration parameters. Some
[jira] [Created] (HDFS-14382) The hdfs fsck command docs do not explain the meaning of the reported fields
Daniel Templeton created HDFS-14382:
---
Summary: The hdfs fsck command docs do not explain the meaning of the reported fields
Key: HDFS-14382
URL: https://issues.apache.org/jira/browse/HDFS-14382
Project: Hadoop HDFS
Issue Type: Improvement
Components: documentation
Affects Versions: 3.2.0
Reporter: Daniel Templeton

The {{hdfs fsck}} command shows something like:
{noformat}
FSCK started by root (auth:SIMPLE) from /172.17.0.2 for path /tmp at Tue Mar 19 15:50:24 UTC 2019
.Status: HEALTHY
 Total size:                    179159051 B
 Total dirs:                    11
 Total files:                   1
 Total symlinks:                0
 Total blocks (validated):      2 (avg. block size 89579525 B)
 Minimally replicated blocks:   2 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Tue Mar 19 15:50:24 UTC 2019 in 3 milliseconds

The filesystem under path '/tmp' is HEALTHY
{noformat}
The fields are presumed to be self-explanatory, but I think that's a bold assumption. In particular, it's not obvious how "mis-replicated" blocks differ from "under-replicated" or "over-replicated" blocks. It would be nice to explain the meaning of all the fields clearly in the docs.
[jira] [Created] (HDFS-14381) Add option to hdfs dfs -cat to ignore corrupt blocks
Daniel Templeton created HDFS-14381: --- Summary: Add option to hdfs dfs -cat to ignore corrupt blocks Key: HDFS-14381 URL: https://issues.apache.org/jira/browse/HDFS-14381 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 3.2.0 Reporter: Daniel Templeton If I have a file in HDFS that contains 100 blocks, and I happen to lose the first block (for whatever obscure/unlikely/dumb reason), I can no longer access the 99% of the file that's still there and accessible. In the case of some data formats (e.g. text), the remaining data may still be useful. It would be nice to have a way to extract the remaining data without having to manually reassemble the file contents from the block files. Something like {{hdfs dfs -cat -ignoreCorrupt }}. It could insert some marker to show where the missing blocks are.
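The proposed behavior can be sketched abstractly. This is a hypothetical illustration only (the class, method, and marker text are invented; it models blocks as strings rather than touching any HDFS API): readable blocks are emitted in order, and a marker is substituted wherever a block is missing or corrupt.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed "-ignoreCorrupt" behavior: concatenate
// readable blocks and insert a marker where a block cannot be read.
public class CatIgnoreCorruptSketch {
    static String catIgnoringCorrupt(List<String> blocks, Set<Integer> corrupt, String marker) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < blocks.size(); i++) {
            // A corrupt/missing block becomes a marker instead of failing the read.
            out.append(corrupt.contains(i) ? marker : blocks.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // File of three "blocks" where block 0 is lost.
        List<String> blocks = Arrays.asList("lost", "still-here-", "and-here");
        Set<Integer> corrupt = new HashSet<>(Arrays.asList(0));
        System.out.println(catIgnoringCorrupt(blocks, corrupt, "<missing block>"));
    }
}
```

The marker keeps byte offsets from silently shifting for formats where position matters, while still letting the surviving 99% of the data out.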
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1080/

[Mar 17, 2019 9:08:29 AM] (sammichen) HDDS-699. Detect Ozone Network topology. Contributed by Sammi Chen.
[Mar 18, 2019 11:45:01 AM] (templedf) MAPREDUCE-7188. [Clean-up] Remove NULL check before instanceof and fix
[Mar 18, 2019 1:18:08 PM] (stevel) HADOOP-16182. Update abfs storage back-end with "close" flag when
[Mar 18, 2019 2:08:37 PM] (templedf) YARN-9340. [Clean-up] Remove NULL check before instanceof in
[Mar 18, 2019 2:10:26 PM] (templedf) HDFS-14328. [Clean-up] Remove NULL check before instanceof in TestGSet
[Mar 18, 2019 3:13:43 PM] (xkrogen) HADOOP-16192. Fix CallQueue backoff bugs: perform backoff when add() is
[Mar 18, 2019 3:38:55 PM] (7813154+ajayydv) HDDS-1296. Fix checkstyle issue from Nightly run. Contributed by Xiaoyu
[Mar 18, 2019 5:04:49 PM] (eyang) HADOOP-16167. Fixed Hadoop shell script for Ubuntu 18.
[Mar 18, 2019 5:16:34 PM] (eyang) YARN-9385. Fixed ApiServiceClient to use current UGI.
[Mar 18, 2019 5:57:18 PM] (eyang) YARN-9363. Replaced debug logging with SLF4J parameterized log message.
[Mar 18, 2019 7:13:13 PM] (stevel) HADOOP-16124. Extend documentation in testing.md about S3 endpoint
[Mar 18, 2019 8:51:44 PM] (bharat) HDDS-1250. In OM HA AllocateBlock call where connecting to SCM from OM
[Mar 18, 2019 9:21:57 PM] (arp) Revert "HDDS-1284. Adjust default values of pipline recovery for more
[Mar 18, 2019 11:58:42 PM] (eyang) YARN-9364. Remove commons-logging dependency from YARN.
-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running: (runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

       Parsing Error(s):
       hadoop-build-tools/src/main/resources/checkstyle/checkstyle.xml
       hadoop-build-tools/src/main/resources/checkstyle/suppressions.xml
       hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
       hadoop-tools/hadoop-azure/src/config/checkstyle.xml
       hadoop-tools/hadoop-resourceestimator/src/config/checkstyle.xml

    FindBugs :

       module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-documentstore
       org.apache.hadoop.yarn.server.timelineservice.documentstore.collection.document.entity.TimelineEntityDocument.setEvents(Map) makes inefficient use of keySet iterator instead of entrySet iterator At TimelineEntityDocument.java:keySet iterator instead of entrySet iterator At TimelineEntityDocument.java:[line 159]
       org.apache.hadoop.yarn.server.timelineservice.documentstore.collection.document.entity.TimelineEntityDocument.setMetrics(Map) makes inefficient use of keySet iterator instead of entrySet iterator At TimelineEntityDocument.java:keySet iterator instead of entrySet iterator At TimelineEntityDocument.java:[line 142]
       Unread field:TimelineEventSubDoc.java:[line 56]
       Unread field:TimelineMetricSubDoc.java:[line 44]
       Switch statement found in org.apache.hadoop.yarn.server.timelineservice.documentstore.collection.document.flowrun.FlowRunDocument.aggregate(TimelineMetric, TimelineMetric) where default case is missing At FlowRunDocument.java:TimelineMetric) where default case is missing At FlowRunDocument.java:[lines 121-136]
       org.apache.hadoop.yarn.server.timelineservice.documentstore.collection.document.flowrun.FlowRunDocument.aggregateMetrics(Map) makes inefficient use of keySet iterator instead of entrySet iterator At FlowRunDocument.java:keySet iterator instead of entrySet iterator At FlowRunDocument.java:[line 103]
       Possible doublecheck on org.apache.hadoop.yarn.server.timelineservice.documentstore.reader.cosmosdb.CosmosDBDocumentStoreReader.client in new org.apache.hadoop.yarn.server.timelineservice.documentstore.reader.cosmosdb.CosmosDBDocumentStoreReader(Configuration) At CosmosDBDocumentStoreReader.java:new org.apache.hadoop.yarn.server.timelineservice.documentstore.reader.cosmosdb.CosmosDBDocumentStoreReader(Configuration) At CosmosDBDocumentStoreReader.java:[lines 73-75]
       Possible doublecheck on org.apache.hadoop.yarn.server.timelineservice.documentstore.writer.cosmosdb.CosmosDBDocumentStoreWriter.client in new org.apache.hadoop.yarn.server.timelineservice.documentstore.writer.cosmosdb.CosmosDBDocumentStoreWriter(Configuration) At CosmosDBDocumentStoreWriter.java:new org.apache.hadoop.yarn.server.timelineservice.documentstore.writer.cosmosdb.CosmosDBDocumentStoreWriter(Configuration) At CosmosDBDocumentStoreWriter.java:[lines 66-68]

    Failed junit tests :

       hadoop.hdfs.server.datanode.TestBPOfferService
       hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap
[jira] [Created] (HDDS-1310) In datanode once a container becomes unhealthy, datanode restart fails.
Sandeep Nemuri created HDDS-1310:
---------------------------------------
             Summary: In datanode once a container becomes unhealthy, datanode restart fails.
                 Key: HDDS-1310
                 URL: https://issues.apache.org/jira/browse/HDDS-1310
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Datanode
    Affects Versions: 0.3.0
            Reporter: Sandeep Nemuri

When a container is marked as {{UNHEALTHY}} in a datanode, a subsequent restart of that datanode fails, as it can no longer generate ContainerReports. The UNHEALTHY state of a container is not handled during ContainerReport generation inside a datanode. We get the below exception when a datanode tries to generate a ContainerReport that contains unhealthy container(s):
{noformat}
2019-03-19 13:51:13,646 [Datanode State Machine Thread - 0] ERROR - Unable to communicate to SCM server at x.x.xxx:9861 for past 3300 seconds.
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Invalid Container state found: 86
	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.getHddsState(KeyValueContainer.java:623)
	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.getContainerReport(KeyValueContainer.java:593)
	at org.apache.hadoop.ozone.container.common.impl.ContainerSet.getContainerReport(ContainerSet.java:204)
	at org.apache.hadoop.ozone.container.ozoneimpl.ContainerController.getContainerReport(ContainerController.java:82)
	at org.apache.hadoop.ozone.container.common.states.endpoint.RegisterEndpointTask.call(RegisterEndpointTask.java:114)
	at org.apache.hadoop.ozone.container.common.states.endpoint.RegisterEndpointTask.call(RegisterEndpointTask.java:47)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
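The {{StorageContainerException}} above is the signature of a state-to-report mapping that has no branch for {{UNHEALTHY}}. The following is a self-contained sketch of that shape and its fix — the enum, class, and method names here are illustrative, not the actual Ozone {{KeyValueContainer}} code:

```java
// Illustrative sketch only: a container-state mapping where the UNHEALTHY
// case was originally missing, so it fell through to the "Invalid Container
// state" error seen in the stack trace above. Adding the case lets report
// generation (and hence datanode restart) succeed.
public class ContainerStateMapping {

    public enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

    public static String getHddsState(State s) {
        switch (s) {
            case OPEN:         return "OPEN";
            case CLOSING:      return "CLOSING";
            case QUASI_CLOSED: return "QUASI_CLOSED";
            case CLOSED:       return "CLOSED";
            case UNHEALTHY:    return "UNHEALTHY"; // the previously missing case
            default:
                // Without the UNHEALTHY case, report generation aborts here.
                throw new IllegalStateException("Invalid Container state found: " + s);
        }
    }

    public static void main(String[] args) {
        System.out.println(getHddsState(State.UNHEALTHY));
    }
}
```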
[jira] [Created] (HDFS-14380) webhdfs failover append to stand-by namenode fails
gael URBAUER created HDFS-14380:
---------------------------------------
             Summary: webhdfs failover append to stand-by namenode fails
                 Key: HDFS-14380
                 URL: https://issues.apache.org/jira/browse/HDFS-14380
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: webhdfs
    Affects Versions: 2.7.3
         Environment: HDP 2.6.2, HA namenode activated
            Reporter: gael URBAUER

I'm using datastage to create files in Hadoop through webhdfs. When a namenode failover occurs, datastage sometimes ends up talking to the standby namenode. The CREATE operation then succeeds, but when files are bigger than the buffer size, datastage calls the APPEND operation and gets back a 403 response. It does not seem very coherent that some write operations are allowed on the standby while others aren't.

Regards,
Gaël
Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/

[Mar 18, 2019 4:00:40 PM] (xkrogen) HADOOP-16192. Fix CallQueue backoff bugs: perform backoff when add() is

-1 overall

The following subsystems voted -1:
    asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running: (runtime bigger than 1h 0m 0s)
    unit

Specific tests:

    XML :

       Parsing Error(s):
       hadoop-build-tools/src/main/resources/checkstyle/checkstyle.xml
       hadoop-build-tools/src/main/resources/checkstyle/suppressions.xml
       hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
       hadoop-tools/hadoop-azure/src/config/checkstyle.xml
       hadoop-tools/hadoop-resourceestimator/src/config/checkstyle.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
       hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

    FindBugs :

       module:hadoop-common-project/hadoop-common
       Class org.apache.hadoop.fs.GlobalStorageStatistics defines non-transient non-serializable instance field map In GlobalStorageStatistics.java:instance field map In GlobalStorageStatistics.java

    FindBugs :

       module:hadoop-hdfs-project/hadoop-hdfs
       Dead store to state in org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(OutputStream, INodeSymlink) At FSImageFormatPBINode.java:org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Saver.save(OutputStream, INodeSymlink) At FSImageFormatPBINode.java:[line 623]

    FindBugs :

       module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
       Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:[line 335]

    Failed junit tests :

       hadoop.security.authentication.client.TestKerberosAuthenticator
       hadoop.util.TestBasicDiskValidator
       hadoop.util.TestDiskCheckerWithDiskIo
       hadoop.crypto.key.kms.server.TestKMS
       hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
       hadoop.hdfs.server.balancer.TestBalancerRPCDelay
       hadoop.yarn.client.api.impl.TestAMRMProxy
       hadoop.registry.secure.TestSecureLogins
       hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor
       hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2

   cc:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt [4.0K]
   javac:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt [328K]
   cc:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/diff-compile-cc-root-jdk1.8.0_191.txt [4.0K]
   javac:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/diff-compile-javac-root-jdk1.8.0_191.txt [308K]
   checkstyle:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/diff-checkstyle-root.txt [16M]
   hadolint:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/diff-patch-hadolint.txt [4.0K]
   pathlen:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/pathlen.txt [12K]
   pylint:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/diff-patch-pylint.txt [24K]
   shellcheck:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/diff-patch-shellcheck.txt [72K]
   shelldocs:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/diff-patch-shelldocs.txt [8.0K]
   whitespace:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/whitespace-eol.txt [12M]
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/whitespace-tabs.txt [1.2M]
   xml:
       https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/265/artifact/out/xml.txt [20K]
   findbugs:
[jira] [Created] (HDFS-14379) WebHdfsFileSystem.toUrl double encodes characters
Boris Vulikh created HDFS-14379:
---------------------------------------
             Summary: WebHdfsFileSystem.toUrl double encodes characters
                 Key: HDFS-14379
                 URL: https://issues.apache.org/jira/browse/HDFS-14379
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs, hdfs-client
    Affects Versions: 3.2.0
            Reporter: Boris Vulikh

When using DistCp over HTTPFS with data that contains Spark partitions, DistCp fails to access the partitioned parquet files, because the "=" characters in the file path get double-encoded: {{"/test/spark/partition/year=2019/month=1/day=1"}} becomes {{"/test/spark/partition/year%253D2019/month%253D1/day%253D1"}}.

This happens because {{fsPathItem}}, containing the character '=', is encoded by {{URLEncoder.encode(fsPathItem, "UTF-8")}} to '%3D', and then encoded again by {{new Path()}} to '%253D'.
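The double-encoding itself can be reproduced with nothing but the JDK's {{URLEncoder}}. This is a minimal standalone demonstration of the mechanism described above, not the {{WebHdfsFileSystem.toUrl}} code:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Reproduces the reported symptom: encoding an already-encoded path item
// turns '=' -> %3D -> %253D, because the second pass re-encodes the '%'.
public class DoubleEncodeDemo {

    public static String encodeOnce(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String item = "year=2019";
        String once = encodeOnce(item);   // '=' becomes %3D
        String twice = encodeOnce(once);  // '%' becomes %25, so %3D becomes %253D
        System.out.println(once);         // year%3D2019
        System.out.println(twice);        // year%253D2019
    }
}
```

The fix direction implied by the report is to ensure the path item is percent-encoded exactly once along the URL-building path.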
Re: [DISCUSS] Docker build process
Thank you, Eric, for describing the problem. I have several small comments; let me separate them.

I. Separated vs. in-build container image creation

> The disadvantages are:
>
> 1. Require developer to have access to docker.
> 2. Default build takes longer.

These are not the only disadvantages (IMHO), as I wrote in the previous thread and in the issue [1]. Using in-build container image creation doesn't make it possible to:

1. modify the image later (eg. apply security fixes to the container itself, or apply improvements to the startup scripts)
2. create images for older releases (eg. hadoop 2.7.1)

I think there are two kinds of images:

a) images for released artifacts
b) developer images

I would prefer to manage a) with separate branches/repositories, but b) with an (optional!) in-build process.

II. Agree with Steve. I think it's better to make it optional, as most of the time it's not required. I think it's better to support the default dev build with the default settings (= just enough to start).

III. Maven best practices (https://dzone.com/articles/maven-profile-best-practices)

I think this is a good article. But it argues not against profiles as such, but against creating multiple versions of the same artifact with the same name (eg. jdk8/jdk11). In Hadoop, profiles are used to introduce optional steps. I think that's fine, as the maven lifecycle/phase model is very static (compare it with the tree-based approach in Gradle).

Marton

[1]: https://issues.apache.org/jira/browse/HADOOP-16091

On 3/13/19 11:24 PM, Eric Yang wrote:
> Hi Hadoop developers,
>
> In recent months, there have been various discussions on creating a docker build
> process for Hadoop. There was convergence on the mailing list last month toward making
> the docker build process inline, when the Ozone team was planning a new repository for
> Hadoop/Ozone docker images. New feature work has started to add a docker image build
> process inline in the Hadoop build.
>
> A few lessons were learnt from making the docker build inline in YARN-7129. The build
> environment must have docker for a successful docker build. BUILD.txt states that for
> an easy build environment, use Docker. There is logic in place to ensure that the
> absence of docker does not trigger the docker build. The inline process tries to be as
> non-disruptive as possible to existing development environments, with one exception:
> if docker's presence is detected but the user does not have rights to run docker, the
> build will fail.
>
> Now, some developers are pushing back on the inline docker build process because the
> existing environment did not make the docker build process mandatory. However, there
> are benefits to using an inline docker build process. The listed benefits are:
>
> 1. Source code tag, maven repository artifacts and docker hub artifacts can all be
> produced in one build.
> 2. Less manual labor to tag different source branches.
> 3. Reduced intermediate build caches that may exist in multi-stage builds.
> 4. Release engineers and developers do not need to search a maze of build flags to
> acquire artifacts.
>
> The disadvantages are:
>
> 1. Requires developers to have access to docker.
> 2. The default build takes longer.
>
> There are workarounds for the above disadvantages: use the -DskipDocker flag to avoid
> the docker build completely, or -pl !modulename to bypass subprojects. Hadoop
> development did not follow Maven best practice because a full Hadoop build requires a
> number of profile and configuration parameters. Some evolutions are working against
> Maven's design and require forks of separate source trees for different subprojects
> and pom files. Maven best practice
> (https://dzone.com/articles/maven-profile-best-practices) explains that profiles
> should not be used to trigger different artifact builds, because that pattern
> introduces artifact naming conflicts in the maven repository. Maven offers flags to
> skip certain operations, such as -DskipTests, -Dmaven.javadoc.skip=true, -pl, or
> -DskipDocker. It seems worthwhile to make some corrections to follow best practice
> for the Hadoop build.
>
> Some developers have advocated for a separate build process for docker images. We
> need consensus on the direction that will work best for the Hadoop development
> community. Hence, my questions are:
>
> Do we want to have an inline docker build process in maven?
> If yes, it would be the developer's responsibility to pass the -DskipDocker flag to
> skip docker. Docker is mandatory for the default build.
> If no, what is the release flow for docker images going to look like?
>
> Thank you for your feedback.
>
> Regards,
> Eric
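One concrete shape for the "make it optional" position (point II above) is an opt-in rather than opt-out profile: the docker step only runs when a property is explicitly set, so the default build never requires docker. The fragment below is a hypothetical sketch only; the profile id, property name, and plugin placement are assumptions and do not reflect Hadoop's actual pom files:

```xml
<!-- Hypothetical sketch: an opt-in Maven profile for the docker image build.
     Activated with `mvn install -Ddocker.build=true`; the default build
     (no property) never touches docker. -->
<profile>
  <id>docker-build</id>
  <activation>
    <property>
      <name>docker.build</name>
      <value>true</value>
    </property>
  </activation>
  <build>
    <plugins>
      <!-- e.g. an exec-maven-plugin or dockerfile plugin execution that
           runs `docker build` against the module's Dockerfile -->
    </plugins>
  </build>
</profile>
```

This inverts the -DskipDocker proposal: instead of docker being mandatory unless skipped, it runs only when requested, which sidesteps the "docker present but user lacks rights" failure mode for developers who never asked for an image.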