[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-17857:
--------------------------------
    Fix Version/s: 3.3.3
                   3.2.4
                   2.10.2

> Check real user ACLs in addition to proxied user ACLs
> -----------------------------------------------------
>
>                 Key: HADOOP-17857
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17857
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 3.2.2, 2.10.1, 3.3.1
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>            Priority: Major
>             Fix For: 3.4.0, 2.10.2, 3.2.4, 3.3.3
>
>         Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch
>
>
> In a secure cluster, it is possible to configure the services to allow a
> super-user to proxy to a regular user and perform actions on behalf of the
> proxied user (see [Proxy user - Superusers Acting On Behalf Of Other
> Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]).
> This is useful for automating server access for multiple different users in a
> multi-tenant cluster. For example, this can be used by a super-user
> submitting jobs to a YARN queue, accessing HDFS files, scheduling Oozie
> workflows, etc., which will then execute the service as the proxied user.
>
> Usually, when these services check ACLs to determine whether the user has
> access to the requested resources, the service only needs to check the ACLs
> for the proxied user. However, it is sometimes desirable to allow the proxied
> user to have access to the resources when only the real user has open ACLs.
>
> For instance, let's say the user {{adm}} is the only user with submit ACLs to
> the {{dataload}} queue, and the {{adm}} user wants to submit apps to the
> {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. In
> addition, we want to be able to bill {{headless1}} and {{headless2}}
> separately for the YARN resources used in the {{dataload}} queue. In order to
> do this, the apps need to run in the {{dataload}} queue as the respective
> headless users.
>
> We could open up the ACLs to the {{dataload}} queue to allow {{headless1}}
> and {{headless2}} to submit apps. But this would allow those users to submit
> any app to that queue, not just the data-loading apps, and we don't trust the
> {{headless1}} and {{headless2}} owners to honor that restriction.
>
> This JIRA proposes that we define a way to set up ACLs to restrict a
> resource's access to a super-user but, when the access happens, run it as
> the proxied user.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
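The status-quo behavior described above can be sketched as follows. This is a hypothetical simulation, not Hadoop's actual {{AccessControlList}} code: only the effective (proxied) user is checked against the queue's submit ACL, so a submission proxied by {{adm}} as {{headless1}} is rejected even though {{adm}} holds the ACL.

```python
# Hypothetical sketch of today's behavior (names invented for illustration):
# only the proxied/effective user is consulted; the real user is ignored.

SUBMIT_ACL = {"dataload": {"adm"}}  # queue -> set of users allowed to submit

def can_submit(queue, effective_user, real_user=None):
    """Status-quo check: the real (proxying) user plays no part."""
    return effective_user in SUBMIT_ACL.get(queue, set())

# adm submitting as itself is allowed...
assert can_submit("dataload", "adm")
# ...but adm proxying to headless1 is rejected, which is the problem this
# JIRA sets out to solve.
assert not can_submit("dataload", "headless1", real_user="adm")
```

This is why the only workarounds today are either opening the queue ACL to the headless users (losing the restriction) or running the apps as {{adm}} (losing per-user billing).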
[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17430759#comment-17430759 ]

Eric Payne commented on HADOOP-17857:
-------------------------------------

I would like to backport this to the previous branches, back to branch-2.10.
[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413365#comment-17413365 ]

Eric Payne commented on HADOOP-17857:
-------------------------------------

[~snemeth], Thanks a lot for the review and commit! Yes, I will open a follow-up for documenting this. Good catch.
[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404659#comment-17404659 ]

Eric Payne commented on HADOOP-17857:
-------------------------------------

[~zhoukang], [~snemeth], I believe that this JIRA is the first step to addressing the requirements outlined in YARN-9975. Can you please review the changes and let me know what you think?
[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404656#comment-17404656 ]

Eric Payne commented on HADOOP-17857:
-------------------------------------

Thanks [~ahussein] for the review and comments! I have attached v2 of the patch.
[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-17857:
--------------------------------
    Attachment: HADOOP-17857.002.patch
[jira] [Assigned] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne reassigned HADOOP-17857:
-----------------------------------
    Assignee: Eric Payne
[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-17857:
--------------------------------
    Attachment: HADOOP-17857.001.patch
[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-17857:
--------------------------------
    Status: Patch Available  (was: Open)
[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401820#comment-17401820 ]

Eric Payne commented on HADOOP-17857:
-------------------------------------

I suggest that we define the ACLs so that a special character tells the AccessControlList system to check the ACLs for the real user rather than those for the proxied user.

Let's take an example of submitting jobs to the {{dataload}} queue in the Capacity Scheduler. The use case is to allow only {{adm}} to submit to the {{dataload}} queue, but once the apps are submitted, run them as a proxied user (such as {{headless1}}).

The following is the current syntax, and it only allows the {{adm}} user, as itself, to submit jobs to the {{dataload}} queue:
{code:xml}
<property>
  <name>yarn.scheduler.capacity.root.dataload.acl_submit_applications</name>
  <value>adm</value>
</property>
{code}
With this syntax, if the {{adm}} user proxies to {{headless1}} and submits the job, the Capacity Scheduler will reject the submission because {{headless1}} does not have submit ACL permissions.

*PROPOSED CHANGES:*
- Add a tilde (~) to the beginning of the {{adm}} user in the value section of the property. In the above example, note the addition of the tilde (~):
{code:xml}
<property>
  <name>yarn.scheduler.capacity.root.dataload.acl_submit_applications</name>
  <value>~adm</value>
</property>
{code}
- With the tilde (~), any proxied user submitted by the {{adm}} user will be allowed to run in the {{dataload}} queue.
- That same proxied user will _not_ be allowed to submit by themselves if they are not first proxied by {{adm}}.
- NOTE: with this syntax, {{adm}} will not be able to directly submit as itself to the {{dataload}} queue. In order to both submit as {{adm}} and also allow an {{adm}}-proxied user to submit to the {{dataload}} queue, both {{~adm}} and {{adm}} must be specified, as follows:
{code:xml}
<property>
  <name>yarn.scheduler.capacity.root.dataload.acl_submit_applications</name>
  <value>~adm,adm</value>
</property>
{code}
This example could be extended to other ACL properties in other Hadoop systems.
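The proposed tilde semantics can be sketched as follows. This is a hypothetical simulation, not the actual AccessControlList patch: entries prefixed with "~" match the *real* user behind a proxied submission, while plain entries match the effective user as before.

```python
# Hypothetical sketch of the proposed "~" ACL semantics (function and
# variable names invented for illustration, not from the patch).

def can_submit(acl_entries, effective_user, real_user=None):
    """Return True if the submission passes the queue's submit ACL.

    acl_entries: list of ACL users, e.g. ["~adm", "adm"].
    effective_user: the (possibly proxied) user the app will run as.
    real_user: the proxying super-user, or None for a direct submission.
    """
    for entry in acl_entries:
        if entry.startswith("~"):
            # Tilde entry: allow any proxied user whose real user matches.
            if real_user is not None and real_user == entry[1:]:
                return True
        elif entry == effective_user:
            return True
    return False

acl = ["~adm"]  # acl_submit_applications = ~adm
assert can_submit(acl, "headless1", real_user="adm")  # proxied by adm: allowed
assert not can_submit(acl, "headless1")               # direct submit: denied
assert not can_submit(acl, "adm")                     # adm as itself: denied
assert can_submit(["~adm", "adm"], "adm")             # "~adm,adm": adm allowed
```

Note how the sketch reproduces the NOTE above: with only {{~adm}} in the ACL, a direct submission by {{adm}} itself is denied, so {{~adm,adm}} is needed to allow both cases.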
We have been running with this change in production for over a year now, and it works well.
[jira] [Created] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
Eric Payne created HADOOP-17857:
-----------------------------------

             Summary: Check real user ACLs in addition to proxied user ACLs
                 Key: HADOOP-17857
                 URL: https://issues.apache.org/jira/browse/HADOOP-17857
             Project: Hadoop Common
          Issue Type: Improvement
    Affects Versions: 3.3.1, 2.10.1, 3.2.2
            Reporter: Eric Payne
[jira] [Resolved] (HADOOP-17346) Fair call queue is defeated by abusive service principals
[ https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne resolved HADOOP-17346. - Fix Version/s: 3.2.3 3.1.5 3.4.0 3.3.1 Resolution: Fixed Thanks [~ahussein]. I committed to branch-3.1, branch-3.2, branch-3.3, and trunk. > Fair call queue is defeated by abusive service principals > - > > Key: HADOOP-17346 > URL: https://issues.apache.org/jira/browse/HADOOP-17346 > Project: Hadoop Common > Issue Type: Bug > Components: common, ipc >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Attachments: HADOOP-17346-branch-3.1.001.patch, > HADOOP-17346.branch-3.2.001.patch, HADOOP-17346.branch-3.3.001.patch > > Time Spent: 50m > Remaining Estimate: 0h > > [~daryn] reported that the FCQ prioritizes based on the full kerberos > principal (ie. "user/host@realm") rather than short name (ie. "user") to > prevent service principals like the DNs and NMs being de-prioritized since > service principals are expected to be well behaved. Notably the DNs > contribute a significant but important load so the intent is not to > de-prioritize all DNs because their sum total load is high relative to users. > This has the unfortunate side effect of allowing misbehaving & non-critical > service principals to abuse the FCQ. The gstorm/* principals are a prime > example. Each server is spamming opens as fast as possible which ensures > that none of the gstorm servers can be de-prioritized because each principal > is a fraction of the total load from all principals. > The secondary and more devasting problem is other abusive non-service > principals cannot be effectively de-prioritized. The sum total of all gstorm > load prevents other principals from surpassing the priority thresholds. > Principals stay in the highest priority queues which allows the abusive > principals to overflow the entire call queue for extended periods of time. 
> Notably it prevents the FCQ from moderating the heavy create loads from p_gup > @ DB which cause significant performance degradation. > Prioritization should be based on short name with configurable exemptions for > services like the DN/NM. > [~daryn] suggested a solution that we applied on our clusters. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
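The proposed fix (prioritize on the short name, with configurable exemptions for services like the DN/NM) hinges on collapsing "user/host@realm" to "user". A minimal sketch of that extraction step, using a hypothetical helper rather than Hadoop's actual KerberosName/auth_to_local machinery:

```java
// Sketch of short-name extraction from a Kerberos principal.
// Hypothetical helper for illustration only; Hadoop's real mapping goes
// through KerberosName and the auth_to_local rules.
public class ShortNameDemo {
    // "user/host@realm" -> "user"; a plain "user" passes through unchanged.
    static String shortName(String principal) {
        int end = principal.length();
        int at = principal.indexOf('@');
        if (at >= 0) {
            end = at; // drop the realm
        }
        int slash = principal.indexOf('/');
        if (slash >= 0 && slash < end) {
            end = slash; // drop the host component
        }
        return principal.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(shortName("gstorm/server42@EXAMPLE.COM")); // gstorm
        System.out.println(shortName("dn/dn-host.example.com@EXAMPLE.COM")); // dn
        System.out.println(shortName("alice")); // alice
    }
}
```

With this collapsing, every gstorm/* server accrues cost against the single "gstorm" identity, so their combined load can finally push that identity into a lower-priority queue.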
[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals
[ https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237704#comment-17237704 ] Eric Payne commented on HADOOP-17346: - bq. If there wasn't a specific reason for doing this, I would suggest we reverse them in branch-3.1 to match the other releases. BTW, [~ahussein], you don't need to put up another patch. I'll do this as part of the commit. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals
[ https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237703#comment-17237703 ] Eric Payne commented on HADOOP-17346: - Thanks [~ahussein] for the good work on this JIRA. I have committed to trunk, branch-3.3, and branch-3.2. I have a question about the branch-3.1 patch. It seems that in branch-3.1, the arguments for {{DecayRpcScheduler#computePriorityLevel}} are reversed: {panel:title=branch-3.1} + private int computePriorityLevel(Object identity, long cost) { {panel} {panel:title=branch-3.2} + private int computePriorityLevel(long cost, Object identity) { {panel} If there wasn't a specific reason for doing this, I would suggest we reverse them in branch-3.1 to match the other releases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals
[ https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236433#comment-17236433 ] Eric Payne commented on HADOOP-17346: - The patch for branch-3.3 changes the signature of {{DecayRpcScheduler#computePriorityLevel}} to add an identity object: {code:java} - private int computePriorityLevel(long cost) { + private int computePriorityLevel(long cost, Object identity) { {code} This signature change was done in trunk as part of HADOOP-17165. Since this fix needs that same signature change, we are faced with the following choices: 1) backport HADOOP-17165 to branch-3.3, or 2) make the same change in this patch. I don't like solution 2 because it makes changes hard to track. However, I don't know if we want the feature from HADOOP-17165 backported to branch-3.3. [~tasanuma], is the service-user feature something we want backported to earlier branches? We would probably want it pulled back into at least branch-3.2. It backports cleanly to branch-3.3, but not quite cleanly to branch-3.2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
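For context on the {{computePriorityLevel}} signatures quoted above: the method maps a caller's share of the decayed total cost to a queue level. A rough, self-contained sketch of that thresholding idea, with assumed thresholds and a {{totalCost}} parameter standing in for state the real DecayRpcScheduler keeps internally (the real method takes the identity object instead):

```java
// Rough sketch of decay-scheduler thresholding: a caller whose decayed
// cost is a larger share of the total lands in a lower-priority level.
// The thresholds and the totalCost parameter are assumptions for
// illustration, not Hadoop's actual DecayRpcScheduler code.
public class PriorityLevelDemo {
    // Callers under 12.5% / 25% / 50% of total cost map to levels 0 / 1 / 2.
    static final double[] THRESHOLDS = {0.125, 0.25, 0.5};

    static int computePriorityLevel(long cost, long totalCost) {
        if (totalCost <= 0) {
            return 0; // no traffic yet: everyone gets top priority
        }
        double share = (double) cost / totalCost;
        for (int level = 0; level < THRESHOLDS.length; level++) {
            if (share < THRESHOLDS[level]) {
                return level; // under this threshold: higher priority
            }
        }
        return THRESHOLDS.length; // heaviest callers get the lowest priority
    }

    public static void main(String[] args) {
        System.out.println(computePriorityLevel(10, 1000));  // 0 (1% share)
        System.out.println(computePriorityLevel(300, 1000)); // 2 (30% share)
        System.out.println(computePriorityLevel(900, 1000)); // 3 (90% share)
    }
}
```

This also shows why per-host principals defeat the scheme: splitting one heavy user's cost across many identities keeps every individual share below the first threshold.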
[jira] [Commented] (HADOOP-17381) Ability to specify separate compression settings when intermediate and final output use the same codec
[ https://issues.apache.org/jira/browse/HADOOP-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233987#comment-17233987 ] Eric Payne commented on HADOOP-17381: - One hacky way that might work and be extensible to all codecs is to create an explicit codec for intermediate outputs, like IntermediateOutputCodec, that has its own config namespace with a prefix, like io.compression.codec.intermediate. Users can then configure the same codec being used for the output codec but as a sub-codec to IntermediateOutputCodec. IntermediateOutputCodec would gather all configs with the io.compression.codec.intermediate prefix, strip the prefix, then reapply to a new Configuration that becomes the config object for the sub-codec, allowing it to have different settings than the output codec even though the same codec type is being used underneath. > Ability to specify separate compression settings when intermediate and final > output use the same codec > -- > > Key: HADOOP-17381 > URL: https://issues.apache.org/jira/browse/HADOOP-17381 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Eric Payne >Priority: Major > > The ZStandard codec may become a codec that users will want to use for both > intermediate data and for final output data yet specify different compression > levels for those use cases. > It would be nice if there was a way we could create a "meta codec" like > IntermediateCodec that used conf prefix techniques, like Oozie does with > oozie.launcher for the Oozie launcher configs, to create a custom config > namespace of sorts for setting arbitrary codec settings specific to the > intermediate codec separate from the final output codec even if the same > underlying codec is used for both. > However Codecs don't allow a configuration to be passed when obtaining a > codec stream, and I think we would have to bypass the CodecPool entirely to > be able to pass a custom conf to an arbitrary Codec. 
> Another approach is to skip trying to generalize the solution and > specifically focus on ZStandard. It would be easy to create a wrapper codec > around the existing ZStandardCompressor and ZStandardDecompressor which take > the relevant parameters directly in their constructors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
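The prefix-stripping idea in the comment above can be illustrated with a plain map standing in for Hadoop's Configuration. The io.compression.codec.intermediate namespace and the zstd.level key are hypothetical names from the proposal, not shipping config keys:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed prefix-stripping: gather keys under the
// hypothetical io.compression.codec.intermediate namespace, strip the
// prefix, and build a fresh config for the sub-codec. A plain Map stands
// in for Hadoop's Configuration class here.
public class IntermediateConfDemo {
    static final String PREFIX = "io.compression.codec.intermediate.";

    // Returns a new config containing only the prefixed keys, de-prefixed,
    // so the same codec type can run with different intermediate settings.
    static Map<String, String> subCodecConf(Map<String, String> conf) {
        Map<String, String> sub = new HashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                sub.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return sub;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PREFIX + "zstd.level", "3"); // intermediate outputs
        conf.put("zstd.level", "9");          // final output, left untouched
        System.out.println(subCodecConf(conf).get("zstd.level")); // 3
    }
}
```

The sub-codec never sees the final-output setting, which is what lets one underlying codec serve both roles with independent compression levels.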
[jira] [Created] (HADOOP-17381) Ability to specify separate compression settings when intermediate and final output use the same codec
Eric Payne created HADOOP-17381: --- Summary: Ability to specify separate compression settings when intermediate and final output use the same codec Key: HADOOP-17381 URL: https://issues.apache.org/jira/browse/HADOOP-17381 Project: Hadoop Common Issue Type: Improvement Reporter: Eric Payne The ZStandard codec may become a codec that users will want to use for both intermediate data and for final output data yet specify different compression levels for those use cases. It would be nice if there was a way we could create a "meta codec" like IntermediateCodec that used conf prefix techniques, like Oozie does with oozie.launcher for the Oozie launcher configs, to create a custom config namespace of sorts for setting arbitrary codec settings specific to the intermediate codec separate from the final output codec even if the same underlying codec is used for both. However Codecs don't allow a configuration to be passed when obtaining a codec stream, and I think we would have to bypass the CodecPool entirely to be able to pass a custom conf to an arbitrary Codec. Another approach is to skip trying to generalize the solution and specifically focus on ZStandard. It would be easy to create a wrapper codec around the existing ZStandardCompressor and ZStandardDecompressor which take the relevant parameters directly in their constructors. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals
[ https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230881#comment-17230881 ] Eric Payne commented on HADOOP-17346: - [~ahussein], I merged the PR to trunk. However, it does not backport quite cleanly to earlier 3.x branches. I think we want this in 3.3, 3.2, and 3.1. Please provide patches for those if we want this pulled back. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals
[ https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230191#comment-17230191 ] Eric Payne commented on HADOOP-17346: - [~ahussein], I have verified that these changes are consistent with what we have been running internally. +1 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-17373) hadoop-client-integration-tests doesn't work when building with skipShade
[ https://issues.apache.org/jira/browse/HADOOP-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne resolved HADOOP-17373. - Fix Version/s: 3.4.0 Resolution: Fixed > hadoop-client-integration-tests doesn't work when building with skipShade > - > > Key: HADOOP-17373 > URL: https://issues.apache.org/jira/browse/HADOOP-17373 > Project: Hadoop Common > Issue Type: Bug >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Compiling with skipShade: > {code} > mvn clean install -Pdist -DskipShade -DskipTests -Dtar -Danimal.sniffer.skip > -Dmaven.javadoc.skip=true > {code} > fails with > {code} > [ERROR] > /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[47,37] > package org.apache.hadoop.yarn.server does not exist > [ERROR] > /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[59,11] > cannot find symbol > [ERROR] symbol: class MiniYARNCluster > [ERROR] location: class org.apache.hadoop.example.ITUseMiniCluster > [ERROR] > /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[82,23] > cannot find symbol > [ERROR] symbol: class MiniYARNCluster > [ERROR] location: class org.apache.hadoop.example.ITUseMiniCluster > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
> [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :hadoop-client-integration-tests > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17373) hadoop-client-integration-tests doesn't work when building with skipShade
[ https://issues.apache.org/jira/browse/HADOOP-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230070#comment-17230070 ] Eric Payne commented on HADOOP-17373: - +1. This fixes non-shaded compilation. I'll commit this shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17373) hadoop-client-integration-tests doesn't work when building with skipShade
[ https://issues.apache.org/jira/browse/HADOOP-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230015#comment-17230015 ] Eric Payne commented on HADOOP-17373: - Thanks [~csun] for providing this fix. I'll compile with the changes and test it out. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17324) Don't relocate org.bouncycastle in shaded client jars
[ https://issues.apache.org/jira/browse/HADOOP-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229572#comment-17229572 ] Eric Payne commented on HADOOP-17324: - [~csun], [~dongjoon], and [~viirya], After this JIRA was committed (revision # 2522bf2f9b0c720eab099fef27bd3d22460ad5d0), I am seeing the following compilation errors: {noformat} [ERROR] /home/ericp/hadoop/source/current/orig/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[59,11] cannot find symbol [ERROR] symbol: class MiniYARNCluster {noformat} I'm using the following mvn command to build: {noformat} mvn install -Pdist -DskipShade -DskipTests -Dtar -Danimal.sniffer.skip -Dmaven.javadoc.skip=true {noformat} > Don't relocate org.bouncycastle in shaded client jars > - > > Key: HADOOP-17324 > URL: https://issues.apache.org/jira/browse/HADOOP-17324 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Critical > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > When downstream apps depend on {{hadoop-client-api}}, > {{hadoop-client-runtime}} and {{hadoop-client-minicluster}}, it seems the > {{MiniYARNCluster}} could have issue because > {{org.apache.hadoop.shaded.org.bouncycastle.operator.OperatorCreationException}} > is not in any of the above jars. 
> {code} > Error: Caused by: sbt.ForkMain$ForkError: java.lang.ClassNotFoundException: > org.apache.hadoop.shaded.org.bouncycastle.operator.OperatorCreationException > Error:at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > Error:at java.lang.ClassLoader.loadClass(ClassLoader.java:419) > Error:at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352) > Error:at java.lang.ClassLoader.loadClass(ClassLoader.java:352) > Error:at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:862) > Error:at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > Error:at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1296) > Error:at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:339) > Error:at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > Error:at > org.apache.hadoop.yarn.server.MiniYARNCluster.initResourceManager(MiniYARNCluster.java:353) > Error:at > org.apache.hadoop.yarn.server.MiniYARNCluster.access$200(MiniYARNCluster.java:127) > Error:at > org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceInit(MiniYARNCluster.java:488) > Error:at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > Error:at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109) > Error:at > org.apache.hadoop.yarn.server.MiniYARNCluster.serviceInit(MiniYARNCluster.java:321) > Error:at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > Error:at > org.apache.spark.deploy.yarn.BaseYarnClusterSuite.beforeAll(BaseYarnClusterSuite.scala:94) > Error:at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > Error:at > org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > Error:at > 
org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > Error:at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61) > Error:at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318) > Error:at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513) > Error:at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413) > Error:at java.util.concurrent.FutureTask.run(FutureTask.java:266) > Error:at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > Error:at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > Error:at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals
[ https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228760#comment-17228760 ] Eric Payne commented on HADOOP-17346: - Thanks [~ahussein], for raising the issue and providing the patch. I am looking at the PR. > Fair call queue is defeated by abusive service principals > - > > Key: HADOOP-17346 > URL: https://issues.apache.org/jira/browse/HADOOP-17346 > Project: Hadoop Common > Issue Type: Bug > Components: common, ipc >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > [~daryn] reported that the FCQ prioritizes based on the full kerberos > principal (ie. "user/host@realm") rather than short name (ie. "user") to > prevent service principals like the DNs and NMs being de-prioritized since > service principals are expected to be well behaved. Notably the DNs > contribute a significant but important load so the intent is not to > de-prioritize all DNs because their sum total load is high relative to users. > This has the unfortunate side effect of allowing misbehaving & non-critical > service principals to abuse the FCQ. The gstorm/* principals are a prime > example. Each server is spamming opens as fast as possible which ensures > that none of the gstorm servers can be de-prioritized because each principal > is a fraction of the total load from all principals. > The secondary and more devasting problem is other abusive non-service > principals cannot be effectively de-prioritized. The sum total of all gstorm > load prevents other principals from surpassing the priority thresholds. > Principals stay in the highest priority queues which allows the abusive > principals to overflow the entire call queue for extended periods of time. > Notably it prevents the FCQ from moderating the heavy create loads from p_gup > @ DB which cause significant performance degradation. 
> Prioritization should be based on short name, with configurable exemptions for services like the DN/NM. > [~daryn] suggested a solution that we applied on our clusters. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
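The proposed fix above (prioritize on the short name, with configurable exemptions for service principals like the DN/NM) can be sketched as follows. This is a minimal illustration, not Hadoop's actual identity-provider code; the class and method names are made up for the example:

```java
import java.util.Set;

// Sketch only: derive a caller's scheduling identity from its Kerberos
// principal. Non-exempt principals collapse to the short name, so every
// host of a misbehaving service shares one priority bucket; exempt service
// principals (e.g. "dn", "nm") keep the full principal as before.
public class ShortNameIdentity {
  private final Set<String> exemptShortNames;

  public ShortNameIdentity(Set<String> exemptShortNames) {
    this.exemptShortNames = exemptShortNames;
  }

  /** "user/host@REALM" -> "user", unless the short name is exempt. */
  public String makeIdentity(String principal) {
    int cut = principal.indexOf('/');
    if (cut < 0) {
      cut = principal.indexOf('@');
    }
    String shortName = cut < 0 ? principal : principal.substring(0, cut);
    return exemptShortNames.contains(shortName) ? principal : shortName;
  }
}
```

With this scheme, all gstorm/* hosts aggregate under one "gstorm" identity and can be de-prioritized together, while each DN keeps its own full-principal identity.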
[jira] [Updated] (HADOOP-16096) HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1
[ https://issues.apache.org/jira/browse/HADOOP-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-16096: Resolution: Fixed Fix Version/s: 3.1.3 Status: Resolved (was: Patch Available) I committed to branch-3.1. > HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1 > > > Key: HADOOP-16096 > URL: https://issues.apache.org/jira/browse/HADOOP-16096 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.3 >Reporter: Eric Payne >Assignee: Steve Loughran >Priority: Critical > Fix For: 3.1.3 > > Attachments: HADOOP-15281-branch-3.1-001.patch > > > HADOOP-15281 breaks the branch-3.1 build when building with java 1.8. > {code:title="RetriableFileCopyCommand.java"} > LOG.info("Copying {} to {}", source.getPath(), target); > {code} > Multiple lines have this error: > {panel:title="Build Failure"} > [ERROR] > hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8] > no suitable method found for > info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path) > [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is > not applicable > [ERROR] (actual and formal argument lists differ in length) > [ERROR] method > org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is > not applicable > [ERROR] (actual and formal argument lists differ in length) > {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
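The build break above comes from commons-logging's Log.info taking a single Object with no "{}" placeholder substitution, so the SLF4J-style call cannot compile on branch-3.1. A minimal sketch of the working form (the method below is a stand-in for the log call, not the actual patch):

```java
// Sketch: org.apache.commons.logging.Log#info(Object) has no
// info(String, Object, Object) overload, so SLF4J-style parameterized
// messages fail to compile. Building the message by concatenation is the
// commons-logging-compatible form.
public class CommonsLoggingFix {
  static String copyMessage(String source, String target) {
    // Broken against commons-logging (no matching overload):
    //   LOG.info("Copying {} to {}", source, target);
    // Working commons-logging form:
    //   LOG.info("Copying " + source + " to " + target);
    return "Copying " + source + " to " + target;
  }
}
```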
[jira] [Commented] (HADOOP-16096) HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1
[ https://issues.apache.org/jira/browse/HADOOP-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763164#comment-16763164 ] Eric Payne commented on HADOOP-16096: - +1 Will commit shortly > HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1 > > > Key: HADOOP-16096 > URL: https://issues.apache.org/jira/browse/HADOOP-16096 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.3 >Reporter: Eric Payne >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15281-branch-3.1-001.patch > > > HADOOP-15281 breaks the branch-3.1 build when building with java 1.8. > {code:title="RetriableFileCopyCommand.java"} > LOG.info("Copying {} to {}", source.getPath(), target); > {code} > Multiple lines have this error: > {panel:title="Build Failure"} > [ERROR] > hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8] > no suitable method found for > info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path) > [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is > not applicable > [ERROR] (actual and formal argument lists differ in length) > [ERROR] method > org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is > not applicable > [ERROR] (actual and formal argument lists differ in length) > {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16096) HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1
[ https://issues.apache.org/jira/browse/HADOOP-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763136#comment-16763136 ] Eric Payne commented on HADOOP-16096: - Thanks a lot [~ste...@apache.org] for the quick turnaround on the updated patch! I will wait for the pre-commit build, but this patch builds on 3.1 and the changes look good to me. Is there someone else you would like to have look at it? > HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1 > > > Key: HADOOP-16096 > URL: https://issues.apache.org/jira/browse/HADOOP-16096 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.3 >Reporter: Eric Payne >Assignee: Steve Loughran >Priority: Critical > Attachments: HADOOP-15281-branch-3.1-001.patch > > > HADOOP-15281 breaks the branch-3.1 build when building with java 1.8. > {code:title="RetriableFileCopyCommand.java"} > LOG.info("Copying {} to {}", source.getPath(), target); > {code} > Multiple lines have this error: > {panel:title="Build Failure"} > [ERROR] > hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8] > no suitable method found for > info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path) > [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is > not applicable > [ERROR] (actual and formal argument lists differ in length) > [ERROR] method > org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is > not applicable > [ERROR] (actual and formal argument lists differ in length) > {panel}
[jira] [Commented] (HADOOP-16096) HADOOP-15281 breaks 3.1 build
[ https://issues.apache.org/jira/browse/HADOOP-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763061#comment-16763061 ] Eric Payne commented on HADOOP-16096: - I reverted HADOOP-15281 until a fix can be introduced. > HADOOP-15281 breaks 3.1 build > - > > Key: HADOOP-16096 > URL: https://issues.apache.org/jira/browse/HADOOP-16096 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.3 >Reporter: Eric Payne >Priority: Critical > > HADOOP-15281 breaks the branch-3.1 build when building with java 1.8. > {code:title="RetriableFileCopyCommand.java"} > LOG.info("Copying {} to {}", source.getPath(), target); > {code} > Multiple lines have this error: > {panel:title="Build Failure"} > [ERROR] > hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8] > no suitable method found for > info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path) > [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is > not applicable > [ERROR] (actual and formal argument lists differ in length) > [ERROR] method > org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is > not applicable > [ERROR] (actual and formal argument lists differ in length) > {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763045#comment-16763045 ] Eric Payne commented on HADOOP-15281: - I'm going to revert this from 3.1 so that further development can continue. > Distcp to add no-rename copy option > --- > > Key: HADOOP-15281 > URL: https://issues.apache.org/jira/browse/HADOOP-15281 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Andrew Olson >Priority: Major > Fix For: 3.2.1, 3.1.3 > > Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, > HADOOP-15281-003.patch, HADOOP-15281-004.patch > > > Currently Distcp uploads a file by two strategies > # append parts > # copy to temp then rename > option 2 executes the following sequence in {{promoteTmpToTarget}} > {code} > if ((fs.exists(target) && !fs.delete(target, false)) > || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent())) > || !fs.rename(tmpTarget, target)) { > throw new IOException("Failed to promote tmp-file:" + tmpTarget > + " to: " + target); > } > {code} > For any object store, that's a lot of HTTP requests; for S3A you are looking > at 12+ requests and an O(data) copy call. > This is not a good upload strategy for any store which manifests its output > atomically at the end of the write(). > Proposed: add a switch to write directly to the dest path, which can be > supplied as either a conf option (distcp.direct.write) or a CLI option > (-direct). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
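The -direct switch proposed above skips the temp-then-rename sequence entirely. A rough sketch of the decision, assuming a direct-write flag; the names and the simplified string-based path handling are illustrative, not the actual distcp or org.apache.hadoop.fs.Path code:

```java
// Sketch: with direct write enabled, distcp writes straight to the final
// destination and skips the exists/delete/mkdirs/rename sequence that is
// expensive on object stores such as S3. Otherwise it uses the classic
// strategy of a temp file beside the target, promoted on success.
public class DirectWriteSketch {
  static String workPath(String target, boolean directWrite) {
    if (directWrite) {
      return target;                      // one upload, no promote step
    }
    int slash = target.lastIndexOf('/');
    return target.substring(0, slash + 1) + ".distcp.tmp."
        + target.substring(slash + 1);
  }
}
```

Direct write suits stores that manifest output atomically at the end of write(); on a filesystem where readers can see partial files, the temp-then-rename strategy remains the safe default.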
[jira] [Created] (HADOOP-16096) HADOOP-15281 breaks 3.1 build
Eric Payne created HADOOP-16096: --- Summary: HADOOP-15281 breaks 3.1 build Key: HADOOP-16096 URL: https://issues.apache.org/jira/browse/HADOOP-16096 Project: Hadoop Common Issue Type: Bug Affects Versions: 3.1.3 Reporter: Eric Payne HADOOP-15281 breaks the branch-3.1 build when building with java 1.8. {code:title="RetriableFileCopyCommand.java"} LOG.info("Copying {} to {}", source.getPath(), target); {code} Multiple lines have this error: {panel:title="Build Failure"} [ERROR] hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8] no suitable method found for info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path) [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is not applicable [ERROR] (actual and formal argument lists differ in length) [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is not applicable [ERROR] (actual and formal argument lists differ in length) {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option
[ https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762957#comment-16762957 ] Eric Payne commented on HADOOP-15281: - [~noslowerdna] and [~ste...@apache.org], this commit breaks the branch-3.1 build when building with java 1.8. {code:title="RetriableFileCopyCommand.java"} LOG.info("Copying {} to {}", source.getPath(), target); {code} Multiple lines have this error: {panel:title="Build Failure"} [ERROR] hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8] no suitable method found for info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path) [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is not applicable [ERROR] (actual and formal argument lists differ in length) [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is not applicable [ERROR] (actual and formal argument lists differ in length) {panel} > Distcp to add no-rename copy option > --- > > Key: HADOOP-15281 > URL: https://issues.apache.org/jira/browse/HADOOP-15281 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Affects Versions: 3.0.0 >Reporter: Steve Loughran >Assignee: Andrew Olson >Priority: Major > Fix For: 3.2.1, 3.1.3 > > Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, > HADOOP-15281-003.patch, HADOOP-15281-004.patch > > > Currently Distcp uploads a file by two strategies > # append parts > # copy to temp then rename > option 2 executes the following sequence in {{promoteTmpToTarget}} > {code} > if ((fs.exists(target) && !fs.delete(target, false)) > || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent())) > || !fs.rename(tmpTarget, target)) { > throw new IOException("Failed to promote tmp-file:" + tmpTarget > + " to: " + target); > } > {code} > For any object store, that's a lot of HTTP requests; for S3A you are looking > at 12+ requests and an 
O(data) copy call. > This is not a good upload strategy for any store which manifests its output > atomically at the end of the write(). > Proposed: add a switch to write directly to the dest path, which can be > supplied as either a conf option (distcp.direct.write) or a CLI option > (-direct). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725164#comment-16725164 ] Eric Payne commented on HADOOP-15973: - bq. branch-2.8 will also have a different patch, if necessary. Not necessary. This is not failing in branch-2.8. bq. confirm the omission of quiet mode suppression in the new include handling was intentional I agree that suppression of include handling is not desired. > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, > HADOOP-15973.003.branch-2.patch, HADOOP-15973.003.branch-3.0.patch, > HADOOP-15973.003.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
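The bug described above comes down to what gets stored in the resource cache when the resource is a one-shot stream. A minimal sketch of the intended behavior, with illustrative types rather than the real Configuration internals:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a stream resource can only be read once, so everything parsed
// from it -- including properties pulled in via an included file -- must go
// into the cached copy. If only the top-level properties are cached, a
// later recalculation from the (now spent) stream silently drops the
// included values.
public class ParsedResourceCache {
  private final Map<String, Map<String, String>> cache = new LinkedHashMap<>();

  /** Cache the full parse result: top-level plus included properties. */
  void store(String resourceName, Map<String, String> topLevel,
             Map<String, String> included) {
    Map<String, String> all = new LinkedHashMap<>(topLevel);
    all.putAll(included);
    cache.put(resourceName, all);
  }

  /** Recalculation must read from the cache; the stream cannot be re-read. */
  Map<String, String> reload(String resourceName) {
    return cache.get(resourceName);
  }
}
```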
[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725097#comment-16725097 ] Eric Payne commented on HADOOP-15973: - TestSSLFactory is not failing in my local environment. Also, I uploaded a branch-2 patch for version 003. It backports cleanly and builds onto branch-2.9. branch-2.8 will also have a different patch, if necessary. > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, > HADOOP-15973.003.branch-2.patch, HADOOP-15973.003.branch-3.0.patch, > HADOOP-15973.003.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-15973: Attachment: HADOOP-15973.003.branch-2.patch > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, > HADOOP-15973.003.branch-2.patch, HADOOP-15973.003.branch-3.0.patch, > HADOOP-15973.003.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-15973: Attachment: HADOOP-15973.003.branch-3.0.patch > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, > HADOOP-15973.003.branch-3.0.patch, HADOOP-15973.003.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724534#comment-16724534 ] Eric Payne commented on HADOOP-15973: - Thanks [~jira.shegalov] for taking the time to review the code. As you can tell, this patch moved existing code to a utility method so it could be called in multiple places. As such, I'd rather not change the existing code as part of this patch. Thoughts? > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, > HADOOP-15973.003.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-15973: Attachment: HADOOP-15973.003.patch > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, > HADOOP-15973.003.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724283#comment-16724283 ] Eric Payne commented on HADOOP-15973: - Thanks a lot, [~jlowe]. I uploaded 003 with the suggested changes. > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, > HADOOP-15973.003.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723447#comment-16723447 ] Eric Payne commented on HADOOP-15973: - Attaching patch 002. This patch invokes a new parser when processing includes rather than loading a resource. This should also fix HADOOP-16007. > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-15973: Attachment: HADOOP-15973.002.patch > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-16007) Order of property settings is incorrect when includes are processed
[ https://issues.apache.org/jira/browse/HADOOP-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned HADOOP-16007: --- Assignee: Eric Payne > Order of property settings is incorrect when includes are processed > --- > > Key: HADOOP-16007 > URL: https://issues.apache.org/jira/browse/HADOOP-16007 > Project: Hadoop Common > Issue Type: Bug > Components: conf >Affects Versions: 3.2.0, 3.1.1, 3.0.4 >Reporter: Jason Lowe >Assignee: Eric Payne >Priority: Blocker > > If a configuration file contains a setting for a property and then later includes another file that also sets that property to a different value, the property will be parsed incorrectly. For example, consider the following configuration file:
> {noformat}
> <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
>   <property>
>     <name>myprop</name>
>     <value>val1</value>
>   </property>
>   <xi:include href="/some/other/file.xml"/>
> </configuration>
> {noformat}
> with the contents of /some/other/file.xml as:
> {noformat}
> <configuration>
>   <property>
>     <name>myprop</name>
>     <value>val2</value>
>   </property>
> </configuration>
> {noformat}
> Parsing this configuration should result in myprop=val2, but it actually results in myprop=val1.
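The expected semantics are last-write-wins in document order: a property set by an include that appears later in the file must override the earlier top-level value. Sketched below with a flat list standing in for the parsed document; this is illustrative, not the real Configuration parser:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: apply properties strictly in document order. Each String[] is a
// {name, value} pair as encountered while parsing, with included files
// expanded in place; the later occurrence of a name wins.
public class IncludeOrderSketch {
  static Map<String, String> merge(List<String[]> propsInDocumentOrder) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (String[] nameValue : propsInDocumentOrder) {
      merged.put(nameValue[0], nameValue[1]);   // later occurrence wins
    }
    return merged;
  }
}
```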
[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721873#comment-16721873 ] Eric Payne commented on HADOOP-15973: - One additional data point is that my manual tests do not show this problem in 2.9 and 3.0, but the included unit test fails on 2.9 and 3.0 (as well as 3.1 and 3.2) without any fix. > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721771#comment-16721771 ] Eric Payne commented on HADOOP-15973: - [~sunilg], [~jlowe], [~ste...@apache.org], thanks for watching this JIRA. Do any of you want to review it? > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-15973: Attachment: HADOOP-15973.001.patch > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-15973: Status: Patch Available (was: Open) Submitted 001 version of the patch. > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > Attachments: HADOOP-15973.001.patch > > > If a configuration resource is a bufferedinputstream and the resource has an > included xml file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15982) Support configurable trash location
[ https://issues.apache.org/jira/browse/HADOOP-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711653#comment-16711653 ] Eric Payne commented on HADOOP-15982: - This JIRA is part of the wider discussion being done as part of HADOOP-7310. > Support configurable trash location > --- > > Key: HADOOP-15982 > URL: https://issues.apache.org/jira/browse/HADOOP-15982 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 3.0.0 >Reporter: George Huang >Assignee: George Huang >Priority: Minor > > Currently some customer has users accounts that are functional ids (fid) to > manage application and application data under the path /data/FID. These fid's > also get a home directory under /user path. The user's home directories are > limited with space quota 60 G. When these fids delete data, due to customer > deletion policy they are placed in /user//.Trash location and run over > quota. > For now they are increasing quotas for these functional users, but > considering growing applications they would like the .Trash location to be > configurable or something like /trash/\{userid} that is owned by the user. > What should the configurable path look like to make this happen? For example, > some thoughts may relate whether we want to configure it for per user or per > cluster, etc. > Here is current behavior: > fs.TrashPolicyDefault: Moved: 'hdfs://ns1/user/hdfs/test/1.txt to trash at: > hdfs://ns1/user/hdfs/.Trash/Current/user/hdfs/test/1.txt > for path under encryption zone: > fs.TrashPolicyDefault: Moved: 'hdfs://ns1/scale/2.txt' to trash at > hdfs://ns1/scale/.Trash/hdfs/Current/scale/2.txt > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
[ https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned HADOOP-15973: --- Assignee: Eric Payne > Configuration: Included properties are not cached if resource is a stream > - > > Key: HADOOP-15973 > URL: https://issues.apache.org/jira/browse/HADOOP-15973 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Critical > > If a configuration resource is a BufferedInputStream and the resource has an > included XML file, the properties from the included file are read and stored > in the properties of the configuration, but they are not stored in the > resource cache. So, if a later resource is added to the config and the > properties are recalculated from the first resource, the included properties > are lost.
[jira] [Created] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream
Eric Payne created HADOOP-15973: --- Summary: Configuration: Included properties are not cached if resource is a stream Key: HADOOP-15973 URL: https://issues.apache.org/jira/browse/HADOOP-15973 Project: Hadoop Common Issue Type: Bug Reporter: Eric Payne If a configuration resource is a BufferedInputStream and the resource has an included XML file, the properties from the included file are read and stored in the properties of the configuration, but they are not stored in the resource cache. So, if a later resource is added to the config and the properties are recalculated from the first resource, the included properties are lost.
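The root cause is a general one: a stream-backed resource can be parsed only once. The following minimal, self-contained sketch (plain `java.util.Properties`, not Hadoop's `Configuration`, and with no XInclude handling) illustrates why a cache that holds the stream rather than the parsed properties comes back empty on reload:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Properties;

// Illustrative sketch only -- not Hadoop code. A stream is consumed by the
// first parse, so if the resource cache retains the stream instead of the
// parsed properties, any later re-parse of that "cached" resource yields
// nothing, and properties pulled in via an include are silently dropped.
public class StreamCacheDemo {
    static Properties parse(InputStream in) throws Exception {
        Properties p = new Properties();
        p.load(in); // consumes the stream
        return p;
    }

    public static void main(String[] args) throws Exception {
        InputStream resource =
            new ByteArrayInputStream("a=1\nincluded.b=2\n".getBytes());
        Properties first = parse(resource);  // reads both properties
        Properties again = parse(resource);  // stream already exhausted
        System.out.println(first.size() + " " + again.size()); // prints "2 0"
    }
}
```

The fix direction described in the issue follows from this: whatever is cached for a stream resource must be the fully expanded set of parsed properties, not the stream itself.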
[jira] [Updated] (HADOOP-15548) Randomize local dirs
[ https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-15548: Resolution: Fixed Fix Version/s: 3.0.4 2.8.5 2.9.2 3.1.1 3.2.0 2.10.0 Status: Resolved (was: Patch Available) > Randomize local dirs > > > Key: HADOOP-15548 > URL: https://issues.apache.org/jira/browse/HADOOP-15548 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 2.8.5, 3.0.4 > > Attachments: HADOOP-15548-branch-2.001.patch, HADOOP-15548.001.patch, > HADOOP-15548.002.patch > > > Shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching a container. > Some applications will process these in exactly the same way in every > container (e.g. round-robin), which can cause disks to get unnecessarily > overloaded (e.g. one output file written to the first entry specified in the > environment variable). > There are two paths for local dir allocation, depending on whether the size > is unknown or known. The unknown path already uses a random algorithm. The > known path initializes with a random starting point, and then goes > round-robin after that. When selecting a dir, it increments the last used by > one and then checks sequentially until it finds a dir that satisfies the > request. The proposal is to increment by a random value between 1 and > num_dirs - 1, and then check sequentially from there. This should result in > a more random selection in all cases.
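The proposed selection scheme can be sketched as a stand-alone class. This is illustrative only, not the actual LocalDirAllocator code; the names `chooseDir`, `availableSpace`, and `requestedSize` are made up for the example:

```java
import java.util.Random;

// Hypothetical sketch of the proposed allocation change: instead of advancing
// the last-used index by one, advance it by a random offset in [1, numDirs-1]
// and then probe sequentially for a dir with enough free space.
public class RandomizedDirChooser {
    private final Random rand = new Random();
    private int lastDir = 0;

    /** Returns the index of a dir with at least requestedSize bytes free, or -1. */
    public int chooseDir(long[] availableSpace, long requestedSize) {
        int numDirs = availableSpace.length;
        // A random jump of 1..numDirs-1 breaks the fixed round-robin ordering
        // that lets every container hammer the same disk sequence.
        lastDir = (lastDir + 1 + rand.nextInt(Math.max(1, numDirs - 1))) % numDirs;
        for (int i = 0; i < numDirs; i++) {
            int candidate = (lastDir + i) % numDirs;
            if (availableSpace[candidate] >= requestedSize) {
                lastDir = candidate;
                return candidate;
            }
        }
        return -1; // no local dir can satisfy the request
    }
}
```

The sequential probe after the random jump preserves the existing guarantee that a satisfiable request always finds a dir; only the starting point of the scan becomes unpredictable.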
[jira] [Commented] (HADOOP-15548) Randomize local dirs
[ https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528211#comment-16528211 ] Eric Payne commented on HADOOP-15548: - The precommit build failure is my fault. I should have waited for the precommit build to run before I committed the branch-2 patch. Sorry about that. I've committed it to trunk, branch-3.1, branch-3.0, branch-2, branch-2.9, and branch-2.8 > Randomize local dirs > > > Key: HADOOP-15548 > URL: https://issues.apache.org/jira/browse/HADOOP-15548 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HADOOP-15548-branch-2.001.patch, HADOOP-15548.001.patch, > HADOOP-15548.002.patch > > > shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching container. > Some applications will process these in exactly the same way in every > container (e.g. roundrobin) which can cause disks to get unnecessarily > overloaded (e.g. one output file written to first entry specified in the > environment variable). > There are two paths for local dir allocation, depending on whether the size > is unknown or known. The unknown path already uses a random algorithm. The > known path initializes with a random starting point, and then goes > round-robin after that. When selecting a dir, it increments the last used by > one and then checks sequentially until it finds a dir that satisfies the > request. Proposal is to increment by a random value of between 1 and > num_dirs - 1, and then check sequentially from there. This should result in > a more random selection in all cases.
[jira] [Commented] (HADOOP-15548) Randomize local dirs
[ https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528090#comment-16528090 ] Eric Payne commented on HADOOP-15548: - Hi [~Jim_Brennan]. I tried backporting this and building it in branch-2. It gets the following errors during build: {code} [ERROR] /home/ericp/hadoop/source/Apache/HADOOP-15548/branch-2/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalDirAllocator.java:[284,5] cannot find symbol symbol: method assumeNotWindows() {code} Can you please provide a 2.x patch? > Randomize local dirs > > > Key: HADOOP-15548 > URL: https://issues.apache.org/jira/browse/HADOOP-15548 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HADOOP-15548.001.patch, HADOOP-15548.002.patch > > > shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching container. > Some applications will process these in exactly the same way in every > container (e.g. roundrobin) which can cause disks to get unnecessarily > overloaded (e.g. one output file written to first entry specified in the > environment variable). > There are two paths for local dir allocation, depending on whether the size > is unknown or known. The unknown path already uses a random algorithm. The > known path initializes with a random starting point, and then goes > round-robin after that. When selecting a dir, it increments the last used by > one and then checks sequentially until it finds a dir that satisfies the > request. Proposal is to increment by a random value of between 1 and > num_dirs - 1, and then check sequentially from there. This should result in > a more random selection in all cases.
[jira] [Commented] (HADOOP-15548) Randomize local dirs
[ https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528034#comment-16528034 ] Eric Payne commented on HADOOP-15548: - Thanks [~Jim_Brennan]. +1 I will commit shortly > Randomize local dirs > > > Key: HADOOP-15548 > URL: https://issues.apache.org/jira/browse/HADOOP-15548 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HADOOP-15548.001.patch, HADOOP-15548.002.patch > > > shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching container. > Some applications will process these in exactly the same way in every > container (e.g. roundrobin) which can cause disks to get unnecessarily > overloaded (e.g. one output file written to first entry specified in the > environment variable). > There are two paths for local dir allocation, depending on whether the size > is unknown or known. The unknown path already uses a random algorithm. The > known path initializes with a random starting point, and then goes > round-robin after that. When selecting a dir, it increments the last used by > one and then checks sequentially until it finds a dir that satisfies the > request. Proposal is to increment by a random value of between 1 and > num_dirs - 1, and then check sequentially from there. This should result in > a more random selection in all cases.
[jira] [Commented] (HADOOP-15548) Randomize local dirs
[ https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526758#comment-16526758 ] Eric Payne commented on HADOOP-15548: - Thanks [~Jim_Brennan] for reporting this problem and providing the fix. The patch looks fine, but I have one concern with the test. It succeeds even without changing {{LocalDirAllocator}}. Can you please modify the test so that it fails with the original code? > Randomize local dirs > > > Key: HADOOP-15548 > URL: https://issues.apache.org/jira/browse/HADOOP-15548 > Project: Hadoop Common > Issue Type: Bug >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HADOOP-15548.001.patch > > > shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching container. > Some applications will process these in exactly the same way in every > container (e.g. roundrobin) which can cause disks to get unnecessarily > overloaded (e.g. one output file written to first entry specified in the > environment variable). > There are two paths for local dir allocation, depending on whether the size > is unknown or known. The unknown path already uses a random algorithm. The > known path initializes with a random starting point, and then goes > round-robin after that. When selecting a dir, it increments the last used by > one and then checks sequentially until it finds a dir that satisfies the > request. Proposal is to increment by a random value of between 1 and > num_dirs - 1, and then check sequentially from there. This should result in > a more random selection in all cases.
[jira] [Reopened] (HADOOP-9383) mvn clean compile fails without install goal
[ https://issues.apache.org/jira/browse/HADOOP-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reopened HADOOP-9383: Something similar is definitely still happening on Fedora and Red Hat. This is what I'm getting. {noformat} Could not find artifact org.apache.hadoop:hadoop-maven-plugins:jar:3.1.0-SNAPSHOT {noformat} I'm reopening the JIRA. > mvn clean compile fails without install goal > > > Key: HADOOP-9383 > URL: https://issues.apache.org/jira/browse/HADOOP-9383 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Arpit Agarwal > > 'mvn -Pnative-win clean compile' fails with the following error: > [ERROR] Could not find goal 'protoc' in plugin > org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT among available goals > -> [Help 1] > The build succeeds if the install goal is specified.
[jira] [Commented] (HADOOP-9747) Reduce unnecessary UGI synchronization
[ https://issues.apache.org/jira/browse/HADOOP-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141807#comment-16141807 ] Eric Payne commented on HADOOP-9747: [~daryn], I started to commit this, but I ran into a couple of issues: # If this fix needs to go into branch-2.8, we may need a separate 2.8 patch. I tried applying the branch-2 patch to branch-2.8, and there were several conflicts in {{UserGroupInformation.java}} # {{HADOOP-9747.2.branch-2.patch}} does not apply cleanly to branch-2. It's just a minor import conflict that I could fix myself, but as long as you need to address the branch-2.8 conflicts... > Reduce unnecessary UGI synchronization > -- > > Key: HADOOP-9747 > URL: https://issues.apache.org/jira/browse/HADOOP-9747 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0-alpha1 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HADOOP-9747.2.branch-2.patch, HADOOP-9747.2.trunk.patch, > HADOOP-9747.branch-2.patch, HADOOP-9747.trunk.patch > > > Jstacks of heavily loaded NNs show up to dozens of threads blocking in the > UGI.
[jira] [Commented] (HADOOP-9747) Reduce unnecessary UGI synchronization
[ https://issues.apache.org/jira/browse/HADOOP-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138923#comment-16138923 ] Eric Payne commented on HADOOP-9747: [~daryn], Thanks for providing the fixes for the YARN tests. +1. The patch LGTM. If there are no concerns, I will commit tomorrow afternoon. > Reduce unnecessary UGI synchronization > -- > > Key: HADOOP-9747 > URL: https://issues.apache.org/jira/browse/HADOOP-9747 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0-alpha1 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HADOOP-9747.2.branch-2.patch, HADOOP-9747.2.trunk.patch, > HADOOP-9747.branch-2.patch, HADOOP-9747.trunk.patch > > > Jstacks of heavily loaded NNs show up to dozens of threads blocking in the > UGI.
[jira] [Commented] (HADOOP-9747) Reduce unnecessary UGI synchronization
[ https://issues.apache.org/jira/browse/HADOOP-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126644#comment-16126644 ] Eric Payne commented on HADOOP-9747: [~daryn], the following tests are failing with this patch and succeeding without it on trunk: {noformat} TestTokenClientRMService#testTokenRenewalByLoginUser testTokenRenewalByLoginUser(org.apache.hadoop.yarn.server.resourcemanager.TestTokenClientRMService) Time elapsed: 0.043 sec <<< ERROR! java.lang.reflect.UndeclaredThrowableException: null at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.renewDelegationToken(ClientRMService.java:1058) at org.apache.hadoop.yarn.server.resourcemanager.TestTokenClientRMService.checkTokenRenewal(TestTokenClientRMService.java:169) at org.apache.hadoop.yarn.server.resourcemanager.TestTokenClientRMService.access$500(TestTokenClientRMService.java:46) TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey testRMDTMasterKeyStateOnRollingMasterKey(org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens) Time elapsed: 0.792 sec <<< ERROR! 
org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: Delegation Token can be issued only with kerberos authentication at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1022) at org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens.testRMDTMasterKeyStateOnRollingMasterKey(TestRMDelegationTokens.java:102) {noformat} > Reduce unnecessary UGI synchronization > -- > > Key: HADOOP-9747 > URL: https://issues.apache.org/jira/browse/HADOOP-9747 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0-alpha1 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HADOOP-9747.2.branch-2.patch, HADOOP-9747.2.trunk.patch, > HADOOP-9747.branch-2.patch, HADOOP-9747.trunk.patch > > > Jstacks of heavily loaded NNs show up to dozens of threads blocking in the > UGI.
[jira] [Commented] (HADOOP-14343) Wrong pid file name in error message when starting secure daemon
[ https://issues.apache.org/jira/browse/HADOOP-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108082#comment-16108082 ] Eric Payne commented on HADOOP-14343: - [~boky01], thanks for the effort on this patch. Patch LGTM. +1 [~aw], did you have anything you wanted to add? > Wrong pid file name in error message when starting secure daemon > > > Key: HADOOP-14343 > URL: https://issues.apache.org/jira/browse/HADOOP-14343 > Project: Hadoop Common > Issue Type: Bug >Reporter: Andras Bokor >Assignee: Andras Bokor >Priority: Minor > Attachments: HADOOP-14343.01.patch, HADOOP-14343.02.patch > > > {code}# this is for the daemon pid creation > #shellcheck disable=SC2086 > echo $! > "${jsvcpidfile}" 2>/dev/null > if [[ $? -gt 0 ]]; then > hadoop_error "ERROR: Cannot write ${daemonname} pid ${daemonpidfile}." > fi{code} > It will log datanode's pid file instead of JSVC's pid file.
[jira] [Updated] (HADOOP-14320) TestIPC.testIpcWithReaderQueuing fails intermittently
[ https://issues.apache.org/jira/browse/HADOOP-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-14320: Resolution: Fixed Fix Version/s: 3.0.0-alpha3 2.8.1 2.9.0 Status: Resolved (was: Patch Available) > TestIPC.testIpcWithReaderQueuing fails intermittently > - > > Key: HADOOP-14320 > URL: https://issues.apache.org/jira/browse/HADOOP-14320 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Fix For: 2.9.0, 2.8.1, 3.0.0-alpha3 > > Attachments: HADOOP-14320.001.patch > > > {noformat} > org.mockito.exceptions.verification.TooLittleActualInvocations: > callQueueManager.put(); > Wanted 2 times: > -> at org.apache.hadoop.ipc.TestIPC.checkBlocking(TestIPC.java:810) > But was 1 time: > -> at org.apache.hadoop.ipc.Server.queueCall(Server.java:2466) > at org.apache.hadoop.ipc.TestIPC.checkBlocking(TestIPC.java:810) > at > org.apache.hadoop.ipc.TestIPC.testIpcWithReaderQueuing(TestIPC.java:738) > {noformat}
[jira] [Commented] (HADOOP-14320) TestIPC.testIpcWithReaderQueuing fails intermittently
[ https://issues.apache.org/jira/browse/HADOOP-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989379#comment-15989379 ] Eric Payne commented on HADOOP-14320: - Thanks [~ebadger]. +1 > TestIPC.testIpcWithReaderQueuing fails intermittently > - > > Key: HADOOP-14320 > URL: https://issues.apache.org/jira/browse/HADOOP-14320 > Project: Hadoop Common > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > Attachments: HADOOP-14320.001.patch > > > {noformat} > org.mockito.exceptions.verification.TooLittleActualInvocations: > callQueueManager.put(); > Wanted 2 times: > -> at org.apache.hadoop.ipc.TestIPC.checkBlocking(TestIPC.java:810) > But was 1 time: > -> at org.apache.hadoop.ipc.Server.queueCall(Server.java:2466) > at org.apache.hadoop.ipc.TestIPC.checkBlocking(TestIPC.java:810) > at > org.apache.hadoop.ipc.TestIPC.testIpcWithReaderQueuing(TestIPC.java:738) > {noformat}
[jira] [Updated] (HADOOP-12605) Fix intermittent failure of TestIPC.testIpcWithReaderQueuing
[ https://issues.apache.org/jira/browse/HADOOP-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-12605: Fix Version/s: 2.8.0 Thanks [~iwasakims] for this fix. I backported it to branch-2.8. > Fix intermittent failure of TestIPC.testIpcWithReaderQueuing > > > Key: HADOOP-12605 > URL: https://issues.apache.org/jira/browse/HADOOP-12605 > Project: Hadoop Common > Issue Type: Bug > Components: test >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HADOOP-12605.001.patch, HADOOP-12605.002.patch, > HADOOP-12605.003.patch, HADOOP-12605.004.patch, HADOOP-12605.005.patch >
[jira] [Commented] (HADOOP-13671) Fix ClassFormatException in trunk build.
[ https://issues.apache.org/jira/browse/HADOOP-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536655#comment-15536655 ] Eric Payne commented on HADOOP-13671: - Thanks [~kihwal]. +1 Committing to trunk. > Fix ClassFormatException in trunk build. > > > Key: HADOOP-13671 > URL: https://issues.apache.org/jira/browse/HADOOP-13671 > Project: Hadoop Common > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HADOOP-13671.patch > > > The maven-project-info-reports-plugin version 2.7 depends on > maven-shared-jar-1.1, which uses bcel 5.2. This does not work well with the > new lambda expression. Version 2.9 depends on maven-shared-jar-1.2, which works > around this problem by using the custom release of bcel 6.0.
[jira] [Updated] (HADOOP-12418) TestRPC.testRPCInterruptedSimple fails intermittently
[ https://issues.apache.org/jira/browse/HADOOP-12418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-12418: Fix Version/s: 2.7.4 Thanks [~kihwal] and [~steve_l] for your work on this issue. This patch backports cleanly to 2.7 (with only contextual diffs). We would like this fix in 2.7, so I backported it. > TestRPC.testRPCInterruptedSimple fails intermittently > - > > Key: HADOOP-12418 > URL: https://issues.apache.org/jira/browse/HADOOP-12418 > Project: Hadoop Common > Issue Type: Bug > Components: test >Affects Versions: 3.0.0-alpha1 > Environment: Jenkins, Java 8 >Reporter: Steve Loughran >Assignee: Kihwal Lee > Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1 > > Attachments: HADOOP-12418.patch, HADOOP-12418.v2.patch > > > Jenkins trunk + java 8 saw a failure of > {{TestRPC.testRPCInterruptedSimple}}; the interrupt wasn't picked up. Race in > test -or a surfacing of a bug in RPC where at some points interrupt > exceptions are not picked up?
[jira] [Commented] (HADOOP-12582) Using BytesWritable's getLength() and getBytes() instead of get() and getSize()
[ https://issues.apache.org/jira/browse/HADOOP-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012195#comment-15012195 ] Eric Payne commented on HADOOP-12582: - bq. Honestly, we need a massive "remove usage of deprecated methods" patch for all of Hadoop. I think it would be better to do it piecemeal. Easier to review, easier to test. > Using BytesWritable's getLength() and getBytes() instead of get() and > getSize() > --- > > Key: HADOOP-12582 > URL: https://issues.apache.org/jira/browse/HADOOP-12582 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa > > BytesWritable's deprecated methods, get() and getSize(), are still used in > some tests: TestTFileSeek, TestTFileSeqFileComparison, TestSequenceFile, and > so on. We can also remove them if targeting this to 3.0.0 > https://builds.apache.org/job/PreCommit-HADOOP-Build/8084/artifact/patchprocess/diff-compile-javac-root-jdk1.7.0_85.txt
[jira] [Updated] (HADOOP-10321) TestCompositeService should cover all enumerations of adding a service to a parent service
[ https://issues.apache.org/jira/browse/HADOOP-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-10321: Labels: supportability test (was: BB2015-05-TBR supportability test) TestCompositeService should cover all enumerations of adding a service to a parent service -- Key: HADOOP-10321 URL: https://issues.apache.org/jira/browse/HADOOP-10321 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Ray Chiang Labels: supportability, test Attachments: HADOOP-10321-02.patch, HADOOP-10321-03.patch, HADOOP-10321-04.patch, HADOOP10321-01.patch HADOOP-10085 fixes some synchronization issues in CompositeService#addService(). The tests should cover all cases.
[jira] [Commented] (HADOOP-11802) DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm
[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507815#comment-14507815 ] Eric Payne commented on HADOOP-11802: - Thanks for the new patch, [~cmccabe]. I have verified that patch 003 still fixes the problem of the dying {{DomainSocketWatcher}} thread in my manual tests. I have also verified that the new unit test fails without the patch and succeeds with it. +1 : LGTM DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm - Key: HADOOP-11802 URL: https://issues.apache.org/jira/browse/HADOOP-11802 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Colin Patrick McCabe Attachments: HADOOP-11802.001.patch, HADOOP-11802.002.patch, HADOOP-11802.003.patch In {{DataXceiver#requestShortCircuitShm}}, we attempt to recover from some errors by closing the {{DomainSocket}}. However, this violates the invariant that the domain socket should never be closed when it is being managed by the {{DomainSocketWatcher}}. Instead, we should call {{shutdown}} on the {{DomainSocket}}. When this bug hits, it terminates the {{DomainSocketWatcher}} thread.
[jira] [Commented] (HADOOP-11802) DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm
[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498828#comment-14498828 ] Eric Payne commented on HADOOP-11802: - [~cmccabe], Thanks very much for the patch. I was able to manually verify that the patch fixed the problem we were encountering when {{DomainSocketWatcher}}'s main thread was dying. Using the same methods as used previously to generate the exception in {{DataXceiver#requestShortCircuitShm}}, I was able to verify that the main thread of {{DomainSocketWatcher}} remains running. However, I don't think the unit test is verifying this use case. Here's what I did: 1. I patched branch-2 with {{HADOOP-11802.002.patch}}, built it, and ran the test for {{TestShortCircuitCache#testDataXceiverHandlesRequestShortCircuitShmFailure}}. This was successful. 2. I commented out the following code in {{DataXceiver#requestShortCircuitShm}} {code} if ((!success) && releasedSocket) { try { sock.shutdown(); } catch (IOException e) { LOG.warn("Failed to shut down socket in error handler", e); } } {code} and replaced it with the original code: {code} if ((!success) && (peer == null)) { IOUtils.cleanup(null, sock); } {code} This also succeeded. DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm - Key: HADOOP-11802 URL: https://issues.apache.org/jira/browse/HADOOP-11802 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Colin Patrick McCabe Attachments: HADOOP-11802.001.patch, HADOOP-11802.002.patch In {{DataXceiver#requestShortCircuitShm}}, we attempt to recover from some errors by closing the {{DomainSocket}}. However, this violates the invariant that the domain socket should never be closed when it is being managed by the {{DomainSocketWatcher}}. Instead, we should call {{shutdown}} on the {{DomainSocket}}. When this bug hits, it terminates the {{DomainSocketWatcher}} thread.
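The shutdown-versus-close distinction the fix relies on can be demonstrated with ordinary TCP sockets, used here as a stand-in for Hadoop's {{DomainSocket}} (which follows the same half-close semantics): shutting down output delivers EOF to the peer while the socket object stays valid for whatever component still manages it, whereas close() invalidates it outright.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// TCP stand-in for the DomainSocket invariant: while a socket is still
// managed by a watcher, call shutdown (the peer sees EOF and the object
// stays valid for the manager) rather than close (which yanks the
// descriptor out from under the manager).
public class ShutdownVsClose {
    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(0);
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        Socket accepted = server.accept();

        client.shutdownOutput();                              // half-close: signal EOF
        System.out.println(accepted.getInputStream().read()); // -1 (EOF at the peer)
        System.out.println(client.isClosed());                // false: object still valid

        client.close();                                       // full close invalidates it
        System.out.println(client.isClosed());                // true
        accepted.close();
        server.close();
    }
}
```

This is why the patch replaces the close in the error path with a shutdown: the peer (and the watcher) still observe the EOF and can clean up, without violating the invariant that only the watcher closes a socket it manages.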
[jira] [Commented] (HADOOP-11802) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback
[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493182#comment-14493182 ] Eric Payne commented on HADOOP-11802: - Thanks again, [~cmccabe], for your comments and taking time on this issue. One thing to note is that just prior to these problems, a 195-second GC was taking place on the DN. I added a catch of {{Throwable}} in the main thread of the {{DomainSocketWatcher}} and reproduced the problem. AFAICT, the following represents what is happening: - Request for short circuit read is received - {{DataXceiver#requestShortCircuitShm}} calls {{ShortCircuitRegistry#createNewMemorySegment}}, which creates a shared memory segment and associates it with the passed domain socket in the {{DomainSocketWatcher}}. Then, in that thread, {{createNewMemorySegment}} waits on that socket/shm entry in {{DomainSocketWatcher#add}}. {code} public NewShmInfo createNewMemorySegment(String clientName, ... watcher.add(sock, shm); ... {code} - It's at this point that things get confusing, and I'm still working on why this happens. The wait wakes up, but things are not normal; it wasn't woken up because of an exception, either. You can tell that no exception was thrown inside {{createNewMemorySegment}} to wake it up because the following code goes on to call {{sendShmSuccessResponse}}, which is where the next bad thing happens: {code} public void requestShortCircuitShm(String clientName) throws IOException { ... try { shmInfo = datanode.shortCircuitRegistry. createNewMemorySegment(clientName, sock); // After calling #{ShortCircuitRegistry#createNewMemorySegment}, the // socket is managed by the DomainSocketWatcher, not the DataXceiver. 
releaseSocket(); } catch (UnsupportedOperationException e) { sendShmErrorResponse(ERROR_UNSUPPORTED, "This datanode has not been configured to support " + "short-circuit shared memory segments."); return; } catch (IOException e) { sendShmErrorResponse(ERROR, "Failed to create shared file descriptor: " + e.getMessage()); return; } sendShmSuccessResponse(sock, shmInfo); ... {code} - At this point, the call to {{sendShmSuccessResponse}} gets an exception: {noformat} 2015-04-04 13:12:30,973 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_attempt_1427231924849_569269_m_002116_0_-161414780_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: a2d3bac0-e98b-4b73-a5a1-82c7eb557a7a, success: false 2015-04-04 13:12:30,984 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] ERROR datanode.DataNode: host.domain.com:1004:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM operation src: unix:/home/gs/var/run/hdfs/dn_socket dst: local java.net.SocketException: write(2) error: Broken pipe at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method) at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45) at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601) at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833) at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843) at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:380) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:418) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:722) {noformat} - At this point, it bubbles back up to {{DataXceiver#requestShortCircuitShm}}, which cleans up, closing the socket: {code} ... if ((!success) && (peer == null)) { // If we failed to pass the shared memory segment to the client, // close the UNIX domain socket now. This will trigger the // DomainSocketWatcher callback, cleaning up the segment. IOUtils.cleanup(null, sock); } {code} - Then, the main {{DomainSocketWatcher}} thread wakes up (after the regular timeout interval has expired) and tries to call {{sendCallbackAndRemove}}, which encounters the following {{IllegalStateException}}: {code} final Thread watcherThread = new Thread(new Runnable() { ... while (true) { lock.lock(); ... {code}
[jira] [Commented] (HADOOP-11802) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback
[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485380#comment-14485380 ] Eric Payne commented on HADOOP-11802: - Sorry, I just noticed that the following was the first exception in the series: {noformat} 2015-04-02 11:48:09,866 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] ERROR datanode.DataNode: gsta70851.tan.ygrid.yahoo.com:1004:DataXceiver error processing REQUEST_SHORT_CIRCUIT_SHM operation src: unix:/home/gs/var/run/hdfs/dn_socket dst: local java.net.SocketException: write(2) error: Broken pipe at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method) at org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45) at org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601) at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833) at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843) at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:380) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:418) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) {noformat} DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback Key: HADOOP-11802 URL: https://issues.apache.org/jira/browse/HADOOP-11802 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Eric Payne In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the call to {{sendCallback}} can 
encounter an {{IllegalStateException}}, and leave some cleanup tasks undone. {code} } finally { lock.lock(); try { kick(); // allow the handler for notificationSockets[0] to read a byte for (Entry entry : entries.values()) { // We do not remove from entries as we iterate, because that can // cause a ConcurrentModificationException. sendCallback(close, entries, fdSet, entry.getDomainSocket().fd); } entries.clear(); fdSet.close(); } finally { lock.unlock(); } } {code} The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} and {{fdSet.close()}}. {code} 2015-04-02 11:48:09,941 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: Thread[Thread-14,5,main] terminating on unexpected exception java.lang.IllegalStateException: failed to remove b845649551b6b1eab5c17f630e42489d at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102) at org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402) at org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52) at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522) at java.lang.Thread.run(Thread.java:722) {code} Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or HADOOP-10404. The cluster installation is running code with all of these fixes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
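The skipped cleanup is the crux: once {{sendCallback}} throws for one entry, {{entries.clear()}} and {{fdSet.close()}} never run. As a standalone sketch only (not the committed Hadoop fix; the class and method names here are invented for illustration), one hardening is to catch per-entry failures inside the loop so the final cleanup steps still execute:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch: one throwing callback must not prevent the remaining
// callbacks or the final cleanup (the entries.clear()/fdSet.close()
// equivalents) from running.
public class CleanupSketch {
    interface Handler { void handle(); }

    static final ReentrantLock lock = new ReentrantLock();

    // Returns the number of handlers invoked, even when some of them throw.
    static int cleanupAll(List<Handler> entries) {
        int ran = 0;
        lock.lock();
        try {
            for (Handler h : entries) {
                try {
                    h.handle();          // may throw IllegalStateException
                } catch (RuntimeException e) {
                    // Log and continue; do not let a single bad entry
                    // abort the whole cleanup pass.
                }
                ran++;
            }
            entries.clear();             // no longer skipped on failure
        } finally {
            lock.unlock();
        }
        return ran;
    }

    public static void main(String[] args) {
        List<Handler> entries = new ArrayList<>();
        entries.add(() -> {});
        entries.add(() -> { throw new IllegalStateException("failed to remove"); });
        entries.add(() -> {});
        int ran = cleanupAll(entries);
        System.out.println(ran + " " + entries.size()); // 3 0
    }
}
```

The design point is simply that the per-entry try/catch moves the failure boundary inside the loop, so the outer finally-block logic becomes unconditional.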
[jira] [Commented] (HADOOP-11802) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback
[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485289#comment-14485289 ] Eric Payne commented on HADOOP-11802: - Thanks [~cmccabe] for your comment and interest in this issue. This problem is happening in multiple different live clusters. Only a small percentage of datanodes are affected each day, but once they hit this and the threads pile up, the datanodes must be restarted. The only 'terminating on' message in the DN log is coming from DomainSocketWatcher's unhandled exception handler. That is, it's the one documented in the description above: {quote} {noformat} 2015-04-04 13:12:31,059 [Thread-12] ERROR unix.DomainSocketWatcher: Thread[Thread-12,5,main] terminating on unexpected exception java.lang.IllegalStateException: failed to remove 17e33191fa8238098d7d22142f5787e2 2015-04-02 11:48:09,941 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: Thread[Thread-14,5,main] terminating on unexpected exception java.lang.IllegalStateException: failed to remove b845649551b6b1eab5c17f630e42489d ... {noformat} {quote} However, as you pointed out, that is happening after something went wrong in the main try block of the watcher thread. Since I'm seeing neither 'terminating on InterruptedException' nor 'terminating on IOException', there must be some other exception occurring. Yet the only reference to {{DomainSocketWatcher}} in the DN log is in the stack trace already mentioned. Just above the IllegalStateException stack trace, though, is the following, which indicates that a premature EOF occurred. 
There were several of these, but it's not clear that they are related to the reason why the DomainSocketWatcher exited. Your input would be greatly appreciated. {noformat} 2015-04-02 11:48:09,885 [DataXceiver for client DFSClient_attempt_1427231924849_569467_m_000135_0_346288762_1 at /xxx.xxx.xxx.xxx:41908 [Receiving block BP-658831282-xxx.xxx.xxx.xxx-1351509219914:blk_3365919992_1105804585360]] ERROR datanode.DataNode: gsta70851.tan.ygrid.yahoo.com:1004:DataXceiver error processing WRITE_BLOCK operation src: /xxx.xxx.xxx.xxx:41908 dst: /xxx.xxx.xxx.xxx:1004 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:781) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:730) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:722) {noformat} DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback Key: HADOOP-11802 URL: https://issues.apache.org/jira/browse/HADOOP-11802 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Eric Payne In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the call to {{sendCallback}} can encounter an {{IllegalStateException}}, and 
leave some cleanup tasks undone. {code} } finally { lock.lock(); try { kick(); // allow the handler for notificationSockets[0] to read a byte for (Entry entry : entries.values()) { // We do not remove from entries as we iterate, because that can // cause a ConcurrentModificationException. sendCallback(close, entries, fdSet, entry.getDomainSocket().fd); } entries.clear(); fdSet.close(); } finally { lock.unlock(); } } {code} The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} and {{fdSet.close()}}. {code}
[jira] [Created] (HADOOP-11802) DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback
Eric Payne created HADOOP-11802: --- Summary: DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback Key: HADOOP-11802 URL: https://issues.apache.org/jira/browse/HADOOP-11802 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Eric Payne In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the call to {{sendCallback}} can encounter an {{IllegalStateException}}, and leave some cleanup tasks undone. {code} } finally { lock.lock(); try { kick(); // allow the handler for notificationSockets[0] to read a byte for (Entry entry : entries.values()) { // We do not remove from entries as we iterate, because that can // cause a ConcurrentModificationException. sendCallback(close, entries, fdSet, entry.getDomainSocket().fd); } entries.clear(); fdSet.close(); } finally { lock.unlock(); } } {code} The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} and {{fdSet.close()}}. {code} 2015-04-02 11:48:09,941 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: Thread[Thread-14,5,main] terminating on unexpected exception java.lang.IllegalStateException: failed to remove b845649551b6b1eab5c17f630e42489d at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102) at org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402) at 
org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52) at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522) at java.lang.Thread.run(Thread.java:722) {code} Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or HADOOP-10404. The cluster installation is running code with all of these fixes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
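The code comment in the finally block above alludes to why the loop defers removal until after iteration: structurally modifying a map while iterating over it fails fast. A standalone demo of that behavior (generic Java, not Hadoop code; {{HashMap}} is used here only for illustration):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Shows the fail-fast behavior that motivates "do not remove from entries
// as we iterate": removing via the map itself during iteration throws
// ConcurrentModificationException, while clearing after the loop is safe.
public class CmeDemo {
    public static void main(String[] args) {
        Map<Integer, String> entries = new HashMap<>();
        for (int i = 0; i < 4; i++) {
            entries.put(i, "sock" + i);
        }

        boolean threw = false;
        try {
            for (Integer fd : entries.keySet()) {
                entries.remove(fd);      // structural modification mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            threw = true;                // the iterator fails fast
        }
        System.out.println(threw);       // true

        // Safe alternative, mirroring the watcher code: iterate first,
        // then clear everything in one step.
        entries.clear();
        System.out.println(entries.isEmpty()); // true
    }
}
```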
[jira] [Assigned] (HADOOP-11802) DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback
[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned HADOOP-11802: --- Assignee: Eric Payne DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback - Key: HADOOP-11802 URL: https://issues.apache.org/jira/browse/HADOOP-11802 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Eric Payne In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the call to {{sendCallback}} can encounter an {{IllegalStateException}}, and leave some cleanup tasks undone. {code} } finally { lock.lock(); try { kick(); // allow the handler for notificationSockets[0] to read a byte for (Entry entry : entries.values()) { // We do not remove from entries as we iterate, because that can // cause a ConcurrentModificationException. sendCallback(close, entries, fdSet, entry.getDomainSocket().fd); } entries.clear(); fdSet.close(); } finally { lock.unlock(); } } {code} The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} and {{fdSet.close()}}. 
{code} 2015-04-02 11:48:09,941 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: Thread[Thread-14,5,main] terminating on unexpected exception java.lang.IllegalStateException: failed to remove b845649551b6b1eab5c17f630e42489d at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102) at org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402) at org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52) at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522) at java.lang.Thread.run(Thread.java:722) {code} Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or HADOOP-10404. The cluster installation is running code with all of these fixes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11802) DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback
[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394675#comment-14394675 ] Eric Payne commented on HADOOP-11802: - The place in {{sendCallback}} where it is encountering the exception is {code} if (entry.getHandler().handle(sock)) { {code} Once the {{IllegalStateException}} occurs, I am seeing 4069 datanode threads getting stuck in {{DomainSocketWatcher#add}} when {{DataXceiver}} is trying to request a new short circuit read. This is similar to the symptoms seen in HADOOP-11333, but, as I mentioned above, the cluster is already running with that fix. Here is the stack trace from the stuck threads, for reference: {noformat} DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7fcbbcae1000 nid=0x498a waiting on condition [0x7fcb61132000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0xd06c3a78 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:722) {noformat} DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback - 
Key: HADOOP-11802 URL: https://issues.apache.org/jira/browse/HADOOP-11802 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Eric Payne In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the call to {{sendCallback}} can encounter an {{IllegalStateException}}, and leave some cleanup tasks undone. {code} } finally { lock.lock(); try { kick(); // allow the handler for notificationSockets[0] to read a byte for (Entry entry : entries.values()) { // We do not remove from entries as we iterate, because that can // cause a ConcurrentModificationException. sendCallback(close, entries, fdSet, entry.getDomainSocket().fd); } entries.clear(); fdSet.close(); } finally { lock.unlock(); } } {code} The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} and {{fdSet.close()}}. {code} 2015-04-02 11:48:09,941 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: Thread[Thread-14,5,main] terminating on unexpected exception java.lang.IllegalStateException: failed to remove b845649551b6b1eab5c17f630e42489d at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102) at org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402) at org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52) at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522) at 
java.lang.Thread.run(Thread.java:722) {code} Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or HADOOP-10404. The cluster installation is running code with all of these fixes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
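The stuck threads above are parked indefinitely in {{Condition#await}} inside {{DomainSocketWatcher#add}}; once the watcher thread has died, nothing will ever signal them. A hypothetical mitigation sketch (illustrative only; the class and the {{addWithTimeout}} method are invented names, not the actual Hadoop API) is to bound the wait so callers can fail fast instead of piling up:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a bounded wait: if the thread responsible for signalling has
// terminated, the caller gives up after the deadline instead of parking
// forever in Condition#await.
public class BoundedAdd {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition processed = lock.newCondition();
    private boolean done = false;

    // Returns true if the entry was processed before the deadline.
    public boolean addWithTimeout(long millis) throws InterruptedException {
        lock.lock();
        try {
            long nanos = TimeUnit.MILLISECONDS.toNanos(millis);
            while (!done) {
                if (nanos <= 0) {
                    return false;        // watcher never signalled: fail fast
                }
                nanos = processed.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Simulates the watcher thread completing its work.
    public void markProcessed() {
        lock.lock();
        try {
            done = true;
            processed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedAdd add = new BoundedAdd();
        System.out.println(add.addWithTimeout(50));  // false: nobody signalled
        add.markProcessed();
        System.out.println(add.addWithTimeout(50));  // true
    }
}
```

Note the {{awaitNanos}} loop guards against spurious wakeups while still decrementing the remaining time, which is the standard pattern for timed condition waits.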
[jira] [Updated] (HADOOP-11802) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback
[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-11802: Summary: DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback (was: DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback Key: HADOOP-11802 URL: https://issues.apache.org/jira/browse/HADOOP-11802 Project: Hadoop Common Issue Type: Bug Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Eric Payne In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the call to {{sendCallback}} can encounter an {{IllegalStateException}}, and leave some cleanup tasks undone. {code} } finally { lock.lock(); try { kick(); // allow the handler for notificationSockets[0] to read a byte for (Entry entry : entries.values()) { // We do not remove from entries as we iterate, because that can // cause a ConcurrentModificationException. sendCallback(close, entries, fdSet, entry.getDomainSocket().fd); } entries.clear(); fdSet.close(); } finally { lock.unlock(); } } {code} The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} and {{fdSet.close()}}. 
{code} 2015-04-02 11:48:09,941 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: Thread[Thread-14,5,main] terminating on unexpected exception java.lang.IllegalStateException: failed to remove b845649551b6b1eab5c17f630e42489d at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102) at org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402) at org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52) at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522) at java.lang.Thread.run(Thread.java:722) {code} Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or HADOOP-10404. The cluster installation is running code with all of these fixes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11802) DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback
[ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-11802: Description: In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the call to {{sendCallback}} can encounter an {{IllegalStateException}}, and leave some cleanup tasks undone. {code} } finally { lock.lock(); try { kick(); // allow the handler for notificationSockets[0] to read a byte for (Entry entry : entries.values()) { // We do not remove from entries as we iterate, because that can // cause a ConcurrentModificationException. sendCallback(close, entries, fdSet, entry.getDomainSocket().fd); } entries.clear(); fdSet.close(); } finally { lock.unlock(); } } {code} The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} and {{fdSet.close()}}. {code} 2015-04-02 11:48:09,941 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: Thread[Thread-14,5,main] terminating on unexpected exception java.lang.IllegalStateException: failed to remove b845649551b6b1eab5c17f630e42489d at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102) at org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402) at org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52) at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522) at java.lang.Thread.run(Thread.java:722) {code} 
Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or HADOOP-10404. The cluster installation is running code with all of these fixes. was: In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the call to {{sendCallback}} can encountering an {{IllegalStateException}}, and leave some cleanup tasks undone. {code} } finally { lock.lock(); try { kick(); // allow the handler for notificationSockets[0] to read a byte for (Entry entry : entries.values()) { // We do not remove from entries as we iterate, because that can // cause a ConcurrentModificationException. sendCallback(close, entries, fdSet, entry.getDomainSocket().fd); } entries.clear(); fdSet.close(); } finally { lock.unlock(); } } {code} The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} and {{fdSet.close()}}. {code} 2015-04-02 11:48:09,941 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: Thread[Thread-14,5,main] terminating on unexpected exception java.lang.IllegalStateException: failed to remove b845649551b6b1eab5c17f630e42489d at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102) at org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402) at org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52) at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522) at java.lang.Thread.run(Thread.java:722) {code} Please 
note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or HADOOP-10404. The cluster installation is running code with all of these fixes. DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback - Key: HADOOP-11802 URL: https://issues.apache.org/jira/browse/HADOOP-11802 Project: Hadoop Common
[jira] [Updated] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter
[ https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-11327: Attachment: HADOOP-11327.v2.txt Thanks, [~jlowe], for the review and comments. I have updated the test case in version 2 of the patch. BloomFilter#not() omits the last bit, resulting in an incorrect filter -- Key: HADOOP-11327 URL: https://issues.apache.org/jira/browse/HADOOP-11327 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.5.1 Reporter: Tim Luo Assignee: Eric Payne Priority: Minor Attachments: HADOOP-11327.v1.txt, HADOOP-11327.v2.txt There's an off-by-one error in {{BloomFilter#not()}}: {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according to the javadoc for that method, {{toIndex}} is end-_exclusive_: {noformat} * @param toIndex index after the last bit to flip {noformat} This means that the last bit in the bit array is not flipped. Specifically, this was discovered in the following scenario: 1. A new/empty {{BloomFilter}} was created with vectorSize=7. 2. Invoke {{bloomFilter.not()}}; now expecting a bloom filter with all 7 bits (0 through 6) flipped to 1 and membershipTest(...) to always return true. 3. However, membershipTest(...) was found to often not return true, and upon inspection, the BitSet only had bits 0 through 5 flipped. The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
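The end-exclusive semantics of {{BitSet#flip}} can be verified directly; this standalone demo reproduces the off-by-one from the description with vectorSize=7:

```java
import java.util.BitSet;

// Demonstrates that BitSet#flip(fromIndex, toIndex) is end-exclusive,
// which is the root of the off-by-one: flipping [0, vectorSize - 1)
// leaves the last bit untouched, while [0, vectorSize) flips all bits.
public class BitSetFlipDemo {
    public static void main(String[] args) {
        int vectorSize = 7;

        BitSet buggy = new BitSet(vectorSize);
        buggy.flip(0, vectorSize - 1);    // the old BloomFilter#not() call
        System.out.println(buggy.get(6)); // false: bit 6 was never flipped

        BitSet fixed = new BitSet(vectorSize);
        fixed.flip(0, vectorSize);        // the corrected call
        System.out.println(fixed.get(6)); // true: all 7 bits flipped
    }
}
```

This matches the reported scenario exactly: with the buggy call, only bits 0 through 5 are set, so {{membershipTest(...)}} can return false even on an inverted empty filter.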
[jira] [Work started] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter
[ https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-11327 started by Eric Payne. --- BloomFilter#not() omits the last bit, resulting in an incorrect filter -- Key: HADOOP-11327 URL: https://issues.apache.org/jira/browse/HADOOP-11327 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.5.1 Reporter: Tim Luo Assignee: Eric Payne Priority: Minor Attachments: HADOOP-11327.v1.txt There's an off-by-one error in {{BloomFilter#not()}}: {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according to the javadoc for that method, {{toIndex}} is end-_exclusive_: {noformat} * @param toIndex index after the last bit to flip {noformat} This means that the last bit in the bit array is not flipped. Specifically, this was discovered in the following scenario: 1. A new/empty {{BloomFilter}} was created with vectorSize=7. 2. Invoke {{bloomFilter.not()}}; now expecting a bloom filter with all 7 bits (0 through 6) flipped to 1 and membershipTest(...) to always return true. 3. However, membershipTest(...) was found to often not return true, and upon inspection, the BitSet only had bits 0 through 5 flipped. The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work stopped] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter
[ https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-11327 stopped by Eric Payne. --- BloomFilter#not() omits the last bit, resulting in an incorrect filter -- Key: HADOOP-11327 URL: https://issues.apache.org/jira/browse/HADOOP-11327 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.5.1 Reporter: Tim Luo Assignee: Eric Payne Priority: Minor Attachments: HADOOP-11327.v1.txt There's an off-by-one error in {{BloomFilter#not()}}: {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according to the javadoc for that method, {{toIndex}} is end-_exclusive_: {noformat} * @param toIndex index after the last bit to flip {noformat} This means that the last bit in the bit array is not flipped. Specifically, this was discovered in the following scenario: 1. A new/empty {{BloomFilter}} was created with vectorSize=7. 2. Invoke {{bloomFilter.not()}}; now expecting a bloom filter with all 7 bits (0 through 6) flipped to 1 and membershipTest(...) to always return true. 3. However, membershipTest(...) was found to often not return true, and upon inspection, the BitSet only had bits 0 through 5 flipped. The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter
[ https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated HADOOP-11327: Attachment: HADOOP-11327.v1.txt Thanks, [~tim.luo]. Here is the patch, version 1. BloomFilter#not() omits the last bit, resulting in an incorrect filter -- Key: HADOOP-11327 URL: https://issues.apache.org/jira/browse/HADOOP-11327 Project: Hadoop Common Issue Type: Bug Components: util Affects Versions: 2.5.1 Reporter: Tim Luo Assignee: Eric Payne Priority: Minor Attachments: HADOOP-11327.v1.txt There's an off-by-one error in {{BloomFilter#not()}}: {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according to the javadoc for that method, {{toIndex}} is end-_exclusive_: {noformat} * @param toIndex index after the last bit to flip {noformat} This means that the last bit in the bit array is not flipped. Specifically, this was discovered in the following scenario: 1. A new/empty {{BloomFilter}} was created with vectorSize=7. 2. Invoke {{bloomFilter.not()}}; now expecting a bloom filter with all 7 bits (0 through 6) flipped to 1 and membershipTest(...) to always return true. 3. However, membershipTest(...) was found to often not return true, and upon inspection, the BitSet only had bits 0 through 5 flipped. The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work stopped] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter
[ https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-11327 stopped by Eric Payne.
---

> BloomFilter#not() omits the last bit, resulting in an incorrect filter
[jira] [Updated] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter
[ https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-11327:
    Target Version/s: 2.7.0
              Status: Patch Available  (was: Open)

> BloomFilter#not() omits the last bit, resulting in an incorrect filter
[jira] [Work started] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter
[ https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-11327 started by Eric Payne.
---

> BloomFilter#not() omits the last bit, resulting in an incorrect filter
[jira] [Commented] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter
[ https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276012#comment-14276012 ]

Eric Payne commented on HADOOP-11327:
Hi [~tim.luo]. I'm interested in seeing this issue resolved. Please let me know if you plan on working on it any time soon. Otherwise, would it be okay if I took it over?

> BloomFilter#not() omits the last bit, resulting in an incorrect filter
[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException
[ https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-11012:
    Status: Open  (was: Patch Available)

> hadoop fs -text of zero-length file causes EOFException
> -------------------------------------------------------
>
>                 Key: HADOOP-11012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11012
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.5.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>         Attachments: HDFS-6915.201408271824.txt, HDFS-6915.201408272144.txt
>
> List:
> $ $HADOOP_PREFIX/bin/hadoop fs -ls /user/ericp/foo
> -rw--- 3 ericp hdfs 0 2014-08-22 16:37 /user/ericp/foo
> Cat:
> $ $HADOOP_PREFIX/bin/hadoop fs -cat /user/ericp/foo
> Text:
> $ $HADOOP_PREFIX/bin/hadoop fs -text /user/ericp/foo
> text: java.io.EOFException
>         at java.io.DataInputStream.readShort(DataInputStream.java:315)
>         at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:130)
>         at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:98)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
>         at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
>         at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
>         at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
>         at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException
[ https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-11012:
    Status: Patch Available  (was: Open)

> hadoop fs -text of zero-length file causes EOFException
[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException
[ https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-11012:
    Attachment: HDFS-6915.201408282053.txt

[~daryn], thank you for reviewing this patch. Using {{fsshell.run}} might not be simple, since that would entail creating a new output stream, setting the {{out}} instance variable for {{Display.Text}}, and then reading from that stream. However, with this patch, I was able to anonymously extend the {{Display.Text}} class, override the {{getInputStream}} method to be public, and then call {{getInputStream}} directly. Please let me know what you think.

> hadoop fs -text of zero-length file causes EOFException
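The anonymous-extension pattern described in the comment above can be sketched in isolation. {{Text}} and {{getInputStream}} here are stand-ins mirroring the names in the discussion; the scaffolding is illustrative, not the actual {{Display.Text}} code:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class VisibilityWideningDemo {
    // Stand-in for Display.Text: the interesting method is protected,
    // so a test class in another package cannot call it directly.
    static class Text {
        protected InputStream getInputStream(byte[] data) throws IOException {
            return new ByteArrayInputStream(data);
        }
    }

    public static void main(String[] args) throws IOException {
        // Anonymous subclass re-declares the method as public, letting the
        // test invoke it directly without driving a full FsShell run.
        Text text = new Text() {
            @Override
            public InputStream getInputStream(byte[] data) throws IOException {
                return super.getInputStream(data);
            }
        };
        InputStream in = text.getInputStream(new byte[0]);
        System.out.println(in.read()); // prints -1: empty stream, no exception
    }
}
```

Java allows an override to widen (but never narrow) the overridden method's visibility, which is what makes this testing trick legal.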
[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException
[ https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-11012:
    Target Version/s: 3.0.0, 2.6.0  (was: 2.6.0)

> hadoop fs -text of zero-length file causes EOFException
[jira] [Commented] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException
[ https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112914#comment-14112914 ]

Eric Payne commented on HADOOP-11012:
[~jira.shegalov], thank you for reviewing this patch.
bq. Now you read the magic twice however. I would change the original code just by enclosing the switch statement into try-catch-EOF.
Would it be sufficient to do as [~jlowe] suggests and save the magic bytes and switch on that?

> hadoop fs -text of zero-length file causes EOFException
[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException
[ https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated HADOOP-11012:
    Attachment: HDFS-6915.201408272144.txt

[~jlowe], thank you for taking the time to review this patch.
{quote}
It would be more efficient and a bit clearer if we saved off the result of the initial readShort call and switched on that rather than throwing it away as a test read.
{quote}
This has been done in this new patch.
{quote}
There's a lot of duplication setting up the input stream in the unit tests, and it's probably worth it to factor this out. Given there appears to be no overlap between the three added test cases (0-byte, 1-byte, and 2-byte files), it would be nice to put these in separate unit tests. Then the unit test that fails makes it obvious which test case is broken.
{quote}
In this latest patch, I created separate tests for each of the added use cases and combined the duplicate setup code into a separate method.

> hadoop fs -text of zero-length file causes EOFException
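The review suggestion above (read the magic once, save it, and guard the read with try-catch-EOF so zero-length files don't blow up in {{readShort}}) can be sketched as follows. The format constants and return values are hypothetical illustrations, not {{Display.Text}}'s actual code:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class MagicSwitchDemo {
    // Read the two magic bytes once and switch on the saved value.
    // A 0- or 1-byte stream throws EOFException inside readShort; catching
    // it turns the former crash into a graceful "nothing to decode" path.
    static String detect(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        short magic;
        try {
            magic = in.readShort();          // saved, not thrown away
        } catch (EOFException eof) {
            return "empty";                  // covers the 0- and 1-byte cases
        }
        switch (magic) {
            case 0x1f8b: return "gzip";      // gzip magic bytes
            case 0x5345: return "sequence";  // 'S','E' of a SequenceFile header
            default:     return "text";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(detect(new ByteArrayInputStream(new byte[0])));
        System.out.println(detect(new ByteArrayInputStream(new byte[]{0x1f, (byte) 0x8b})));
    }
}
```

Saving the short also avoids the double read the reviewer objected to: the stream is consumed exactly once before dispatch.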
[jira] [Commented] (HADOOP-8087) Paths that start with a double slash cause No filesystem for scheme: null errors
[ https://issues.apache.org/jira/browse/HADOOP-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966460#comment-13966460 ]

Eric Payne commented on HADOOP-8087:
[~daryn] and [~cmccabe]: I came across this issue as part of a 0.23 backlog review. Will this issue be resolved in 0.23 or 2.0? If not, can we remove the 0.23.3 and 2.0.0-alpha targets and leave this JIRA targeted for 3.0.0?

> Paths that start with a double slash cause No filesystem for scheme: null errors
> --------------------------------------------------------------------------------
>
>                 Key: HADOOP-8087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8087
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Daryn Sharp
>            Assignee: Colin Patrick McCabe
>         Attachments: HADOOP-8087.001.patch, HADOOP-8087.002.patch
>
> {{Path}} is incorrectly parsing {{//dir/path}} in a very unexpected way. While it should translate to the directory {{$fs.default.name/dir/path}}, it instead discards the {{//dir}} and returns {{$fs.default.name/path}}. The problem is {{Path}} is trying to parse an authority even when a scheme is not present.
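The authority-parsing confusion described above can be observed with {{java.net.URI}} alone. This sketch only illustrates the generic URI rule that a string beginning with {{//}} is parsed as an authority; it assumes nothing about {{Path}}'s internals:

```java
import java.net.URI;

public class DoubleSlashDemo {
    public static void main(String[] args) {
        // With no scheme, "//dir/path" makes URI treat "dir" as an
        // authority rather than the first path component -- the same
        // misinterpretation the issue describes for Hadoop's Path.
        URI u = URI.create("//dir/path");
        System.out.println(u.getScheme());    // prints null
        System.out.println(u.getAuthority()); // prints dir
        System.out.println(u.getPath());      // prints /path
    }
}
```

Since {{dir}} is swallowed as an authority and no scheme is present, a filesystem lookup keyed on the scheme ends up searching for scheme {{null}}, matching the error in the issue title.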