[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-10-21 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-17857:

Fix Version/s: 3.3.3
               3.2.4
               2.10.2

> Check real user ACLs in addition to proxied user ACLs
> -----------------------------------------------------
>
> Key: HADOOP-17857
> URL: https://issues.apache.org/jira/browse/HADOOP-17857
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 3.2.2, 2.10.1, 3.3.1
> Reporter: Eric Payne
> Assignee: Eric Payne
> Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.4, 3.3.3
>
> Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch
>
>
> In a secure cluster, it is possible to configure the services to allow a 
> super-user to proxy to a regular user and perform actions on behalf of the 
> proxied user (see [Proxy user - Superusers Acting On Behalf Of Other 
> Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]).
> This is useful for automating server access for many different users in a 
> multi-tenant cluster. For example, a super-user can submit jobs to a YARN 
> queue, access HDFS files, schedule Oozie workflows, etc., and the service 
> then performs the action as the proxied user.
> Usually when these services check ACLs to determine if the user has access to 
> the requested resources, the service only needs to check the ACLs for the 
> proxied user. However, it is sometimes desirable to allow the proxied user to 
> have access to the resources when only the real user has open ACLs.
> For instance, let's say the user {{adm}} is the only user with submit ACLs to 
> the {{dataload}} queue, and the {{adm}} user wants to submit apps to the 
> {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. In 
> addition, we want to be able to bill {{headless1}} and {{headless2}} 
> separately for the YARN resources used in the {{dataload}} queue. In order to 
> do this, the apps need to run in the {{dataload}} queue as the respective 
> headless users. We could open up the ACLs to the {{dataload}} queue to allow 
> {{headless1}} and {{headless2}} to submit apps. But this would allow those 
> users to submit any app to that queue, and not be limited to just the data 
> loading apps, and we don't trust the {{headless1}} and {{headless2}} owners 
> to honor that restriction.
> This JIRA proposes that we define a way to set up ACLs that restrict a 
> resource's access to a super-user, but run the access itself as the 
> proxied user.
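For reference, a minimal sketch of how a super-user proxies another user with Hadoop's {{UserGroupInformation}} API (user names are illustrative; error handling is omitted):
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// The logged-in super-user, e.g. "adm".
UserGroupInformation realUser = UserGroupInformation.getLoginUser();

// Impersonate "headless1"; the hadoop.proxyuser.adm.* settings must allow it.
UserGroupInformation proxyUgi =
    UserGroupInformation.createProxyUser("headless1", realUser);

proxyUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
  // Submit the YARN app here. ACL checks see "headless1" as the effective
  // user, while "adm" remains reachable via proxyUgi.getRealUser().
  return null;
});
{code}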






[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-10-19 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430759#comment-17430759
 ] 

Eric Payne commented on HADOOP-17857:
-

I would like to backport this to the previous branches, back to branch-2.10.




[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-09-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413365#comment-17413365
 ] 

Eric Payne commented on HADOOP-17857:
-

[~snemeth], Thanks a lot for the review and commit! Yes, I will open a 
follow-up for documenting this. Good catch.




[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-08-25 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404659#comment-17404659
 ] 

Eric Payne commented on HADOOP-17857:
-

[~zhoukang] , [~snemeth], I believe that this JIRA is the first step to 
addressing the requirements outlined in YARN-9975. Can you please review the 
changes and let me know what you think?




[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-08-25 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404656#comment-17404656
 ] 

Eric Payne commented on HADOOP-17857:
-

Thanks [~ahussein] for the review and comments! I have attached v2 of the patch.






[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-08-25 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-17857:

Attachment: HADOOP-17857.002.patch




[jira] [Assigned] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-08-19 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned HADOOP-17857:
---

Assignee: Eric Payne




[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-08-19 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-17857:

Attachment: HADOOP-17857.001.patch




[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-08-19 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-17857:

Status: Patch Available  (was: Open)




[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-08-19 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17401820#comment-17401820
 ] 

Eric Payne commented on HADOOP-17857:
-

I suggest that we define the ACLs so that a special character tells the 
AccessControlList system to check the ACLs for the real user and not those for 
the proxied user.

Let's take an example of submitting jobs to the {{dataload}} queue in the 
Capacity Scheduler. The use case is to only allow {{adm}} to submit to the 
{{dataload}} queue, but once the apps are submitted, have them run as a proxy 
user (like {{headless1}}).
The following is the current syntax, and it only allows the {{adm}} user, as 
itself, to submit jobs to the {{dataload}} queue:
{code:xml}
  <property>
    <name>yarn.scheduler.capacity.root.dataload.acl_submit_applications</name>
    <value>adm</value>
  </property>
{code}
With this syntax, if the {{adm}} user proxies to {{headless1}} and submits the 
job, the Capacity Scheduler will reject the submission because {{headless1}} 
does not have submit ACL permissions.

*PROPOSED CHANGES:*
- Add a tilde (~) to the beginning of the {{adm}} user in the value section of 
the property. Building on the above example, note the addition of the tilde (~):
{code:xml}
  <property>
    <name>yarn.scheduler.capacity.root.dataload.acl_submit_applications</name>
    <value>~adm</value>
  </property>
{code}
  - With the tilde (~), any proxied user submitted by the {{adm}} user will be 
allowed to run in the {{dataload}} queue.
  - That same proxied user will _not_ be allowed to submit by themselves if 
they are not first proxied by {{adm}}.
  - NOTE: with this syntax, {{adm}} will not be able to directly submit as 
itself to the {{dataload}} queue. In order to both submit as {{adm}} and also 
allow an {{adm}}-proxied user to submit to the {{dataload}} queue, both 
{{~adm}} and {{adm}} must be specified, as follows:
{code:xml}
  <property>
    <name>yarn.scheduler.capacity.root.dataload.acl_submit_applications</name>
    <value>~adm,adm</value>
  </property>
{code}

This example could be extended to other ACL properties in other Hadoop systems.

We have been running with this change in production for over a year now, and it 
works well.
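A minimal sketch of the check this proposal implies (illustrative only; the actual {{AccessControlList}} change may differ):
{code:java}
import java.util.Set;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical helper, not the actual patch: allow access when the effective
// (possibly proxied) user matches a plain entry, or when the real user behind
// the proxy matches a tilde-prefixed entry.
static boolean isUserAllowed(UserGroupInformation ugi, Set<String> aclUsers) {
  if (aclUsers.contains(ugi.getShortUserName())) {
    return true; // e.g. "adm" submitting as itself, or "headless1" if listed
  }
  UserGroupInformation realUser = ugi.getRealUser();
  return realUser != null
      && aclUsers.contains("~" + realUser.getShortUserName());
}
{code}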





[jira] [Created] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-08-19 Thread Eric Payne (Jira)
Eric Payne created HADOOP-17857:
---

 Summary: Check real user ACLs in addition to proxied user ACLs
 Key: HADOOP-17857
 URL: https://issues.apache.org/jira/browse/HADOOP-17857
 Project: Hadoop Common
 Issue Type: Improvement
 Affects Versions: 3.3.1, 2.10.1, 3.2.2
 Reporter: Eric Payne





[jira] [Resolved] (HADOOP-17346) Fair call queue is defeated by abusive service principals

2020-11-23 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved HADOOP-17346.
-
Fix Version/s: 3.2.3
               3.1.5
               3.4.0
               3.3.1
   Resolution: Fixed

Thanks [~ahussein]. I committed to branch-3.1, branch-3.2, branch-3.3, and 
trunk.

> Fair call queue is defeated by abusive service principals
> ----------------------------------------------------------
>
> Key: HADOOP-17346
> URL: https://issues.apache.org/jira/browse/HADOOP-17346
> Project: Hadoop Common
> Issue Type: Bug
> Components: common, ipc
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HADOOP-17346-branch-3.1.001.patch, 
> HADOOP-17346.branch-3.2.001.patch, HADOOP-17346.branch-3.3.001.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> [~daryn] reported that the FCQ prioritizes based on the full Kerberos 
> principal (i.e. "user/host@realm") rather than the short name (i.e. "user") to 
> prevent service principals like the DNs and NMs from being de-prioritized, 
> since service principals are expected to be well behaved. Notably, the DNs 
> contribute a significant but important load, so the intent is not to 
> de-prioritize all DNs because their sum total load is high relative to users.
> This has the unfortunate side effect of allowing misbehaving, non-critical 
> service principals to abuse the FCQ. The gstorm/* principals are a prime 
> example. Each server is spamming opens as fast as possible, which ensures 
> that none of the gstorm servers can be de-prioritized because each principal 
> is a fraction of the total load from all principals.
> The secondary and more devastating problem is that other abusive non-service 
> principals cannot be effectively de-prioritized. The sum total of all gstorm 
> load prevents other principals from surpassing the priority thresholds. 
> Principals stay in the highest priority queues, which allows the abusive 
> principals to overflow the entire call queue for extended periods of time. 
> Notably, it prevents the FCQ from moderating the heavy create loads from p_gup 
> @ DB, which cause significant performance degradation.
> Prioritization should be based on short name, with configurable exemptions 
> for services like the DN/NM.
> [~daryn] suggested a solution that we applied on our clusters.
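A minimal sketch of the suggested direction (illustrative; not necessarily what was committed): key prioritization on the short name, with exempt services keeping their full principal:
{code:java}
import java.util.Set;

// Hypothetical helper: pick the identity the FCQ prioritizes on. Exempt
// services (e.g. DN/NM principals) keep the full principal, so each host
// stays a separate, small identity; everyone else collapses to a short name.
static String schedulingIdentity(String fullPrincipal, Set<String> exemptUsers) {
  // "user/host@REALM" -> "user"; real code would apply auth_to_local rules.
  String shortName = fullPrincipal.split("[/@]")[0];
  return exemptUsers.contains(shortName) ? fullPrincipal : shortName;
}
{code}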






[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals

2020-11-23 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237704#comment-17237704
 ] 

Eric Payne commented on HADOOP-17346:
-

bq. If there wasn't a specific reason for doing this, I would suggest we 
reverse them in branch-3.1 to match the other releases.
BTW, [~ahussein], you don't need to put up another patch. I'll do this as part 
of the commit.




[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals

2020-11-23 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237703#comment-17237703
 ] 

Eric Payne commented on HADOOP-17346:
-

Thanks [~ahussein] for the good work on this JIRA. I have committed to trunk, 
branch-3.3, and branch-3.2.

I have a question about the branch-3.1 patch. It seems that in branch-3.1, the 
arguments of {{DecayRpcScheduler#computePriorityLevel}} are reversed:
{panel:title=branch-3.1}
+  private int computePriorityLevel(Object identity, long cost) {
{panel}
{panel:title=branch-3.2}
+  private int computePriorityLevel(long cost, Object identity) {
{panel}
If there wasn't a specific reason for doing this, I would suggest we reverse 
them in branch-3.1 to match the other releases.




[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals

2020-11-20 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236433#comment-17236433
 ] 

Eric Payne commented on HADOOP-17346:
-

The patch for branch-3.3 changes the signature of 
{{DecayRpcScheduler#computePriorityLevel}} to add an identity object:
{code:java}
-  private int computePriorityLevel(long cost) {
+  private int computePriorityLevel(long cost, Object identity) {
{code}
This signature change was done in trunk as part of HADOOP-17165. Since this fix 
needs that same signature change, we are faced with the following choices:
1) Backport HADOOP-17165 to branch-3.3.
2) Make the same change in this patch.
I'd rather not implement option 2 because it makes changes hard to track. 
However, I don't know if we want the feature from HADOOP-17165 backported to 
branch-3.3.

[~tasanuma], is the service-user feature something we want backported to 
earlier branches? We would probably want it pulled back into at least 
branch-3.2. It backports cleanly to branch-3.3, but not quite cleanly to 3.2.




[jira] [Commented] (HADOOP-17381) Ability to specify separate compression settings when intermediate and final output use the same codec

2020-11-17 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233987#comment-17233987
 ] 

Eric Payne commented on HADOOP-17381:
-

One hacky way that might work, and would be extensible to all codecs, is to 
create an explicit codec for intermediate outputs, like 
IntermediateOutputCodec, that has its own config namespace with a prefix, like 
io.compression.codec.intermediate. Users could then configure the same codec 
being used for final output, but as a sub-codec of IntermediateOutputCodec. 
IntermediateOutputCodec would gather all configs with the 
io.compression.codec.intermediate prefix, strip the prefix, then reapply them 
to a new Configuration that becomes the config object for the sub-codec, 
allowing it to have different settings than the output codec even though the 
same codec type is used underneath.
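A minimal sketch of that prefix-stripping step ({{IntermediateOutputCodec}} itself is hypothetical; only the {{Configuration}} iteration below is standard Hadoop API):
{code:java}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

// Build the conf handed to the wrapped sub-codec: copy every
// "io.compression.codec.intermediate."-prefixed setting onto the
// corresponding unprefixed key.
static Configuration buildSubCodecConf(Configuration conf) {
  final String prefix = "io.compression.codec.intermediate.";
  Configuration subConf = new Configuration(conf);
  for (Map.Entry<String, String> entry : conf) { // Configuration is iterable
    if (entry.getKey().startsWith(prefix)) {
      subConf.set(entry.getKey().substring(prefix.length()), entry.getValue());
    }
  }
  return subConf;
}
{code}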

> Ability to specify separate compression settings when intermediate and final 
> output use the same codec
> --------------------------------------------------------------------------
>
> Key: HADOOP-17381
> URL: https://issues.apache.org/jira/browse/HADOOP-17381
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Eric Payne
> Priority: Major
>
> The ZStandard codec may become a codec that users want to use for both 
> intermediate data and final output data, while specifying different 
> compression levels for those two use cases.
> It would be nice if there were a way to create a "meta codec", like 
> IntermediateCodec, that used conf-prefix techniques, like Oozie does with 
> oozie.launcher for the Oozie launcher configs, to create a custom config 
> namespace of sorts for setting arbitrary codec settings specific to the 
> intermediate codec, separate from the final output codec, even if the same 
> underlying codec is used for both.
> However, codecs don't allow a configuration to be passed when obtaining a 
> codec stream, and I think we would have to bypass the CodecPool entirely to 
> be able to pass a custom conf to an arbitrary Codec.
> Another approach is to skip trying to generalize the solution and 
> specifically focus on ZStandard. It would be easy to create a wrapper codec 
> around the existing ZStandardCompressor and ZStandardDecompressor which take 
> the relevant parameters directly in their constructors.






[jira] [Created] (HADOOP-17381) Ability to specify separate compression settings when intermediate and final output use the same codec

2020-11-17 Thread Eric Payne (Jira)
Eric Payne created HADOOP-17381:
---

 Summary: Ability to specify separate compression settings when 
intermediate and final output use the same codec
 Key: HADOOP-17381
 URL: https://issues.apache.org/jira/browse/HADOOP-17381
 Project: Hadoop Common
 Issue Type: Improvement
 Reporter: Eric Payne





[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals

2020-11-12 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230881#comment-17230881
 ] 

Eric Payne commented on HADOOP-17346:
-

[~ahussein], I merged the PR to trunk. However, it does not backport quite 
cleanly to earlier 3.x branches. I think we want this in 3.3, 3.2, and 3.1. 
Please provide patches for those branches if we want this pulled back.




[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals

2020-11-11 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230191#comment-17230191
 ] 

Eric Payne commented on HADOOP-17346:
-

[~ahussein], I have verified that these changes are consistent with what we 
have been running internally.

+1




[jira] [Resolved] (HADOOP-17373) hadoop-client-integration-tests doesn't work when building with skipShade

2020-11-11 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved HADOOP-17373.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> hadoop-client-integration-tests doesn't work when building with skipShade
> -
>
> Key: HADOOP-17373
> URL: https://issues.apache.org/jira/browse/HADOOP-17373
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Compiling with skipShade:
> {code}
> mvn clean install -Pdist -DskipShade -DskipTests -Dtar -Danimal.sniffer.skip 
> -Dmaven.javadoc.skip=true
> {code}
> fails with
> {code}
> [ERROR] 
> /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[47,37]
>  package org.apache.hadoop.yarn.server does not exist
> [ERROR] 
> /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[59,11]
>  cannot find symbol
> [ERROR]   symbol:   class MiniYARNCluster
> [ERROR]   location: class org.apache.hadoop.example.ITUseMiniCluster
> [ERROR] 
> /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[82,23]
>  cannot find symbol
> [ERROR]   symbol:   class MiniYARNCluster
> [ERROR]   location: class org.apache.hadoop.example.ITUseMiniCluster
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hadoop-client-integration-tests
> {code}
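If the cause is that MiniYARNCluster (which lives in the hadoop-yarn-server-tests test-jar) is only available through the shaded artifacts, the fix would plausibly take this shape (an assumption for illustration, not necessarily the committed change):
{code:xml}
<!-- Assumption, not necessarily the committed fix: make MiniYARNCluster
     available to the non-shaded build via the yarn-server-tests test-jar. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-tests</artifactId>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
{code}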






[jira] [Commented] (HADOOP-17373) hadoop-client-integration-tests doesn't work when building with skipShade

2020-11-11 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230070#comment-17230070
 ] 

Eric Payne commented on HADOOP-17373:
-

+1. This fixes non-shaded compilation. I'll commit this shortly.

> hadoop-client-integration-tests doesn't work when building with skipShade
> -
>
> Key: HADOOP-17373
> URL: https://issues.apache.org/jira/browse/HADOOP-17373
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Compiling with skipShade:
> {code}
> mvn clean install -Pdist -DskipShade -DskipTests -Dtar -Danimal.sniffer.skip 
> -Dmaven.javadoc.skip=true
> {code}
> fails with
> {code}
> [ERROR] 
> /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[47,37]
>  package org.apache.hadoop.yarn.server does not exist
> [ERROR] 
> /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[59,11]
>  cannot find symbol
> [ERROR]   symbol:   class MiniYARNCluster
> [ERROR]   location: class org.apache.hadoop.example.ITUseMiniCluster
> [ERROR] 
> /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[82,23]
>  cannot find symbol
> [ERROR]   symbol:   class MiniYARNCluster
> [ERROR]   location: class org.apache.hadoop.example.ITUseMiniCluster
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hadoop-client-integration-tests
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17373) hadoop-client-integration-tests doesn't work when building with skipShade

2020-11-11 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17230015#comment-17230015
 ] 

Eric Payne commented on HADOOP-17373:
-

Thanks [~csun] for providing this fix. I'll compile with the changes and test 
it out.

> hadoop-client-integration-tests doesn't work when building with skipShade
> -
>
> Key: HADOOP-17373
> URL: https://issues.apache.org/jira/browse/HADOOP-17373
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Compiling with skipShade:
> {code}
> mvn clean install -Pdist -DskipShade -DskipTests -Dtar -Danimal.sniffer.skip 
> -Dmaven.javadoc.skip=true
> {code}
> fails with
> {code}
> [ERROR] 
> /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[47,37]
>  package org.apache.hadoop.yarn.server does not exist
> [ERROR] 
> /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[59,11]
>  cannot find symbol
> [ERROR]   symbol:   class MiniYARNCluster
> [ERROR]   location: class org.apache.hadoop.example.ITUseMiniCluster
> [ERROR] 
> /Users/chao/git/hadoop/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[82,23]
>  cannot find symbol
> [ERROR]   symbol:   class MiniYARNCluster
> [ERROR]   location: class org.apache.hadoop.example.ITUseMiniCluster
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :hadoop-client-integration-tests
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17324) Don't relocate org.bouncycastle in shaded client jars

2020-11-10 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229572#comment-17229572
 ] 

Eric Payne commented on HADOOP-17324:
-

[~csun], [~dongjoon], and [~viirya],

After this JIRA was committed (revision # 
2522bf2f9b0c720eab099fef27bd3d22460ad5d0), I am seeing the following 
compilation errors:
{noformat}
[ERROR] 
/home/ericp/hadoop/source/current/orig/hadoop-client-modules/hadoop-client-integration-tests/src/test/java/org/apache/hadoop/example/ITUseMiniCluster.java:[59,11]
 cannot find symbol
[ERROR]   symbol:   class MiniYARNCluster
{noformat}
I'm using the following mvn command to build:
{noformat}
mvn install -Pdist -DskipShade -DskipTests -Dtar -Danimal.sniffer.skip 
-Dmaven.javadoc.skip=true
{noformat}

> Don't relocate org.bouncycastle in shaded client jars
> -
>
> Key: HADOOP-17324
> URL: https://issues.apache.org/jira/browse/HADOOP-17324
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> When downstream apps depend on {{hadoop-client-api}}, 
> {{hadoop-client-runtime}} and {{hadoop-client-minicluster}}, it seems the 
> {{MiniYARNCluster}} could have an issue because 
> {{org.apache.hadoop.shaded.org.bouncycastle.operator.OperatorCreationException}}
>  is not in any of the above jars. 
> {code}
> Error:  Caused by: sbt.ForkMain$ForkError: java.lang.ClassNotFoundException: 
> org.apache.hadoop.shaded.org.bouncycastle.operator.OperatorCreationException
> Error:at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> Error:at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
> Error:at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> Error:at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
> Error:at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:862)
> Error:at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> Error:at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1296)
> Error:at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:339)
> Error:at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> Error:at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.initResourceManager(MiniYARNCluster.java:353)
> Error:at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.access$200(MiniYARNCluster.java:127)
> Error:at 
> org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceInit(MiniYARNCluster.java:488)
> Error:at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> Error:at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
> Error:at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.serviceInit(MiniYARNCluster.java:321)
> Error:at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> Error:at 
> org.apache.spark.deploy.yarn.BaseYarnClusterSuite.beforeAll(BaseYarnClusterSuite.scala:94)
> Error:at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
> Error:at 
> org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> Error:at 
> org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> Error:at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61)
> Error:at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:318)
> Error:at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:513)
> Error:at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
> Error:at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> Error:at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> Error:at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> Error:at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17346) Fair call queue is defeated by abusive service principals

2020-11-09 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228760#comment-17228760
 ] 

Eric Payne commented on HADOOP-17346:
-

Thanks, [~ahussein], for raising the issue and providing the patch.
I am looking at the PR.

> Fair call queue is defeated by abusive service principals
> -
>
> Key: HADOOP-17346
> URL: https://issues.apache.org/jira/browse/HADOOP-17346
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, ipc
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [~daryn] reported  that the FCQ prioritizes based on the full kerberos 
> principal (ie. "user/host@realm") rather than short name (ie. "user") to 
> prevent service principals like the DNs and NMs being de-prioritized since 
> service principals are expected to be well behaved.  Notably the DNs 
> contribute a significant but important load so the intent is not to 
> de-prioritize all DNs because their sum total load is high relative to users.
> This has the unfortunate side effect of allowing misbehaving & non-critical 
> service principals to abuse the FCQ. The gstorm/* principals are a prime 
> example.   Each server is spamming opens as fast as possible which ensures 
> that none of the gstorm servers can be de-prioritized because each principal 
> is a fraction of the total load from all principals.
> The secondary and more devastating problem is that other abusive non-service 
> principals cannot be effectively de-prioritized.  The sum total of all gstorm 
> load prevents other principals from surpassing the priority thresholds.  
> Principals stay in the highest priority queues which allows the abusive 
> principals to overflow the entire call queue for extended periods of time.  
> Notably it prevents the FCQ from moderating the heavy create loads from p_gup 
> @ DB which cause significant performance degradation.
> Prioritization should be based on short name with configurable exemptions for 
> services like the DN/NM.
> [~daryn] suggested a solution that we applied on our clusters.
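
As an illustration of the proposal (a minimal sketch of the suggested behavior, not the committed patch; the exemption-list property and class name are hypothetical), prioritizing on the short name with configurable exemptions might look like this in Java:

{code}
import java.util.Set;

public class FcqIdentityResolver {
  // Hypothetical config, e.g. ipc.fcq.priority.exempt.users=hdfs,yarn
  private final Set<String> exemptShortNames;

  public FcqIdentityResolver(Set<String> exemptShortNames) {
    this.exemptShortNames = exemptShortNames;
  }

  /**
   * Returns the identity the fair call queue prioritizes on: the short name
   * ("user") instead of the full principal ("user/host@REALM"), unless the
   * short name is exempt (service principals such as the DN/NM keep their
   * per-host identity and are not de-prioritized as a group).
   */
  public String schedulingIdentity(String principal) {
    String shortName = principal.split("[/@]")[0];
    return exemptShortNames.contains(shortName) ? principal : shortName;
  }
}
{code}

Under this scheme all gstorm/* principals aggregate under the single identity "gstorm" and can be de-prioritized as a unit.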



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16096) HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1

2019-02-07 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-16096:

   Resolution: Fixed
Fix Version/s: 3.1.3
   Status: Resolved  (was: Patch Available)

I committed to branch-3.1.

> HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1
> 
>
> Key: HADOOP-16096
> URL: https://issues.apache.org/jira/browse/HADOOP-16096
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Eric Payne
>Assignee: Steve Loughran
>Priority: Critical
> Fix For: 3.1.3
>
> Attachments: HADOOP-15281-branch-3.1-001.patch
>
>
> HADOOP-15281 breaks the branch-3.1 build when building with java 1.8.
> {code:title="RetriableFileCopyCommand.java"}
> LOG.info("Copying {} to {}", source.getPath(), target);
> {code}
> Multiple lines have this error:
> {panel:title="Build Failure"}
> [ERROR] 
> hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8]
>  no suitable method found for 
> info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path)
> [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> {panel}
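
For reference, a commons-logging-compatible form of the offending call is plain concatenation (a sketch of the kind of change required, not necessarily the exact committed diff), since {{org.apache.commons.logging.Log}} only exposes the {{info(Object)}} and {{info(Object, Throwable)}} overloads listed in the error above:

{code}
// SLF4J-style parameterized call; does not compile against commons-logging:
//   LOG.info("Copying {} to {}", source.getPath(), target);
// commons-logging form:
LOG.info("Copying " + source.getPath() + " to " + target);
{code}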



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16096) HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1

2019-02-07 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763164#comment-16763164
 ] 

Eric Payne commented on HADOOP-16096:
-

+1

Will commit shortly

> HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1
> 
>
> Key: HADOOP-16096
> URL: https://issues.apache.org/jira/browse/HADOOP-16096
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Eric Payne
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15281-branch-3.1-001.patch
>
>
> HADOOP-15281 breaks the branch-3.1 build when building with java 1.8.
> {code:title="RetriableFileCopyCommand.java"}
> LOG.info("Copying {} to {}", source.getPath(), target);
> {code}
> Multiple lines have this error:
> {panel:title="Build Failure"}
> [ERROR] 
> hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8]
>  no suitable method found for 
> info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path)
> [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> {panel}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16096) HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1

2019-02-07 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763136#comment-16763136
 ] 

Eric Payne commented on HADOOP-16096:
-

Thanks a lot [~ste...@apache.org] for the quick turnaround on the updated 
patch!

I will wait for the pre-commit build, but this patch builds on 3.1 and the 
changes look good to me. Is there someone else you would like to have look at 
it?

> HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1
> 
>
> Key: HADOOP-16096
> URL: https://issues.apache.org/jira/browse/HADOOP-16096
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Eric Payne
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-15281-branch-3.1-001.patch
>
>
> HADOOP-15281 breaks the branch-3.1 build when building with java 1.8.
> {code:title="RetriableFileCopyCommand.java"}
> LOG.info("Copying {} to {}", source.getPath(), target);
> {code}
> Multiple lines have this error:
> {panel:title="Build Failure"}
> [ERROR] 
> hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8]
>  no suitable method found for 
> info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path)
> [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> {panel}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16096) HADOOP-15281 breaks 3.1 build

2019-02-07 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763061#comment-16763061
 ] 

Eric Payne commented on HADOOP-16096:
-

I reverted HADOOP-15281 until a fix can be introduced.

> HADOOP-15281 breaks 3.1 build
> -
>
> Key: HADOOP-16096
> URL: https://issues.apache.org/jira/browse/HADOOP-16096
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Eric Payne
>Priority: Critical
>
> HADOOP-15281 breaks the branch-3.1 build when building with java 1.8.
> {code:title="RetriableFileCopyCommand.java"}
> LOG.info("Copying {} to {}", source.getPath(), target);
> {code}
> Multiple lines have this error:
> {panel:title="Build Failure"}
> [ERROR] 
> hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8]
>  no suitable method found for 
> info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path)
> [ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> [ERROR] method 
> org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is 
> not applicable
> [ERROR]   (actual and formal argument lists differ in length)
> {panel}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-07 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763045#comment-16763045
 ] 

Eric Payne commented on HADOOP-15281:
-

I'm going to revert this from 3.1 so that further development can continue.

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).
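
A usage sketch of the proposed switch (the option names are taken from the proposal above; the paths are hypothetical):

{code}
// CLI form: hadoop distcp -direct hdfs://src/path s3a://bucket/dest
// Programmatic form, using only the public Configuration API:
org.apache.hadoop.conf.Configuration conf =
    new org.apache.hadoop.conf.Configuration();
conf.setBoolean("distcp.direct.write", true);
{code}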



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-16096) HADOOP-15281 breaks 3.1 build

2019-02-07 Thread Eric Payne (JIRA)
Eric Payne created HADOOP-16096:
---

 Summary: HADOOP-15281 breaks 3.1 build
 Key: HADOOP-16096
 URL: https://issues.apache.org/jira/browse/HADOOP-16096
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.1.3
Reporter: Eric Payne


HADOOP-15281 breaks the branch-3.1 build when building with java 1.8.
{code:title="RetriableFileCopyCommand.java"}
LOG.info("Copying {} to {}", source.getPath(), target);
{code}
Multiple lines have this error:
{panel:title="Build Failure"}
[ERROR] 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8]
 no suitable method found for 
info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path)
[ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is not 
applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is 
not applicable
[ERROR]   (actual and formal argument lists differ in length)
{panel}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15281) Distcp to add no-rename copy option

2019-02-07 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762957#comment-16762957
 ] 

Eric Payne commented on HADOOP-15281:
-

[~noslowerdna] and [~ste...@apache.org], this commit breaks the branch-3.1 
build when building with java 1.8.
{code:title="RetriableFileCopyCommand.java"}
LOG.info("Copying {} to {}", source.getPath(), target);
{code}
Multiple lines have this error:
{panel:title="Build Failure"}
[ERROR] 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java:[121,8]
 no suitable method found for 
info(java.lang.String,org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path)
[ERROR] method org.apache.commons.logging.Log.info(java.lang.Object) is not 
applicable
[ERROR]   (actual and formal argument lists differ in length)
[ERROR] method 
org.apache.commons.logging.Log.info(java.lang.Object,java.lang.Throwable) is 
not applicable
[ERROR]   (actual and formal argument lists differ in length)
{panel}

> Distcp to add no-rename copy option
> ---
>
> Key: HADOOP-15281
> URL: https://issues.apache.org/jira/browse/HADOOP-15281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Andrew Olson
>Priority: Major
> Fix For: 3.2.1, 3.1.3
>
> Attachments: HADOOP-15281-001.patch, HADOOP-15281-002.patch, 
> HADOOP-15281-003.patch, HADOOP-15281-004.patch
>
>
> Currently Distcp uploads a file by two strategies
> # append parts
> # copy to temp then rename
> option 2 executes the following sequence in {{promoteTmpToTarget}}
> {code}
> if ((fs.exists(target) && !fs.delete(target, false))
> || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
> || !fs.rename(tmpTarget, target)) {
>   throw new IOException("Failed to promote tmp-file:" + tmpTarget
>   + " to: " + target);
> }
> {code}
> For any object store, that's a lot of HTTP requests; for S3A you are looking 
> at 12+ requests and an O(data) copy call. 
> This is not a good upload strategy for any store which manifests its output 
> atomically at the end of the write().
> Proposed: add a switch to write directly to the dest path, which can be 
> supplied as either a conf option (distcp.direct.write) or a CLI option 
> (-direct).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-19 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725164#comment-16725164
 ] 

Eric Payne commented on HADOOP-15973:
-

bq. branch-2.8 will also have a different patch, if necessary.
Not necessary. This is not failing in branch-2.8.

bq. confirm the omission of quiet mode suppression in the new include handling 
was intentional
I agree that suppression of include handling is not desired.

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, 
> HADOOP-15973.003.branch-2.patch, HADOOP-15973.003.branch-3.0.patch, 
> HADOOP-15973.003.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.
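
A minimal repro sketch of the failure mode described above (file names are hypothetical; only the public {{Configuration}} API is used):

{code}
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;

public class IncludeCacheRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(false);
    // first.xml contains an <xi:include> of a file that sets "myprop".
    conf.addResource(new BufferedInputStream(new FileInputStream("first.xml")));
    System.out.println(conf.get("myprop"));  // included value is visible

    // Adding a second resource triggers recalculation of the properties from
    // the cached first resource; the included properties are lost (the bug).
    conf.addResource(new BufferedInputStream(new FileInputStream("second.xml")));
    System.out.println(conf.get("myprop"));  // now null
  }
}
{code}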



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-19 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725097#comment-16725097
 ] 

Eric Payne commented on HADOOP-15973:
-

TestSSLFactory is not failing in my local environment.

Also, I uploaded a branch-2 patch for version 003. It backports cleanly and 
builds on branch-2.9.

branch-2.8 will also have a different patch, if necessary.

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, 
> HADOOP-15973.003.branch-2.patch, HADOOP-15973.003.branch-3.0.patch, 
> HADOOP-15973.003.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-19 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-15973:

Attachment: HADOOP-15973.003.branch-2.patch

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, 
> HADOOP-15973.003.branch-2.patch, HADOOP-15973.003.branch-3.0.patch, 
> HADOOP-15973.003.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-18 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-15973:

Attachment: HADOOP-15973.003.branch-3.0.patch

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, 
> HADOOP-15973.003.branch-3.0.patch, HADOOP-15973.003.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-18 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724534#comment-16724534
 ] 

Eric Payne commented on HADOOP-15973:
-

Thanks [~jira.shegalov] for taking the time to review the code. As you can 
tell, this patch moved existing code to a utility method so it could be called 
in multiple places. As such, I'd rather not change the existing code as part of 
this patch. Thoughts?

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, 
> HADOOP-15973.003.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-18 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-15973:

Attachment: HADOOP-15973.003.patch

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, 
> HADOOP-15973.003.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-18 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724283#comment-16724283
 ] 

Eric Payne commented on HADOOP-15973:
-

Thanks a lot, [~jlowe]. I uploaded 003 with the suggested changes.

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch, 
> HADOOP-15973.003.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-17 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723447#comment-16723447
 ] 

Eric Payne commented on HADOOP-15973:
-

Attaching patch 002. This patch invokes a new parser when processing includes 
rather than loading a resource.

This should also fix HADOOP-16007.

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-17 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-15973:

Attachment: HADOOP-15973.002.patch

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch, HADOOP-15973.002.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-16007) Order of property settings is incorrect when includes are processed

2018-12-17 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned HADOOP-16007:
---

Assignee: Eric Payne

> Order of property settings is incorrect when includes are processed
> ---
>
> Key: HADOOP-16007
> URL: https://issues.apache.org/jira/browse/HADOOP-16007
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: conf
>Affects Versions: 3.2.0, 3.1.1, 3.0.4
>Reporter: Jason Lowe
>Assignee: Eric Payne
>Priority: Blocker
>
> If a configuration file contains a setting for a property then later includes 
> another file that also sets that property to a different value then the 
> property will be parsed incorrectly. For example, consider the following 
> configuration file:
> {noformat}
> <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
>   <property>
>     <name>myprop</name>
>     <value>val1</value>
>   </property>
>   <xi:include href="/some/other/file.xml"/>
> </configuration>
> {noformat}
> with the contents of /some/other/file.xml as:
> {noformat}
> <property>
>   <name>myprop</name>
>   <value>val2</value>
> </property>
> {noformat}
> Parsing this configuration should result in myprop=val2, but it actually 
> results in myprop=val1.
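
A sketch of the expected versus actual result when loading that file (the outer file's path is hypothetical):

{code}
Configuration conf = new Configuration(false);
conf.addResource(new Path("/path/to/outer.xml"));  // the file shown above
// The include is parsed after the inline property, so document order says:
//   expected:     conf.get("myprop") returns "val2"
//   actual (bug): conf.get("myprop") returns "val1"
{code}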



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-14 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721873#comment-16721873
 ] 

Eric Payne commented on HADOOP-15973:
-

One additional data point is that my manual tests do not show this problem in 
2.9 and 3.0, but the included unit test fails on 2.9 and 3.0 (as well as 3.1 
and 3.2) without any fix.

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-14 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721771#comment-16721771
 ] 

Eric Payne commented on HADOOP-15973:
-

[~sunilg], [~jlowe], [~ste...@apache.org], thanks for watching this JIRA. Do 
any of you want to review it?

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-06 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-15973:

Attachment: HADOOP-15973.001.patch

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-06 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-15973:

Status: Patch Available  (was: Open)

Submitted 001 version of the patch.

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
> Attachments: HADOOP-15973.001.patch
>
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15982) Support configurable trash location

2018-12-06 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16711653#comment-16711653
 ] 

Eric Payne commented on HADOOP-15982:
-

This JIRA is part of the wider discussion being done as part of HADOOP-7310.

> Support configurable trash location
> ---
>
> Key: HADOOP-15982
> URL: https://issues.apache.org/jira/browse/HADOOP-15982
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.0.0
>Reporter: George Huang
>Assignee: George Huang
>Priority: Minor
>
> Currently, a customer has user accounts that are functional ids (fids) used to 
> manage applications and application data under the path /data/FID. These fids 
> also get a home directory under the /user path. The users' home directories 
> are limited with a space quota of 60 G. When these fids delete data, the 
> customer's deletion policy places it under /user/<fid>/.Trash, and the 
> accounts run over quota.
> For now they are increasing quotas for these functional users, but given 
> growing applications they would like the .Trash location to be configurable, 
> e.g. something like /trash/\{userid} that is owned by the user.
> What should the configurable path look like to make this happen? For example, 
> one consideration is whether to configure it per user or per cluster, etc.
> Here is current behavior:
> fs.TrashPolicyDefault: Moved: 'hdfs://ns1/user/hdfs/test/1.txt to trash at: 
> hdfs://ns1/user/hdfs/.Trash/Current/user/hdfs/test/1.txt
> for path under encryption zone:
> fs.TrashPolicyDefault: Moved: 'hdfs://ns1/scale/2.txt' to trash at 
> hdfs://ns1/scale/.Trash/hdfs/Current/scale/2.txt
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-04 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned HADOOP-15973:
---

Assignee: Eric Payne

> Configuration: Included properties are not cached if resource is a stream
> -
>
> Key: HADOOP-15973
> URL: https://issues.apache.org/jira/browse/HADOOP-15973
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Critical
>
> If a configuration resource is a bufferedinputstream and the resource has an 
> included xml file, the properties from the included file are read and stored 
> in the properties of the configuration, but they are not stored in the 
> resource cache. So, if a later resource is added to the config and the 
> properties are recalculated from the first resource, the included properties 
> are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-15973) Configuration: Included properties are not cached if resource is a stream

2018-12-04 Thread Eric Payne (JIRA)
Eric Payne created HADOOP-15973:
---

 Summary: Configuration: Included properties are not cached if 
resource is a stream
 Key: HADOOP-15973
 URL: https://issues.apache.org/jira/browse/HADOOP-15973
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Eric Payne


If a configuration resource is a bufferedinputstream and the resource has an 
included xml file, the properties from the included file are read and stored in 
the properties of the configuration, but they are not stored in the resource 
cache. So, if a later resource is added to the config and the properties are 
recalculated from the first resource, the included properties are lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15548) Randomize local dirs

2018-06-29 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-15548:

   Resolution: Fixed
Fix Version/s: 3.0.4
   2.8.5
   2.9.2
   3.1.1
   3.2.0
   2.10.0
   Status: Resolved  (was: Patch Available)

> Randomize local dirs
> 
>
> Key: HADOOP-15548
> URL: https://issues.apache.org/jira/browse/HADOOP-15548
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 2.8.5, 3.0.4
>
> Attachments: HADOOP-15548-branch-2.001.patch, HADOOP-15548.001.patch, 
> HADOOP-15548.002.patch
>
>
> shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching container. 
> Some applications will process these in exactly the same way in every 
> container (e.g. roundrobin) which can cause disks to get unnecessarily 
> overloaded (e.g. one output file written to first entry specified in the 
> environment variable).
> There are two paths for local dir allocation, depending on whether the size 
> is unknown or known.  The unknown path already uses a random algorithm.  The 
> known path initializes with a random starting point, and then goes 
> round-robin after that.  When selecting a dir, it increments the last used by 
> one and then checks sequentially until it finds a dir that satisfies the 
> request.  Proposal is to increment by a random value of between 1 and 
> num_dirs - 1, and then check sequentially from there.  This should result in 
> a more random selection in all cases.
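
A sketch in Java of the proposed selection for the known-size path (the field and helper names are hypothetical; the real logic lives in {{LocalDirAllocator}}):

{code}
// Advance from the last used dir by a random step in [1, numDirs - 1],
// then scan sequentially for a dir that can hold the requested size.
int step = (numDirs > 1) ? 1 + random.nextInt(numDirs - 1) : 0;
int start = (lastUsedDir + step) % numDirs;
for (int i = 0; i < numDirs; i++) {
  int dir = (start + i) % numDirs;
  if (availableSpace(dir) >= requestedSize) {  // hypothetical predicate
    lastUsedDir = dir;
    return dir;
  }
}
throw new DiskErrorException("no local dir can satisfy the request");
{code}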



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15548) Randomize local dirs

2018-06-29 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528211#comment-16528211
 ] 

Eric Payne commented on HADOOP-15548:
-

The precommit build failure is my fault. I should have waited for the precommit 
build to run before I committed the branch-2 patch. Sorry about that.

I've committed it to trunk, branch-3.1, branch-3.0, branch-2, branch-2.9, and 
branch-2.8

> Randomize local dirs
> 
>
> Key: HADOOP-15548
> URL: https://issues.apache.org/jira/browse/HADOOP-15548
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HADOOP-15548-branch-2.001.patch, HADOOP-15548.001.patch, 
> HADOOP-15548.002.patch
>
>
> shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching container. 
> Some applications will process these in exactly the same way in every 
> container (e.g. roundrobin) which can cause disks to get unnecessarily 
> overloaded (e.g. one output file written to first entry specified in the 
> environment variable).
> There are two paths for local dir allocation, depending on whether the size 
> is unknown or known.  The unknown path already uses a random algorithm.  The 
> known path initializes with a random starting point, and then goes 
> round-robin after that.  When selecting a dir, it increments the last used by 
> one and then checks sequentially until it finds a dir that satisfies the 
> request.  Proposal is to increment by a random value of between 1 and 
> num_dirs - 1, and then check sequentially from there.  This should result in 
> a more random selection in all cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15548) Randomize local dirs

2018-06-29 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528090#comment-16528090
 ] 

Eric Payne commented on HADOOP-15548:
-

Hi [~Jim_Brennan]. I tried backporting this and building it on branch-2. The 
build fails with the following errors:
{code}
[ERROR] 
/home/ericp/hadoop/source/Apache/HADOOP-15548/branch-2/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalDirAllocator.java:[284,5]
 cannot find symbol
  symbol:   method assumeNotWindows()
{code}
Can you please provide a 2.x patch?

> Randomize local dirs
> 
>
> Key: HADOOP-15548
> URL: https://issues.apache.org/jira/browse/HADOOP-15548
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HADOOP-15548.001.patch, HADOOP-15548.002.patch
>
>
> shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching container. 
> Some applications will process these in exactly the same way in every 
> container (e.g. roundrobin) which can cause disks to get unnecessarily 
> overloaded (e.g. one output file written to first entry specified in the 
> environment variable).
> There are two paths for local dir allocation, depending on whether the size 
> is unknown or known.  The unknown path already uses a random algorithm.  The 
> known path initializes with a random starting point, and then goes 
> round-robin after that.  When selecting a dir, it increments the last used by 
> one and then checks sequentially until it finds a dir that satisfies the 
> request.  Proposal is to increment by a random value of between 1 and 
> num_dirs - 1, and then check sequentially from there.  This should result in 
> a more random selection in all cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15548) Randomize local dirs

2018-06-29 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528034#comment-16528034
 ] 

Eric Payne commented on HADOOP-15548:
-

Thanks [~Jim_Brennan].
+1
I will commit shortly

> Randomize local dirs
> 
>
> Key: HADOOP-15548
> URL: https://issues.apache.org/jira/browse/HADOOP-15548
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HADOOP-15548.001.patch, HADOOP-15548.002.patch
>
>
> shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching container. 
> Some applications will process these in exactly the same way in every 
> container (e.g. roundrobin) which can cause disks to get unnecessarily 
> overloaded (e.g. one output file written to first entry specified in the 
> environment variable).
> There are two paths for local dir allocation, depending on whether the size 
> is unknown or known.  The unknown path already uses a random algorithm.  The 
> known path initializes with a random starting point, and then goes 
> round-robin after that.  When selecting a dir, it increments the last used by 
> one and then checks sequentially until it finds a dir that satisfies the 
> request.  Proposal is to increment by a random value of between 1 and 
> num_dirs - 1, and then check sequentially from there.  This should result in 
> a more random selection in all cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15548) Randomize local dirs

2018-06-28 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526758#comment-16526758
 ] 

Eric Payne commented on HADOOP-15548:
-

Thanks [~Jim_Brennan] for reporting this problem and providing the fix.

The patch looks fine, but I have one concern with the test. It succeeds even 
without changing {{LocalDirAllocator}}. Can you please modify the test so that 
it fails with the original code?

> Randomize local dirs
> 
>
> Key: HADOOP-15548
> URL: https://issues.apache.org/jira/browse/HADOOP-15548
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HADOOP-15548.001.patch
>
>
> Shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching a container. 
> Some applications process these in exactly the same way in every container 
> (e.g. round-robin), which can cause disks to get unnecessarily overloaded 
> (e.g. one output file always written to the first entry specified in the 
> environment variable).
> There are two paths for local dir allocation, depending on whether the size 
> is unknown or known.  The unknown path already uses a random algorithm.  The 
> known path initializes with a random starting point, and then goes 
> round-robin after that.  When selecting a dir, it increments the last used by 
> one and then checks sequentially until it finds a dir that satisfies the 
> request.  The proposal is to increment by a random value between 1 and 
> num_dirs - 1, and then check sequentially from there.  This should result in 
> a more random selection in all cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Reopened] (HADOOP-9383) mvn clean compile fails without install goal

2017-09-08 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reopened HADOOP-9383:


Something similar is definitely still happening on Fedora and Red Hat. This is 
what I'm getting:
{noformat}
Could not find artifact 
org.apache.hadoop:hadoop-maven-plugins:jar:3.1.0-SNAPSHOT
{noformat}
I'm reopening the JIRA.

> mvn clean compile fails without install goal
> 
>
> Key: HADOOP-9383
> URL: https://issues.apache.org/jira/browse/HADOOP-9383
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Arpit Agarwal
>
> 'mvn -Pnative-win clean compile' fails with the following error:
> [ERROR] Could not find goal 'protoc' in plugin 
> org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT among available goals 
> -> [Help 1]
> The build succeeds if the install goal is specified.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-9747) Reduce unnecessary UGI synchronization

2017-08-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141807#comment-16141807
 ] 

Eric Payne commented on HADOOP-9747:


[~daryn], I started to commit this, but I ran into a couple of issues:

# If this fix needs to go into branch-2.8, we may need a separate 2.8 patch. I 
tried applying the branch-2 patch to branch-2.8, and there were several 
conflicts in {{UserGroupInformation.java}}
# {{HADOOP-9747.2.branch-2.patch}} does not apply cleanly to branch-2. It's 
just a minor import conflict that I could fix myself, but as long as you need 
to address the branch-2.8 conflicts...

> Reduce unnecessary UGI synchronization
> --
>
> Key: HADOOP-9747
> URL: https://issues.apache.org/jira/browse/HADOOP-9747
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0-alpha1
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HADOOP-9747.2.branch-2.patch, HADOOP-9747.2.trunk.patch, 
> HADOOP-9747.branch-2.patch, HADOOP-9747.trunk.patch
>
>
> Jstacks of heavily loaded NNs show up to dozens of threads blocking in the 
> UGI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-9747) Reduce unnecessary UGI synchronization

2017-08-23 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16138923#comment-16138923
 ] 

Eric Payne commented on HADOOP-9747:


[~daryn], Thanks for providing the fixes for the YARN tests.

+1. The patch LGTM. If there are no concerns, I will commit tomorrow afternoon.

> Reduce unnecessary UGI synchronization
> --
>
> Key: HADOOP-9747
> URL: https://issues.apache.org/jira/browse/HADOOP-9747
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0-alpha1
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HADOOP-9747.2.branch-2.patch, HADOOP-9747.2.trunk.patch, 
> HADOOP-9747.branch-2.patch, HADOOP-9747.trunk.patch
>
>
> Jstacks of heavily loaded NNs show up to dozens of threads blocking in the 
> UGI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-9747) Reduce unnecessary UGI synchronization

2017-08-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16126644#comment-16126644
 ] 

Eric Payne commented on HADOOP-9747:


[~daryn], the following tests are failing with this patch and succeeding 
without it on trunk:

{noformat}
TestTokenClientRMService#testTokenRenewalByLoginUser
testTokenRenewalByLoginUser(org.apache.hadoop.yarn.server.resourcemanager.TestTokenClientRMService)
  Time elapsed: 0.043 sec  <<< ERROR!
java.lang.reflect.UndeclaredThrowableException: null
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.renewDelegationToken(ClientRMService.java:1058)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestTokenClientRMService.checkTokenRenewal(TestTokenClientRMService.java:169)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestTokenClientRMService.access$500(TestTokenClientRMService.java:46)

TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey
testRMDTMasterKeyStateOnRollingMasterKey(org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens)
  Time elapsed: 0.792 sec  <<< ERROR!
org.apache.hadoop.yarn.exceptions.YarnException: java.io.IOException: 
Delegation Token can be issued only with kerberos authentication
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1022)
at 
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens.testRMDTMasterKeyStateOnRollingMasterKey(TestRMDelegationTokens.java:102)
{noformat}

> Reduce unnecessary UGI synchronization
> --
>
> Key: HADOOP-9747
> URL: https://issues.apache.org/jira/browse/HADOOP-9747
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0-alpha1
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HADOOP-9747.2.branch-2.patch, HADOOP-9747.2.trunk.patch, 
> HADOOP-9747.branch-2.patch, HADOOP-9747.trunk.patch
>
>
> Jstacks of heavily loaded NNs show up to dozens of threads blocking in the 
> UGI.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14343) Wrong pid file name in error message when starting secure daemon

2017-07-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108082#comment-16108082
 ] 

Eric Payne commented on HADOOP-14343:
-

[~boky01], thanks for the effort on this patch.

Patch LGTM. +1

[~aw], did you have anything you wanted to add?

> Wrong pid file name in error message when starting secure daemon
> 
>
> Key: HADOOP-14343
> URL: https://issues.apache.org/jira/browse/HADOOP-14343
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Andras Bokor
>Assignee: Andras Bokor
>Priority: Minor
> Attachments: HADOOP-14343.01.patch, HADOOP-14343.02.patch
>
>
> {code}# this is for the daemon pid creation
>   #shellcheck disable=SC2086
>   echo $! > "${jsvcpidfile}" 2>/dev/null
>   if [[ $? -gt 0 ]]; then
>     hadoop_error "ERROR:  Cannot write ${daemonname} pid ${daemonpidfile}."
>   fi{code}
> It will log datanode's pid file instead of JSVC's pid file.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14320) TestIPC.testIpcWithReaderQueuing fails intermittently

2017-04-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-14320:

   Resolution: Fixed
Fix Version/s: 3.0.0-alpha3
   2.8.1
   2.9.0
   Status: Resolved  (was: Patch Available)

> TestIPC.testIpcWithReaderQueuing fails intermittently
> -
>
> Key: HADOOP-14320
> URL: https://issues.apache.org/jira/browse/HADOOP-14320
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 2.9.0, 2.8.1, 3.0.0-alpha3
>
> Attachments: HADOOP-14320.001.patch
>
>
> {noformat}
> org.mockito.exceptions.verification.TooLittleActualInvocations: 
> callQueueManager.put();
> Wanted 2 times:
> -> at org.apache.hadoop.ipc.TestIPC.checkBlocking(TestIPC.java:810)
> But was 1 time:
> -> at org.apache.hadoop.ipc.Server.queueCall(Server.java:2466)
>   at org.apache.hadoop.ipc.TestIPC.checkBlocking(TestIPC.java:810)
>   at 
> org.apache.hadoop.ipc.TestIPC.testIpcWithReaderQueuing(TestIPC.java:738)
> {noformat}
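
For context, the failure above is a Mockito invocation-count verification. A minimal, self-contained illustration of the same failure mode (hypothetical mock and names, not the actual test code):

{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;

import java.util.List;

// Illustrative only: verify(mock, times(2)) throws
// TooLittleActualInvocations when the call actually happened fewer times,
// which is what the flaky test hits when the second put() races the check.
public class VerifyCountExample {
  @SuppressWarnings("unchecked")
  public static void main(String[] args) {
    List<String> queue = mock(List.class);

    queue.add("call-1"); // only one actual invocation...

    // ...so this verification fails, analogous to the
    // callQueueManager.put() check in checkBlocking().
    verify(queue, times(2)).add("call-1");
  }
}
{code}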



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14320) TestIPC.testIpcWithReaderQueuing fails intermittently

2017-04-28 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989379#comment-15989379
 ] 

Eric Payne commented on HADOOP-14320:
-

Thanks [~ebadger].
+1

> TestIPC.testIpcWithReaderQueuing fails intermittently
> -
>
> Key: HADOOP-14320
> URL: https://issues.apache.org/jira/browse/HADOOP-14320
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HADOOP-14320.001.patch
>
>
> {noformat}
> org.mockito.exceptions.verification.TooLittleActualInvocations: 
> callQueueManager.put();
> Wanted 2 times:
> -> at org.apache.hadoop.ipc.TestIPC.checkBlocking(TestIPC.java:810)
> But was 1 time:
> -> at org.apache.hadoop.ipc.Server.queueCall(Server.java:2466)
>   at org.apache.hadoop.ipc.TestIPC.checkBlocking(TestIPC.java:810)
>   at 
> org.apache.hadoop.ipc.TestIPC.testIpcWithReaderQueuing(TestIPC.java:738)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-12605) Fix intermittent failure of TestIPC.testIpcWithReaderQueuing

2017-01-04 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-12605:

Fix Version/s: 2.8.0

Thanks [~iwasakims] for this fix.

I backported it to branch-2.8.

> Fix intermittent failure of TestIPC.testIpcWithReaderQueuing
> 
>
> Key: HADOOP-12605
> URL: https://issues.apache.org/jira/browse/HADOOP-12605
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1
>
> Attachments: HADOOP-12605.001.patch, HADOOP-12605.002.patch, 
> HADOOP-12605.003.patch, HADOOP-12605.004.patch, HADOOP-12605.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13671) Fix ClassFormatException in trunk build.

2016-09-30 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15536655#comment-15536655
 ] 

Eric Payne commented on HADOOP-13671:
-

Thanks [~kihwal]. +1
Committing to trunk.

> Fix ClassFormatException in trunk build.
> 
>
> Key: HADOOP-13671
> URL: https://issues.apache.org/jira/browse/HADOOP-13671
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HADOOP-13671.patch
>
>
> The maven-project-info-reports-plugin version 2.7 depends on 
> maven-shared-jar-1.1, which uses bcel 5.2.  This does not work well with the 
> new lamda expression.  The 2.9 depends on maven-shared-jar-1.2, which works 
> around this problem by using the custom release of bcel 6.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-12418) TestRPC.testRPCInterruptedSimple fails intermittently

2016-08-30 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-12418:

Fix Version/s: 2.7.4

Thanks [~kihwal] and [~steve_l] for your work on this issue. This patch 
backports cleanly to 2.7 (with only contextual diffs). We would like this fix 
in 2.7, so I backported it.

> TestRPC.testRPCInterruptedSimple fails intermittently
> -
>
> Key: HADOOP-12418
> URL: https://issues.apache.org/jira/browse/HADOOP-12418
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0-alpha1
> Environment: Jenkins, Java 8
>Reporter: Steve Loughran
>Assignee: Kihwal Lee
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: HADOOP-12418.patch, HADOOP-12418.v2.patch
>
>
> Jenkins trunk + java 8 saw a failure of 
> {{TestRPC.testRPCInterruptedSimple}}; the interrupt wasn't picked up. A race 
> in the test, or a surfacing of a bug in RPC where at some points interrupt 
> exceptions are not picked up?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-12582) Using BytesWritable's getLength() and getBytes() instead of get() and getSize()

2015-11-18 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012195#comment-15012195
 ] 

Eric Payne commented on HADOOP-12582:
-

bq. Honestly, we need a massive "remove usage of deprecated methods" patch for 
all of Hadoop.
I think it would be better to do it piecemeal. Easier to review, easier to test.
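
For reference, the mechanical replacement itself is small; a hedged, standalone example (not taken from the affected tests):

{code}
import org.apache.hadoop.io.BytesWritable;

// Illustrative only: swapping the deprecated accessors for their
// replacements as described in this issue.
public class BytesWritableExample {
  public static void main(String[] args) {
    BytesWritable bw = new BytesWritable(new byte[] {1, 2, 3});
    // Deprecated: bw.get() and bw.getSize()
    byte[] buf = bw.getBytes();  // replacement for get(); note the backing
                                 // array may be longer than the valid data
    int len = bw.getLength();    // replacement for getSize()
    System.out.println(len + " valid bytes in a buffer of " + buf.length);
  }
}
{code}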

> Using BytesWritable's getLength() and getBytes() instead of get() and 
> getSize()
> ---
>
> Key: HADOOP-12582
> URL: https://issues.apache.org/jira/browse/HADOOP-12582
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Tsuyoshi Ozawa
>
> BytesWritable's deprecated methods,  get() and getSize(), are still used in 
> some tests: TestTFileSeek, TestTFileSeqFileComparison, TestSequenceFile, and 
> so on. We can also remove them if targeting this to 3.0.0
> https://builds.apache.org/job/PreCommit-HADOOP-Build/8084/artifact/patchprocess/diff-compile-javac-root-jdk1.7.0_85.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-10321) TestCompositeService should cover all enumerations of adding a service to a parent service

2015-05-08 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-10321:

Labels: supportability test  (was: BB2015-05-TBR supportability test)

 TestCompositeService should cover all enumerations of adding a service to a 
 parent service
 --

 Key: HADOOP-10321
 URL: https://issues.apache.org/jira/browse/HADOOP-10321
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Ray Chiang
  Labels: supportability, test
 Attachments: HADOOP-10321-02.patch, HADOOP-10321-03.patch, 
 HADOOP-10321-04.patch, HADOOP10321-01.patch


 HADOOP-10085 fixes some synchronization issues in 
 CompositeService#addService(). The tests should cover all cases. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11802) DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm

2015-04-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507815#comment-14507815
 ] 

Eric Payne commented on HADOOP-11802:
-

Thanks for the new patch, [~cmccabe].

I have verified that patch 003 still fixes the problem of the dying 
{{DomainSocketWatcher}} thread in my manual tests. I have also verified that 
the new unit test fails without the patch and succeeds with it.

+1 : LGTM

 DomainSocketWatcher thread terminates sometimes after there is an I/O error 
 during requestShortCircuitShm
 -

 Key: HADOOP-11802
 URL: https://issues.apache.org/jira/browse/HADOOP-11802
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Colin Patrick McCabe
 Attachments: HADOOP-11802.001.patch, HADOOP-11802.002.patch, 
 HADOOP-11802.003.patch


 In {{DataXceiver#requestShortCircuitShm}}, we attempt to recover from some 
 errors by closing the {{DomainSocket}}.  However, this violates the invariant 
 that the domain socket should never be closed when it is being managed by the 
 {{DomainSocketWatcher}}.  Instead, we should call {{shutdown}} on the 
 {{DomainSocket}}.  When this bug hits, it terminates the 
 {{DomainSocketWatcher}} thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11802) DomainSocketWatcher thread terminates sometimes after there is an I/O error during requestShortCircuitShm

2015-04-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498828#comment-14498828
 ] 

Eric Payne commented on HADOOP-11802:
-

[~cmccabe], Thanks very much for the patch.

I was able to manually verify that the patch fixed the problem we were 
encountering when {{DomainSocketWatcher}}'s main thread was dying. Using the 
same methods as used previously to generate the exception in 
{{DataXceiver#requestShortCircuitShm}}, I was able to verify that the main 
thread of {{DomainSocketWatcher}} remains running.

However, I don't think the unit test is verifying this use case. Here's what I 
did:
1. I patched branch-2 with {{HADOOP-11802.002.patch}}, built it, and ran the 
test for 
{{TestShortCircuitCache#testDataXceiverHandlesRequestShortCircuitShmFailure}}. 
This was successful.
2. I commented out the following code in {{DataXceiver#requestShortCircuitShm}}
{code}
  if ((!success) && releasedSocket) {
    try {
      sock.shutdown();
    } catch (IOException e) {
      LOG.warn("Failed to shut down socket in error handler", e);
    }
  }
{code}
and replaced it with the original code:
{code}
  if ((!success) && (peer == null)) {
    IOUtils.cleanup(null, sock);
  }
{code}
This also succeeded.

 DomainSocketWatcher thread terminates sometimes after there is an I/O error 
 during requestShortCircuitShm
 -

 Key: HADOOP-11802
 URL: https://issues.apache.org/jira/browse/HADOOP-11802
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Colin Patrick McCabe
 Attachments: HADOOP-11802.001.patch, HADOOP-11802.002.patch


 In {{DataXceiver#requestShortCircuitShm}}, we attempt to recover from some 
 errors by closing the {{DomainSocket}}.  However, this violates the invariant 
 that the domain socket should never be closed when it is being managed by the 
 {{DomainSocketWatcher}}.  Instead, we should call {{shutdown}} on the 
 {{DomainSocket}}.  When this bug hits, it terminates the 
 {{DomainSocketWatcher}} thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11802) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback

2015-04-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493182#comment-14493182
 ] 

Eric Payne commented on HADOOP-11802:
-

Thanks again, [~cmccabe], for your comments and taking time on this issue.

One thing to note is that just prior to these problems, a 195-second GC was 
taking place on the DN.

I added a catch of {{Throwable}} in the main thread of the 
{{DomainSocketWatcher}} and reproduced the problem. AFAICT, the following 
represents what is happening:

- Request for short circuit read is received
- {{DataXceiver#requestShortCircuitShm}} calls 
{{ShortCircuitRegistry#createNewMemorySegment}}, which creates a shared memory 
segment and associates it with the passed domain socket in the 
{{DomainSocketWatcher}}. Then, in that thread, {{createNewMemorySegment}} waits 
on that socket/shm entry in {{DomainSocketWatcher#add}}.
{code}
  public NewShmInfo createNewMemorySegment(String clientName,
...
watcher.add(sock, shm);
...
{code}
- It's at this point that things get confusing, and I'm still working out why 
this happens. The wait wakes up, but not in a normal state, and it wasn't woken 
up by an exception either. You can tell that no exception was thrown inside 
{{createNewMemorySegment}} to wake it up, because the following code goes on to 
call {{sendShmSuccessResponse}}, which is where the next bad thing happens:
{code}
public void requestShortCircuitShm(String clientName) throws IOException {
...
  try {
    shmInfo = datanode.shortCircuitRegistry.
        createNewMemorySegment(clientName, sock);
    // After calling #{ShortCircuitRegistry#createNewMemorySegment}, the
    // socket is managed by the DomainSocketWatcher, not the DataXceiver.
    releaseSocket();
  } catch (UnsupportedOperationException e) {
    sendShmErrorResponse(ERROR_UNSUPPORTED,
        "This datanode has not been configured to support " +
        "short-circuit shared memory segments.");
    return;
  } catch (IOException e) {
    sendShmErrorResponse(ERROR,
        "Failed to create shared file descriptor: " + e.getMessage());
    return;
  }
  sendShmSuccessResponse(sock, shmInfo);
...
{code}
- At this point, the call to {{sendShmSuccessResponse}} gets an exception:
{noformat}
2015-04-04 13:12:30,973 [DataXceiver for client 
unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]]
  INFO DataNode.clienttrace: cliID: 
DFSClient_attempt_1427231924849_569269_m_002116_0_-161414780_1,
  src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: 
n/a,
  srvID: a2d3bac0-e98b-4b73-a5a1-82c7eb557a7a, success: false
2015-04-04 13:12:30,984 [DataXceiver for client 
unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]]
  ERROR datanode.DataNode: host.domain.com:1004:DataXceiver error processing
  REQUEST_SHORT_CIRCUIT_SHM operation  src: 
unix:/home/gs/var/run/hdfs/dn_socket dst: local
 
java.net.SocketException: write(2) error: Broken pipe
at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
at 
org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
at 
org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
at 
com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
at 
com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
at 
com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:380)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:418)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
at java.lang.Thread.run(Thread.java:722)
{noformat}
- At this point, it bubbles back up to {{DataXceiver#requestShortCircuitShm}}, 
which cleans up, closing the socket:
{code}
...
  if ((!success) && (peer == null)) {
    // If we failed to pass the shared memory segment to the client,
    // close the UNIX domain socket now.  This will trigger the
    // DomainSocketWatcher callback, cleaning up the segment.
    IOUtils.cleanup(null, sock);
  }
{code}
- Then, the main {{DomainSocketWatcher}} thread wakes up (after the regular 
timeout interval has expired) and tries to call {{sendCallbackAndRemove}}, 
which encounters the following {{IllegalStateException}}:
{code}
  final Thread watcherThread = new Thread(new Runnable() {
...
while (true) {
  lock.lock();
  

[jira] [Commented] (HADOOP-11802) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback

2015-04-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485380#comment-14485380
 ] 

Eric Payne commented on HADOOP-11802:
-

Sorry, I just noticed that the following was the first exception in the series:
{noformat}
2015-04-02 11:48:09,866 [DataXceiver for client 
unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] ERROR 
datanode.DataNode: gsta70851.tan.ygrid.yahoo.com:1004:DataXceiver error 
processing REQUEST_SHORT_CIRCUIT_SHM operation  src: 
unix:/home/gs/var/run/hdfs/dn_socket dst: local
java.net.SocketException: write(2) error: Broken pipe
at org.apache.hadoop.net.unix.DomainSocket.writeArray0(Native Method)
at 
org.apache.hadoop.net.unix.DomainSocket.access$300(DomainSocket.java:45)
at 
org.apache.hadoop.net.unix.DomainSocket$DomainOutputStream.write(DomainSocket.java:601)
at 
com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833)
at 
com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843)
at 
com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.sendShmSuccessResponse(DataXceiver.java:380)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:418)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
{noformat}


 DomainSocketWatcher#watcherThread can encounter IllegalStateException in 
 finally block when calling sendCallback
 

 Key: HADOOP-11802
 URL: https://issues.apache.org/jira/browse/HADOOP-11802
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne

 In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the 
 call to {{sendCallback}} can encounter an {{IllegalStateException}}, and 
 leave some cleanup tasks undone.
 {code}
   } finally {
     lock.lock();
     try {
       kick(); // allow the handler for notificationSockets[0] to read a byte
       for (Entry entry : entries.values()) {
         // We do not remove from entries as we iterate, because that can
         // cause a ConcurrentModificationException.
         sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
       }
       entries.clear();
       fdSet.close();
     } finally {
       lock.unlock();
     }
   }
 {code}
 The exception causes {{watcherThread}} to skip the calls to 
 {{entries.clear()}} and {{fdSet.close()}}.
 {code}
 2015-04-02 11:48:09,941 [DataXceiver for client 
 unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO 
 DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 
 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: 
 e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: 
 Thread[Thread-14,5,main] terminating on unexpected exception
 java.lang.IllegalStateException: failed to remove 
 b845649551b6b1eab5c17f630e42489d
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119)
 at 
 org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or 
 HADOOP-10404. The cluster installation is running code with all of these 
 fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11802) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback

2015-04-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485289#comment-14485289
 ] 

Eric Payne commented on HADOOP-11802:
-

Thanks [~cmccabe] for your comment and interest in this issue.

This problem is happening in multiple different live clusters. Only a small 
percentage of datanodes are affected each day, but once they hit this and the 
threads pile up, the datanodes must be restarted.

The only 'terminating on' message in the DN log is coming from 
{{DomainSocketWatcher}}'s unhandled exception handler. That is, it's the one 
documented in the description above:
{quote}
{noformat}
2015-04-04 13:12:31,059 [Thread-12] ERROR unix.DomainSocketWatcher: 
Thread[Thread-12,5,main] terminating on unexpected exception
java.lang.IllegalStateException: failed to remove 
17e33191fa8238098d7d22142f5787e2
2015-04-02 11:48:09,941 [DataXceiver for client 
unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO 
DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 
127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: 
e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: 
Thread[Thread-14,5,main] terminating on unexpected exception
java.lang.IllegalStateException: failed to remove 
b845649551b6b1eab5c17f630e42489d
...
{noformat}
{quote}
However, as you pointed out, that is happening after something went wrong in 
the main try block of the watcher thread. Since I'm seeing neither 'terminating 
on InterruptedException' nor 'terminating on IOException', there must be some 
other exception occurring. Yet the only reference to {{DomainSocketWatcher}} in 
the DN log is in the stack trace already mentioned.

However, just above the IllegalStateException stacktrace is the following, 
which indicates a premature EOF occurred. There were several of these, but it's 
not clear whether they are related to why the DomainSocketWatcher exited.
Your input would be greatly appreciated.
{noformat}
2015-04-02 11:48:09,885 [DataXceiver for client 
DFSClient_attempt_1427231924849_569467_m_000135_0_346288762_1 at 
/xxx.xxx.xxx.xxx:41908 [Receiving block 
BP-658831282-xxx.xxx.xxx.xxx-1351509219914:blk_3365919992_1105804585360]] ERROR 
datanode.DataNode: gsta70851.tan.ygrid.yahoo.com:1004:DataXceiver error 
processing WRITE_BLOCK operation  src: /xxx.xxx.xxx.xxx:41908 dst: 
/xxx.xxx.xxx.xxx:1004
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:467)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:781)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:730)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
at java.lang.Thread.run(Thread.java:722)
{noformat}

 DomainSocketWatcher#watcherThread can encounter IllegalStateException in 
 finally block when calling sendCallback
 

 Key: HADOOP-11802
 URL: https://issues.apache.org/jira/browse/HADOOP-11802
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne

 In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the 
 call to {{sendCallback}} can encounter an {{IllegalStateException}}, and 
 leave some cleanup tasks undone.
 {code}
   } finally {
     lock.lock();
     try {
       kick(); // allow the handler for notificationSockets[0] to read a byte
       for (Entry entry : entries.values()) {
         // We do not remove from entries as we iterate, because that can
         // cause a ConcurrentModificationException.
         sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
       }
       entries.clear();
       fdSet.close();
     } finally {
       lock.unlock();
     }
   }
 {code}
 The exception causes {{watcherThread}} to skip the calls to 
 {{entries.clear()}} and {{fdSet.close()}}.
 {code}

[jira] [Created] (HADOOP-11802) DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback

2015-04-03 Thread Eric Payne (JIRA)
Eric Payne created HADOOP-11802:
---

 Summary: DomainSocketWatcher#watcherThread encounters 
IllegalStateException in finally block when calling sendCallback
 Key: HADOOP-11802
 URL: https://issues.apache.org/jira/browse/HADOOP-11802
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne


In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the 
call to {{sendCallback}} can encounter an {{IllegalStateException}} and 
leave some cleanup tasks undone.

{code}
  } finally {
    lock.lock();
    try {
      kick(); // allow the handler for notificationSockets[0] to read a byte
      for (Entry entry : entries.values()) {
        // We do not remove from entries as we iterate, because that can
        // cause a ConcurrentModificationException.
        sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
      }
      entries.clear();
      fdSet.close();
    } finally {
      lock.unlock();
    }
  }
{code}

The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} 
and {{fdSet.close()}}.

{code}
2015-04-02 11:48:09,941 [DataXceiver for client 
unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO 
DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 
127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: 
e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: 
Thread[Thread-14,5,main] terminating on unexpected exception
java.lang.IllegalStateException: failed to remove 
b845649551b6b1eab5c17f630e42489d
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:145)
at 
org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119)
at 
org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522)
at java.lang.Thread.run(Thread.java:722)
{code}

Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or 
HADOOP-10404. The cluster installation is running code with all of these fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HADOOP-11802) DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback

2015-04-03 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned HADOOP-11802:
---

Assignee: Eric Payne

 DomainSocketWatcher#watcherThread encounters IllegalStateException in finally 
 block when calling sendCallback
 -

 Key: HADOOP-11802
 URL: https://issues.apache.org/jira/browse/HADOOP-11802
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne

 In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the 
 call to {{sendCallback}} can encounter an {{IllegalStateException}}, and 
 leave some cleanup tasks undone.
 {code}
   } finally {
     lock.lock();
     try {
       kick(); // allow the handler for notificationSockets[0] to read a byte
       for (Entry entry : entries.values()) {
         // We do not remove from entries as we iterate, because that can
         // cause a ConcurrentModificationException.
         sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
       }
       entries.clear();
       fdSet.close();
     } finally {
       lock.unlock();
     }
   }
 {code}
 The exception causes {{watcherThread}} to skip the calls to 
 {{entries.clear()}} and {{fdSet.close()}}.
 {code}
 2015-04-02 11:48:09,941 [DataXceiver for client 
 unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO 
 DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 
 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: 
 e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: 
 Thread[Thread-14,5,main] terminating on unexpected exception
 java.lang.IllegalStateException: failed to remove 
 b845649551b6b1eab5c17f630e42489d
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119)
 at 
 org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or 
 HADOOP-10404. The cluster installation is running code with all of these 
 fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11802) DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback

2015-04-03 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394675#comment-14394675
 ] 

Eric Payne commented on HADOOP-11802:
-

The place in {{sendCallback}} where it is encountering the exception is
{code}
if (entry.getHandler().handle(sock)) {
{code}

Once the {{IllegalStateException}} occurs, I am seeing 4069 datanode threads 
getting stuck in {{DomainSocketWatcher#add}} when {{DataXceiver}} is trying to 
request a new short circuit read. This is similar to the symptoms seen in 
HADOOP-11333, but, as I mentioned above, the cluster is already running with 
that fix.

Here is the stack trace from the stuck threads, for reference:
{noformat}
DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operat
ion #1] daemon prio=10 tid=0x7fcbbcae1000 nid=0x498a waiting on condition [
0x7fcb61132000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0xd06c3a78 (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323)
at 
org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
at java.lang.Thread.run(Thread.java:722)
{noformat}
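
A simplified model of why those threads never return (illustrative only; the real {{DomainSocketWatcher}} is more involved and the names below are hypothetical): {{add()}} waits on a condition that only the watcher thread signals, so once that thread terminates on the unexpected exception, every waiter parks forever.

{code}
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the add()/watcher handshake, not Hadoop source.
public class WatcherHandshake {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition processed = lock.newCondition();
  private boolean handled = false;

  // Caller side (analogous to DomainSocketWatcher#add): blocks until the
  // watcher marks the entry handled.
  public void add() throws InterruptedException {
    lock.lock();
    try {
      while (!handled) {
        processed.await(); // parks forever if the watcher thread has died
      }
    } finally {
      lock.unlock();
    }
  }

  // Watcher side: if this thread terminates on an unexpected exception
  // before signalling, every add() caller above waits indefinitely.
  public void watcherLoopIteration() {
    lock.lock();
    try {
      handled = true;
      processed.signalAll();
    } finally {
      lock.unlock();
    }
  }
}
{code}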

 DomainSocketWatcher#watcherThread encounters IllegalStateException in finally 
 block when calling sendCallback
 -

 Key: HADOOP-11802
 URL: https://issues.apache.org/jira/browse/HADOOP-11802
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne

 In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the 
 call to {{sendCallback}} can encounter an {{IllegalStateException}}, and 
 leave some cleanup tasks undone.
 {code}
   } finally {
     lock.lock();
     try {
       kick(); // allow the handler for notificationSockets[0] to read a byte
       for (Entry entry : entries.values()) {
         // We do not remove from entries as we iterate, because that can
         // cause a ConcurrentModificationException.
         sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
       }
       entries.clear();
       fdSet.close();
     } finally {
       lock.unlock();
     }
   }
 {code}
 The exception causes {{watcherThread}} to skip the calls to 
 {{entries.clear()}} and {{fdSet.close()}}.
 {code}
 2015-04-02 11:48:09,941 [DataXceiver for client 
 unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO 
 DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 
 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: 
 e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: 
 Thread[Thread-14,5,main] terminating on unexpected exception
 java.lang.IllegalStateException: failed to remove 
 b845649551b6b1eab5c17f630e42489d
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119)
 at 
 org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or 
 HADOOP-10404. The cluster installation is running code with all of these 
 fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11802) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback

2015-04-03 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11802:

Summary: DomainSocketWatcher#watcherThread can encounter 
IllegalStateException in finally block when calling sendCallback  (was: 
DomainSocketWatcher#watcherThread encounters IllegalStateException in finally 
block when calling sendCallback)

 DomainSocketWatcher#watcherThread can encounter IllegalStateException in 
 finally block when calling sendCallback
 

 Key: HADOOP-11802
 URL: https://issues.apache.org/jira/browse/HADOOP-11802
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne

 In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the 
 call to {{sendCallback}} can encounter an {{IllegalStateException}}, and 
 leave some cleanup tasks undone.
 {code}
   } finally {
     lock.lock();
     try {
       kick(); // allow the handler for notificationSockets[0] to read a byte
       for (Entry entry : entries.values()) {
         // We do not remove from entries as we iterate, because that can
         // cause a ConcurrentModificationException.
         sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
       }
       entries.clear();
       fdSet.close();
     } finally {
       lock.unlock();
     }
   }
 {code}
 The exception causes {{watcherThread}} to skip the calls to 
 {{entries.clear()}} and {{fdSet.close()}}.
 {code}
 2015-04-02 11:48:09,941 [DataXceiver for client 
 unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO 
 DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 
 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: 
 e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: 
 Thread[Thread-14,5,main] terminating on unexpected exception
 java.lang.IllegalStateException: failed to remove 
 b845649551b6b1eab5c17f630e42489d
 at 
 com.google.common.base.Preconditions.checkState(Preconditions.java:145)
 at 
 org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119)
 at 
 org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52)
 at 
 org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522)
 at java.lang.Thread.run(Thread.java:722)
 {code}
 Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or 
 HADOOP-10404. The cluster installation is running code with all of these 
 fixes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11802) DomainSocketWatcher#watcherThread encounters IllegalStateException in finally block when calling sendCallback

2015-04-03 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11802:

Description: 
In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the 
call to {{sendCallback}} can encounter an {{IllegalStateException}}, and leave 
some cleanup tasks undone.

{code}
  } finally {
    lock.lock();
    try {
      kick(); // allow the handler for notificationSockets[0] to read a byte
      for (Entry entry : entries.values()) {
        // We do not remove from entries as we iterate, because that can
        // cause a ConcurrentModificationException.
        sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
      }
      entries.clear();
      fdSet.close();
    } finally {
      lock.unlock();
    }
  }
{code}

The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} 
and {{fdSet.close()}}.

{code}
2015-04-02 11:48:09,941 [DataXceiver for client 
unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO 
DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 
127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: 
e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: 
Thread[Thread-14,5,main] terminating on unexpected exception
java.lang.IllegalStateException: failed to remove 
b845649551b6b1eab5c17f630e42489d
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:145)
at 
org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119)
at 
org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522)
at java.lang.Thread.run(Thread.java:722)
{code}

Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or 
HADOOP-10404. The cluster installation is running code with all of these fixes.

  was:
In the main finally block of the {{DomainSocketWatcher#watcherThread}}, the 
call to {{sendCallback}} can encountering an {{IllegalStateException}}, and 
leave some cleanup tasks undone.

{code}
  } finally {
    lock.lock();
    try {
      kick(); // allow the handler for notificationSockets[0] to read a byte
      for (Entry entry : entries.values()) {
        // We do not remove from entries as we iterate, because that can
        // cause a ConcurrentModificationException.
        sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
      }
      entries.clear();
      fdSet.close();
    } finally {
      lock.unlock();
    }
  }
{code}

The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} 
and {{fdSet.close()}}.

{code}
2015-04-02 11:48:09,941 [DataXceiver for client 
unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO 
DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 
127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: 
e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: 
Thread[Thread-14,5,main] terminating on unexpected exception
java.lang.IllegalStateException: failed to remove 
b845649551b6b1eab5c17f630e42489d
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:145)
at 
org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119)
at 
org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52)
at 
org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522)
at java.lang.Thread.run(Thread.java:722)
{code}

Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or 
HADOOP-10404. The cluster installation is running code with all of these fixes.


 DomainSocketWatcher#watcherThread encounters IllegalStateException in finally 
 block when calling sendCallback
 -

 Key: HADOOP-11802
 URL: https://issues.apache.org/jira/browse/HADOOP-11802
 Project: Hadoop Common
 

[jira] [Updated] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter

2015-01-16 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11327:

Attachment: HADOOP-11327.v2.txt

Thanks, [~jlowe], for the review and comments.

I have updated the test case in version 2 of the patch.

 BloomFilter#not() omits the last bit, resulting in an incorrect filter
 --

 Key: HADOOP-11327
 URL: https://issues.apache.org/jira/browse/HADOOP-11327
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.5.1
Reporter: Tim Luo
Assignee: Eric Payne
Priority: Minor
 Attachments: HADOOP-11327.v1.txt, HADOOP-11327.v2.txt


 There's an off-by-one error in {{BloomFilter#not()}}:
 {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according 
 to the javadoc for that method, {{toIndex}} is end-_exclusive_:
 {noformat}
 * @param  toIndex index after the last bit to flip
 {noformat}
 This means that the last bit in the bit array is not flipped.
 Specifically, this was discovered in the following scenario:
 1. A new/empty {{BloomFilter}} was created with vectorSize=7.
 2. Invoke {{bloomFilter.not()}}; now expecting a bloom filter with all 7 bits 
 (0 through 6) flipped to 1 and membershipTest(...) to always return true.
 3. However, membershipTest(...) was found to often not return true, and upon 
 inspection, the BitSet only had bits 0 through 5 flipped.
 The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}.
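
The off-by-one is easy to demonstrate in isolation, since {{java.util.BitSet#flip(int, int)}} treats {{toIndex}} as exclusive; a minimal standalone check:

{code}
import java.util.BitSet;

// Demonstrates the end-exclusive semantics behind this bug.
public class FlipOffByOne {
  public static void main(String[] args) {
    int vectorSize = 7;
    BitSet bits = new BitSet(vectorSize);

    bits.flip(0, vectorSize - 1);     // buggy: toIndex is exclusive,
    System.out.println(bits.get(6));  // so bit 6 stays 0 -> prints false

    bits.clear();
    bits.flip(0, vectorSize);         // fixed: flips bits 0..6
    System.out.println(bits.get(6));  // -> prints true
  }
}
{code}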



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter

2015-01-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-11327 started by Eric Payne.
---
 BloomFilter#not() omits the last bit, resulting in an incorrect filter
 --

 Key: HADOOP-11327
 URL: https://issues.apache.org/jira/browse/HADOOP-11327
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.5.1
Reporter: Tim Luo
Assignee: Eric Payne
Priority: Minor
 Attachments: HADOOP-11327.v1.txt


 There's an off-by-one error in {{BloomFilter#not()}}:
 {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according 
 to the javadoc for that method, {{toIndex}} is end-_exclusive_:
 {noformat}
 * @param  toIndex index after the last bit to flip
 {noformat}
 This means that the last bit in the bit array is not flipped.
 Specifically, this was discovered in the following scenario:
 1. A new/empty {{BloomFilter}} was created with vectorSize=7.
 2. Invoke {{bloomFilter.not()}}; now expecting a bloom filter with all 7 bits 
 (0 through 6) flipped to 1 and membershipTest(...) to always return true.
 3. However, membershipTest(...) was found to often not return true, and upon 
 inspection, the BitSet only had bits 0 through 5 flipped.
 The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work stopped] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter

2015-01-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-11327 stopped by Eric Payne.
---
 BloomFilter#not() omits the last bit, resulting in an incorrect filter
 --

 Key: HADOOP-11327
 URL: https://issues.apache.org/jira/browse/HADOOP-11327
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.5.1
Reporter: Tim Luo
Assignee: Eric Payne
Priority: Minor
 Attachments: HADOOP-11327.v1.txt


 There's an off-by-one error in {{BloomFilter#not()}}:
 {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according 
 to the javadoc for that method, {{toIndex}} is end-_exclusive_:
 {noformat}
 * @param  toIndex index after the last bit to flip
 {noformat}
 This means that the last bit in the bit array is not flipped.
 Specifically, this was discovered in the following scenario:
 1. A new/empty {{BloomFilter}} was created with vectorSize=7.
 2. {{bloomFilter.not()}} was invoked, expecting a bloom filter with all 7 bits 
 (0 through 6) flipped to 1 and {{membershipTest(...)}} to always return true.
 3. However, {{membershipTest(...)}} often did not return true; upon 
 inspection, the {{BitSet}} only had bits 0 through 5 flipped.
 The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter

2015-01-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11327:

Attachment: HADOOP-11327.v1.txt

Thanks, [~tim.luo]. Here is the patch, version 1.

 BloomFilter#not() omits the last bit, resulting in an incorrect filter
 --

 Key: HADOOP-11327
 URL: https://issues.apache.org/jira/browse/HADOOP-11327
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.5.1
Reporter: Tim Luo
Assignee: Eric Payne
Priority: Minor
 Attachments: HADOOP-11327.v1.txt


 There's an off-by-one error in {{BloomFilter#not()}}:
 {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according 
 to the javadoc for that method, {{toIndex}} is end-_exclusive_:
 {noformat}
 * @param  toIndex index after the last bit to flip
 {noformat}
 This means that the last bit in the bit array is not flipped.
 Specifically, this was discovered in the following scenario:
 1. A new/empty {{BloomFilter}} was created with vectorSize=7.
 2. {{bloomFilter.not()}} was invoked, expecting a bloom filter with all 7 bits 
 (0 through 6) flipped to 1 and {{membershipTest(...)}} to always return true.
 3. However, {{membershipTest(...)}} often did not return true; upon 
 inspection, the {{BitSet}} only had bits 0 through 5 flipped.
 The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work stopped] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter

2015-01-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-11327 stopped by Eric Payne.
---
 BloomFilter#not() omits the last bit, resulting in an incorrect filter
 --

 Key: HADOOP-11327
 URL: https://issues.apache.org/jira/browse/HADOOP-11327
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.5.1
Reporter: Tim Luo
Assignee: Eric Payne
Priority: Minor
 Attachments: HADOOP-11327.v1.txt


 There's an off-by-one error in {{BloomFilter#not()}}:
 {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according 
 to the javadoc for that method, {{toIndex}} is end-_exclusive_:
 {noformat}
 * @param  toIndex index after the last bit to flip
 {noformat}
 This means that the last bit in the bit array is not flipped.
 Specifically, this was discovered in the following scenario:
 1. A new/empty {{BloomFilter}} was created with vectorSize=7.
 2. {{bloomFilter.not()}} was invoked, expecting a bloom filter with all 7 bits 
 (0 through 6) flipped to 1 and {{membershipTest(...)}} to always return true.
 3. However, {{membershipTest(...)}} often did not return true; upon 
 inspection, the {{BitSet}} only had bits 0 through 5 flipped.
 The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter

2015-01-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11327:

Target Version/s: 2.7.0
  Status: Patch Available  (was: Open)

 BloomFilter#not() omits the last bit, resulting in an incorrect filter
 --

 Key: HADOOP-11327
 URL: https://issues.apache.org/jira/browse/HADOOP-11327
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.5.1
Reporter: Tim Luo
Assignee: Eric Payne
Priority: Minor
 Attachments: HADOOP-11327.v1.txt


 There's an off-by-one error in {{BloomFilter#not()}}:
 {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according 
 to the javadoc for that method, {{toIndex}} is end-_exclusive_:
 {noformat}
 * @param  toIndex index after the last bit to flip
 {noformat}
 This means that the last bit in the bit array is not flipped.
 Specifically, this was discovered in the following scenario:
 1. A new/empty {{BloomFilter}} was created with vectorSize=7.
 2. {{bloomFilter.not()}} was invoked, expecting a bloom filter with all 7 bits 
 (0 through 6) flipped to 1 and {{membershipTest(...)}} to always return true.
 3. However, {{membershipTest(...)}} often did not return true; upon 
 inspection, the {{BitSet}} only had bits 0 through 5 flipped.
 The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter

2015-01-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-11327 started by Eric Payne.
---
 BloomFilter#not() omits the last bit, resulting in an incorrect filter
 --

 Key: HADOOP-11327
 URL: https://issues.apache.org/jira/browse/HADOOP-11327
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.5.1
Reporter: Tim Luo
Assignee: Eric Payne
Priority: Minor
 Attachments: HADOOP-11327.v1.txt


 There's an off-by-one error in {{BloomFilter#not()}}:
 {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according 
 to the javadoc for that method, {{toIndex}} is end-_exclusive_:
 {noformat}
 * @param  toIndex index after the last bit to flip
 {noformat}
 This means that the last bit in the bit array is not flipped.
 Specifically, this was discovered in the following scenario:
 1. A new/empty {{BloomFilter}} was created with vectorSize=7.
 2. {{bloomFilter.not()}} was invoked, expecting a bloom filter with all 7 bits 
 (0 through 6) flipped to 1 and {{membershipTest(...)}} to always return true.
 3. However, {{membershipTest(...)}} often did not return true; upon 
 inspection, the {{BitSet}} only had bits 0 through 5 flipped.
 The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11327) BloomFilter#not() omits the last bit, resulting in an incorrect filter

2015-01-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276012#comment-14276012
 ] 

Eric Payne commented on HADOOP-11327:
-

Hi [~tim.luo]. I'm interested in seeing this issue resolved. Please let me know 
if you plan on working on it any time soon. Otherwise, would it be okay if I 
took it over?

 BloomFilter#not() omits the last bit, resulting in an incorrect filter
 --

 Key: HADOOP-11327
 URL: https://issues.apache.org/jira/browse/HADOOP-11327
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.5.1
Reporter: Tim Luo
Assignee: Tim Luo
Priority: Minor

 There's an off-by-one error in {{BloomFilter#not()}}:
 {{BloomFilter#not}} calls {{BitSet#flip(0, vectorSize - 1)}}, but according 
 to the javadoc for that method, {{toIndex}} is end-_exclusive_:
 {noformat}
 * @param  toIndex index after the last bit to flip
 {noformat}
 This means that the last bit in the bit array is not flipped.
 Specifically, this was discovered in the following scenario:
 1. A new/empty {{BloomFilter}} was created with vectorSize=7.
 2. {{bloomFilter.not()}} was invoked, expecting a bloom filter with all 7 bits 
 (0 through 6) flipped to 1 and {{membershipTest(...)}} to always return true.
 3. However, {{membershipTest(...)}} often did not return true; upon 
 inspection, the {{BitSet}} only had bits 0 through 5 flipped.
 The fix should be simple: remove the - 1 from the call to {{BitSet#flip}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException

2014-08-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11012:


Status: Open  (was: Patch Available)

 hadoop fs -text of zero-length file causes EOFException
 ---

 Key: HADOOP-11012
 URL: https://issues.apache.org/jira/browse/HADOOP-11012
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.5.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: HDFS-6915.201408271824.txt, HDFS-6915.201408272144.txt


 List:
 $ $HADOOP_PREFIX/bin/hadoop fs -ls /user/ericp/foo
 -rw---   3 ericp hdfs  0 2014-08-22 16:37 /user/ericp/foo
 Cat:
 $ $HADOOP_PREFIX/bin/hadoop fs -cat /user/ericp/foo
 Text:
 $ $HADOOP_PREFIX/bin/hadoop fs -text /user/ericp/foo
 text: java.io.EOFException
   at java.io.DataInputStream.readShort(DataInputStream.java:315)
   at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:130)
   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:98)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
   at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
   at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException

2014-08-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11012:


Status: Patch Available  (was: Open)

 hadoop fs -text of zero-length file causes EOFException
 ---

 Key: HADOOP-11012
 URL: https://issues.apache.org/jira/browse/HADOOP-11012
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.5.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: HDFS-6915.201408271824.txt, HDFS-6915.201408272144.txt, 
 HDFS-6915.201408282053.txt


 List:
 $ $HADOOP_PREFIX/bin/hadoop fs -ls /user/ericp/foo
 -rw---   3 ericp hdfs  0 2014-08-22 16:37 /user/ericp/foo
 Cat:
 $ $HADOOP_PREFIX/bin/hadoop fs -cat /user/ericp/foo
 Text:
 $ $HADOOP_PREFIX/bin/hadoop fs -text /user/ericp/foo
 text: java.io.EOFException
   at java.io.DataInputStream.readShort(DataInputStream.java:315)
   at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:130)
   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:98)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
   at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
   at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException

2014-08-28 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11012:


Attachment: HDFS-6915.201408282053.txt

[~daryn], thank you for reviewing this patch. Using {{fsshell.run}} might not 
be simple, since that would entail creating a new output stream, setting the 
{{out}} instance variable for {{Display.Text}}, and then reading from that 
stream.

However, with this patch, I was able to anonymously extend the {{Display.Text}} 
class and override the {{getInputStream}} method to be public, and then call 
{{getInputStream}} directly. Please let me know what you think.
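 As a rough sketch of that test technique, using a hypothetical stand-in class 
 rather than the actual {{Display.Text}} code or the attached patch: an 
 anonymous subclass may override a {{protected}} method with {{public}} 
 visibility, which lets test code outside the package call it directly.
 {code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WidenVisibilityDemo {
    // Stand-in for Display.Text: the real getInputStream is not public.
    static class TextCommand {
        protected InputStream getInputStream(String path) throws IOException {
            return new ByteArrayInputStream(new byte[0]); // pretend to open path
        }
    }

    public static void main(String[] args) throws IOException {
        // The anonymous subclass widens the method to public; in a real test,
        // where the command class lives in another package, this widening is
        // what makes the direct call legal.
        TextCommand text = new TextCommand() {
            @Override
            public InputStream getInputStream(String path) throws IOException {
                return super.getInputStream(path);
            }
        };
        try (InputStream in = text.getInputStream("/user/ericp/foo")) {
            System.out.println("available bytes: " + in.available()); // 0
        }
    }
}
 {code}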

 hadoop fs -text of zero-length file causes EOFException
 ---

 Key: HADOOP-11012
 URL: https://issues.apache.org/jira/browse/HADOOP-11012
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.5.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: HDFS-6915.201408271824.txt, HDFS-6915.201408272144.txt, 
 HDFS-6915.201408282053.txt


 List:
 $ $HADOOP_PREFIX/bin/hadoop fs -ls /user/ericp/foo
 -rw---   3 ericp hdfs  0 2014-08-22 16:37 /user/ericp/foo
 Cat:
 $ $HADOOP_PREFIX/bin/hadoop fs -cat /user/ericp/foo
 Text:
 $ $HADOOP_PREFIX/bin/hadoop fs -text /user/ericp/foo
 text: java.io.EOFException
   at java.io.DataInputStream.readShort(DataInputStream.java:315)
   at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:130)
   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:98)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
   at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
   at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException

2014-08-27 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11012:


Target Version/s: 3.0.0, 2.6.0  (was: 2.6.0)

 hadoop fs -text of zero-length file causes EOFException
 ---

 Key: HADOOP-11012
 URL: https://issues.apache.org/jira/browse/HADOOP-11012
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.5.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: HDFS-6915.201408271824.txt


 List:
 $ $HADOOP_PREFIX/bin/hadoop fs -ls /user/ericp/foo
 -rw---   3 ericp hdfs  0 2014-08-22 16:37 /user/ericp/foo
 Cat:
 $ $HADOOP_PREFIX/bin/hadoop fs -cat /user/ericp/foo
 Text:
 $ $HADOOP_PREFIX/bin/hadoop fs -text /user/ericp/foo
 text: java.io.EOFException
   at java.io.DataInputStream.readShort(DataInputStream.java:315)
   at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:130)
   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:98)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
   at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
   at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException

2014-08-27 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112914#comment-14112914
 ] 

Eric Payne commented on HADOOP-11012:
-

[~jira.shegalov], thank you for reviewing this patch.
bq. Now you read the magic twice, however. I would change the original code 
just by enclosing the switch statement into try-catch-EOF.
Would it be sufficient to do as [~jlowe] suggests and save the magic bytes and 
switch on that?

 hadoop fs -text of zero-length file causes EOFException
 ---

 Key: HADOOP-11012
 URL: https://issues.apache.org/jira/browse/HADOOP-11012
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.5.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: HDFS-6915.201408271824.txt


 List:
 $ $HADOOP_PREFIX/bin/hadoop fs -ls /user/ericp/foo
 -rw---   3 ericp hdfs  0 2014-08-22 16:37 /user/ericp/foo
 Cat:
 $ $HADOOP_PREFIX/bin/hadoop fs -cat /user/ericp/foo
 Text:
 $ $HADOOP_PREFIX/bin/hadoop fs -text /user/ericp/foo
 text: java.io.EOFException
   at java.io.DataInputStream.readShort(DataInputStream.java:315)
   at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:130)
   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:98)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
   at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
   at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-11012) hadoop fs -text of zero-length file causes EOFException

2014-08-27 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated HADOOP-11012:


Attachment: HDFS-6915.201408272144.txt

[~jlowe], thank you for taking the time to review this patch.
{quote}
It would be more efficient and a bit clearer if we saved off the result of the 
initial readShort call and switched on that rather than throwing it away as a 
test read.
{quote}
This has been done in this new patch.
{quote}
There's a lot of duplication setting up the input stream in the unit tests, and 
it's probably worth it to factor this out.

Given there appears to be no overlap between the three added test cases 
(0-byte, 1-byte, and 2-byte files) it would be nice to put these in separate 
unit tests. Then the unit test that fails makes it obvious which test case is 
broken.
{quote}
In this latest patch, I created separate tests for each of the added use cases 
and combined the duplicate setup code into a separate method.
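For readers following the thread, here is a minimal sketch of that approach, 
with illustrative names and magic values rather than the actual 
{{Display.Text}} code: read the leading magic bytes exactly once, treat a 
premature EOF as an ordinary short file, and switch on the saved value.
{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class MagicOnceDemo {
    // A 0- or 1-byte file raises EOFException here, which is not an error
    // for "fs -text": there is simply nothing to decode.
    static String detect(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        final short magic;
        try {
            magic = in.readShort(); // read the two leading bytes once
        } catch (EOFException e) {
            return "plain (shorter than magic)";
        }
        switch (magic) {          // switch on the saved value; no re-read
            case 0x1f8b: return "gzip";           // gzip magic bytes
            case 0x5345: return "sequencefile?";  // 'S','E' header prefix
            default:     return "plain";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(detect(new ByteArrayInputStream(new byte[0])));
        System.out.println(detect(new ByteArrayInputStream(new byte[]{0x1f})));
        System.out.println(detect(new ByteArrayInputStream(
                new byte[]{0x1f, (byte) 0x8b})));
    }
}
{code}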

 hadoop fs -text of zero-length file causes EOFException
 ---

 Key: HADOOP-11012
 URL: https://issues.apache.org/jira/browse/HADOOP-11012
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.5.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: HDFS-6915.201408271824.txt, HDFS-6915.201408272144.txt


 List:
 $ $HADOOP_PREFIX/bin/hadoop fs -ls /user/ericp/foo
 -rw---   3 ericp hdfs  0 2014-08-22 16:37 /user/ericp/foo
 Cat:
 $ $HADOOP_PREFIX/bin/hadoop fs -cat /user/ericp/foo
 Text:
 $ $HADOOP_PREFIX/bin/hadoop fs -text /user/ericp/foo
 text: java.io.EOFException
   at java.io.DataInputStream.readShort(DataInputStream.java:315)
   at org.apache.hadoop.fs.shell.Display$Text.getInputStream(Display.java:130)
   at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:98)
   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:306)
   at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
   at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
   at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
   at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-8087) Paths that start with a double slash cause No filesystem for scheme: null errors

2014-04-11 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966460#comment-13966460
 ] 

Eric Payne commented on HADOOP-8087:


[~daryn] and [~cmccabe]:

I came across this issue as part of a 0.23 backlog review.

Will this issue be resolved in 0.23 or 2.0? If not, can we remove the 0.23.3 
and 2.0.0-alpha targets and leave this JIRA targeted for 3.0.0?

 Paths that start with a double slash cause No filesystem for scheme: null 
 errors
 --

 Key: HADOOP-8087
 URL: https://issues.apache.org/jira/browse/HADOOP-8087
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 0.23.0, 0.24.0
Reporter: Daryn Sharp
Assignee: Colin Patrick McCabe
 Attachments: HADOOP-8087.001.patch, HADOOP-8087.002.patch


 {{Path}} is incorrectly parsing {{//dir/path}} in a very unexpected way.  
 While it should translate to the directory {{$fs.default.name/dir/path}}, it 
 instead discards the {{//dir}} and returns
 {{$fs.default.name/path}}.  The problem is that {{Path}} tries to parse an 
 authority even when a scheme is not present.
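 The underlying behavior is reproducible with plain {{java.net.URI}} (a sketch 
 of the generic URI parsing rule, not the Hadoop {{Path}} code): when no scheme 
 is present, a leading {{//}} still introduces an authority component, so the 
 first path segment is swallowed.
 {code:java}
import java.net.URI;

public class DoubleSlashDemo {
    public static void main(String[] args) {
        // Scheme-less "//dir/path": URI parsing treats "dir" as the
        // authority, not as the first directory of the path.
        URI u = URI.create("//dir/path");
        System.out.println("authority = " + u.getAuthority()); // dir
        System.out.println("path      = " + u.getPath());      // /path
    }
}
 {code}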



--
This message was sent by Atlassian JIRA
(v6.2#6252)