[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-31 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955576#comment-13955576
 ] 

Kihwal Lee commented on HADOOP-10442:
-------------------------------------

[~cmccabe]:  I also think the version of nslcd we used is buggy.  The return 
code handling before your change was just masking the bug, which likely had 
other side effects.  I observed many lookup timeouts in the NN prior to the 
crashes, while my own program, calling the same libc functions on the same box 
at the same time, had no issues.  The nslcd lookup timeout was configured to be 
20 seconds in /etc/nslcd.conf.

{panel}
12:15:21,106  WARN security.Groups: Potential performance problem:
getGroups(user=) took 20020 milliseconds.
 12:15:21,107  WARN security.UserGroupInformation: No groups available for user 

{panel}

bq. Also, looking at this more closely, I believe we mishandle the case where 
the user is a member of no groups. This would be a pretty odd configuration (I 
wonder if it's possible?).

Getting no groups after a successful getpwnam() can probably only happen when 
the user is removed between the two calls.  All other cases might be considered 
errors.  I saw cases of an admin user being denied permission for certain 
operations, which was fixed once the refresh command was issued.  The lookup 
must have hit the no-group error while building the ACL, and the result was 
negatively cached.  Without negative caching, user-level retries would have 
worked.

So the solution might be to let the native code return 0 even on error 
conditions, as you suggested, but to stop the netgroup modules from doing 
negative caching, that is, from caching the result when a valid user name 
comes back with no netgroups.
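
The idea of not caching empty results can be sketched in C as follows. This is 
only an illustration under stated assumptions, not Hadoop's code: the names 
group_cache, cache_store and cache_lookup, the fixed slot count, and the name 
length limit are all hypothetical. The point it shows is that a lookup that 
came back with no groups is never stored, so a later retry goes back to the 
real lookup instead of being answered from the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define CACHE_SLOTS   16   /* hypothetical fixed capacity */
#define USER_NAME_LEN 64   /* hypothetical name limit, including the NUL */

struct group_cache_entry {
    bool used;
    char user[USER_NAME_LEN];
    int  ngroups;            /* number of groups found for this user */
};

struct group_cache {
    struct group_cache_entry slots[CACHE_SLOTS];
};

/*
 * Store a lookup result.  Empty results (ngroups <= 0) are never stored,
 * so a transient failure cannot be served from the cache later.
 * Returns true if the entry was stored.
 */
bool cache_store(struct group_cache *cache, const char *user, int ngroups)
{
    assert(cache != NULL && user != NULL);

    if (ngroups <= 0 || strlen(user) >= USER_NAME_LEN) {
        return false;
    }
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct group_cache_entry *e = &cache->slots[i];
        /* Reuse the user's existing slot, or take the first free one. */
        if (!e->used || strcmp(e->user, user) == 0) {
            e->used = true;
            strcpy(e->user, user);   /* length was checked above */
            e->ngroups = ngroups;
            return true;
        }
    }
    return false;  /* cache full; this sketch has no eviction */
}

/* Return the cached group count for user, or -1 on a cache miss. */
int cache_lookup(const struct group_cache *cache, const char *user)
{
    assert(cache != NULL && user != NULL);

    for (int i = 0; i < CACHE_SLOTS; i++) {
        const struct group_cache_entry *e = &cache->slots[i];
        if (e->used && strcmp(e->user, user) == 0) {
            return e->ngroups;
        }
    }
    return -1;
}
```

A caller would zero-initialize a struct group_cache, call cache_store() after 
each lookup, and treat a -1 from cache_lookup() as "ask the system again".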

 Group look-up can cause segmentation fault when certain JNI-based mapping 
 module is used.
 --------------------------------------------------------------------------

                 Key: HADOOP-10442
                 URL: https://issues.apache.org/jira/browse/HADOOP-10442
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 2.3.0, 2.4.0
            Reporter: Kihwal Lee
            Assignee: Kihwal Lee
            Priority: Blocker
             Fix For: 3.0.0, 2.4.0, 2.5.0

         Attachments: HADOOP-10442.patch


 When JniBasedUnixGroupsNetgroupMapping or JniBasedUnixGroupsMapping is used, 
 we see segmentation faults very often. The same system ran 2.2 for months 
 without any problem, but as soon as it was upgraded to 2.3, it started 
 crashing.  This resulted in multiple name node crashes per day.
 The server was running nslcd (nss-pam-ldapd-0.7.5-15.el6_3.2). We did not see 
 this problem on the servers running sssd. 
 There was one change in the C code, and it modified the return code handling 
 after the getgrouplist() call. If the function returns 0 or a negative value 
 less than -1, the code does a realloc() instead of returning failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950607#comment-13950607
 ] 

Hudson commented on HADOOP-10442:
---------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #523 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/523/])
HADOOP-10442. Group look-up can cause segmentation fault when certain JNI-based 
mapping module is used. (Kihwal Lee via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582451)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950702#comment-13950702
 ] 

Hudson commented on HADOOP-10442:
---------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #1715 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1715/])
HADOOP-10442. Group look-up can cause segmentation fault when certain JNI-based 
mapping module is used. (Kihwal Lee via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582451)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950708#comment-13950708
 ] 

Hudson commented on HADOOP-10442:
---------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1740 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1740/])
HADOOP-10442. Group look-up can cause segmentation fault when certain JNI-based 
mapping module is used. (Kihwal Lee via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582451)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-27 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949585#comment-13949585
 ] 

Chris Nauroth commented on HADOOP-10442:
----------------------------------------

That sounds great, Kihwal.  I think we can commit this and resolve the blocker.


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-27 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949856#comment-13949856
 ] 

Jonathan Eagles commented on HADOOP-10442:
------------------------------------------

+1. Checking this into trunk, branch-2, branch-2.4


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949862#comment-13949862
 ] 

Hudson commented on HADOOP-10442:
---------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5418 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5418/])
HADOOP-10442. Group look-up can cause segmentation fault when certain JNI-based 
mapping module is used. (Kihwal Lee via jeagles) (jeagles: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1582451)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/hadoop_user_info.c


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-27 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950125#comment-13950125
 ] 

Colin Patrick McCabe commented on HADOOP-10442:
-----------------------------------------------

Thanks for this patch, Kihwal.

To be honest, I find the behavior of {{getgrouplist}} that you are seeing to be 
puzzling.  The man page doesn't describe any negative return codes other than 
-1.

{code}
RETURN VALUE
   If the number of groups of which user is a member is less than or equal
   to *ngroups, then the value *ngroups is returned.

   If the user is a member of more than *ngroups groups, then getgrouplist()
   returns -1.  In this case the value returned in *ngroups can be used to
   resize the buffer passed to a further call getgrouplist().
{code}

What negative return code did you see besides -1?  I guess what you're seeing 
is undocumented, and possibly a bug in nslcd (or the man page?)

Also, looking at this more closely, I believe we mishandle the case where the 
user is a member of no groups.  This would be a pretty odd configuration (I 
wonder if it's possible?).  Just to be sure, I think we should consider 
getgrouplist returning 0 to be ok.
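
The return-code handling discussed here can be sketched as a small decision 
function in C. This is a sketch under assumptions, not the committed patch: 
classify_getgrouplist and the lookup_action enum are hypothetical names, and 
the caller is assumed to pass in the buffer capacity it gave getgrouplist() 
together with the value left in *ngroups after the call. It treats 0 as a 
valid empty result, retries only on the documented -1 when *ngroups asks for 
a larger buffer, and treats every other value as a failure rather than a 
reason to realloc().

```c
#include <assert.h>

enum lookup_action {
    LOOKUP_DONE,    /* result is usable, including an empty group list */
    LOOKUP_RESIZE,  /* buffer too small: grow it and call again */
    LOOKUP_FAIL     /* undocumented result: report an error, never realloc */
};

/*
 * ret         - value returned by getgrouplist()
 * capacity    - buffer size (in entries) passed in via *ngroups
 * ngroups_out - value of *ngroups after the call
 */
enum lookup_action classify_getgrouplist(int ret, int capacity, int ngroups_out)
{
    assert(capacity >= 0);

    /* Documented success: the count fits in the buffer we supplied. */
    if (ret >= 0 && ret <= capacity) {
        return LOOKUP_DONE;
    }
    /* Documented "buffer too small": only retry if a larger size is asked. */
    if (ret == -1 && ngroups_out > capacity) {
        return LOOKUP_RESIZE;
    }
    /* Anything else (for example a value below -1) is outside the man page. */
    return LOOKUP_FAIL;
}
```

With this split, a caller would resize its buffer to ngroups_out entries only 
on LOOKUP_RESIZE and would return an error to the JNI layer on LOOKUP_FAIL.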


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-26 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948350#comment-13948350
 ] 

Kihwal Lee commented on HADOOP-10442:
-------------------------------------

The return code handling was modified in HADOOP-10087. This is the only change 
in the JNI user-group mapping modules between 2.2 and 2.3.


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948459#comment-13948459
 ] 

Hadoop QA commented on HADOOP-10442:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636983/HADOOP-10442.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include any 
new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the total 
number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase the 
total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3721//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3721//console

This message is automatically generated.


[jira] [Commented] (HADOOP-10442) Group look-up can cause segmentation fault when certain JNI-based mapping module is used.

2014-03-26 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948832#comment-13948832
 ] 

Kihwal Lee commented on HADOOP-10442:
-------------------------------------

A 2.3 NN has been running with this fix for some time.  The NN crashed every 
3-5 hours before this. 
