[jira] [Created] (HDFS-13992) cross-cluster rack awareness for distcp

2018-10-14 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created HDFS-13992:


 Summary: cross-cluster rack awareness for distcp
 Key: HDFS-13992
 URL: https://issues.apache.org/jira/browse/HDFS-13992
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 2.7.7, 3.0.3, 3.1.1, 2.8.4
Reporter: Ruslan Dautkhanov


It would be great if distcp supported cross-cluster rack awareness.

For example, we have hdfs cluster1 and hdfs cluster2.
Both clusters span three switches, both have rack awareness enabled,
and both clusters name the same switches the same way.

So when distcp runs a data replication job, it could replicate hdfs blocks
only to counterpart datanodes on the destination cluster that sit on the same
physical network switch, minimizing latency and maximizing bandwidth.

It could be an option, activated through a `distcp` command-line switch.
We have multiple clusters with a default replication of 3, and all those
clusters live in the same three "racks" / "top of the rack switches".

This could drastically reduce inter-switch network traffic during huge distcp
jobs.
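
A purely hypothetical sketch of the proposed matching (this is not an existing distcp option; all node and rack names below are made up for illustration): given the rack of the source datanode holding a block and the destination cluster's topology, prefer destination datanodes on the same switch.

{code}
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; distcp has no such option today.
public class RackMatchingSketch {

    // Prefer destination datanodes on the same rack as the source replica,
    // falling back to all destination nodes when no counterpart rack exists.
    static List<String> preferSameRack(String sourceRack,
                                       Map<String, String> destNodeToRack) {
        List<String> sameRack = destNodeToRack.entrySet().stream()
                .filter(e -> e.getValue().equals(sourceRack))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        return sameRack.isEmpty() ? List.copyOf(destNodeToRack.keySet()) : sameRack;
    }

    public static void main(String[] args) {
        Map<String, String> destTopology = Map.of(
                "dn-b1", "/rack1", "dn-b2", "/rack2", "dn-b3", "/rack3");
        System.out.println(preferSameRack("/rack2", destTopology)); // [dn-b2]
    }
}
{code}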






[jira] [Commented] (HDFS-10427) Write and Read SequenceFile Parallelly - java.io.IOException: Cannot obtain block length for LocatedBlock

2018-05-26 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491791#comment-16491791
 ] 

Ruslan Dautkhanov commented on HDFS-10427:
--

We have the same issue in Hadoop 2.6.

> Write and Read SequenceFile Parallelly - java.io.IOException: Cannot obtain 
> block length for LocatedBlock
> -
>
> Key: HDFS-10427
> URL: https://issues.apache.org/jira/browse/HDFS-10427
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.7.2
>Reporter: Syed Akram
>Priority: Critical
>
> Trying to write a key/value and read an already written key/value in a 
> SequenceFile in parallel; while doing that, with the 
> Writer opened with appendOption=true:
> java.io.IOException: Cannot obtain block length for 
> LocatedBlock{BP-1019538077-localhost-1459944245378:blk_1075356142_3219260; 
> getBlockSize()=2409; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[dn1:50010,DS-21698924-4178-4c08-ba41-aa86770ef0d0,DISK],
>  
> DatanodeInfoWithStorage[dn3:50010,DS-8e3dc8c0-4e34-4d12-86a3-48b189b78f5d,DISK],
>  
> DatanodeInfoWithStorage[dn2:50010,DS-fb22c1c2-e059-4e0e-91e0-df838beb86f9,DISK]]}
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:428)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:336)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:272)
>   at org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:264)
>   at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1526)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:303)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:299)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1902)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1822)
>   
> But when I'm trying to read while the write (SequenceFile.Writer) is open, it 
> works fine. 
> But when we do it in parallel (both start a write with append=true and then read 
> an already existing key/value), we face this issue.
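
For reference, a rough repro sketch of the parallel append/read scenario described above; the paths are placeholders, and the appendIfExists writer option assumes a Hadoop release that supports SequenceFile append:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Repro sketch: keep a SequenceFile.Writer open with appendIfExists(true)
// while a Reader is opened on the same file. On HDFS the reader may fail with
// "Cannot obtain block length for LocatedBlock" because the last block is
// still under construction.
public class ParallelAppendReadRepro {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path file = new Path("/tmp/seqfile-repro");   // example path

        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(file),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.appendIfExists(true));
        writer.append(new Text("k1"), new Text("v1"));
        writer.hflush();   // flush to the pipeline; the block stays open

        // Reader opened while the writer still holds the last block open.
        try (SequenceFile.Reader reader =
                     new SequenceFile.Reader(conf, SequenceFile.Reader.file(file))) {
            Text k = new Text(), v = new Text();
            while (reader.next(k, v)) {
                System.out.println(k + " = " + v);
            }
        } finally {
            writer.close();
        }
    }
}
{code}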






[jira] [Created] (HDFS-13096) HDFS group quota

2018-01-31 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created HDFS-13096:


 Summary: HDFS group quota
 Key: HDFS-13096
 URL: https://issues.apache.org/jira/browse/HDFS-13096
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, fs, hdfs, nn
Affects Versions: 3.0.0, 2.7.5, 2.8.3
Reporter: Ruslan Dautkhanov


We have groups of people that have their own set of HDFS directories. 
For example, they have an HDFS staging place for new files:
/datascience
/analysts 
... 
but at the same time they have Hive warehouse directories:
/hivewarehouse/datascience
/hivewarehouse/analysts 
... 
and on top of that they also have some files stored under /user/${username}/ 

It's always been a challenge to maintain a combined quota on all the HDFS 
locations a particular group of people owns, as we're currently forced to set 
a separate quota on each directory independently.

It would be great if HDFS had a quota tied either
- to a set of HDFS locations;
- or to a group of people (where `group` is defined as the HDFS group a 
particular file/directory belongs to).

Linux allows defining quotas at the group level (e.g. `edquota -g devel`); it 
would be great to have the same at the HDFS level.
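
Until something like that exists, the combined usage has to be computed on the client side. A minimal sketch, assuming the group's directories are known up front (the paths below mirror the examples above):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the current workaround: since HDFS has no group quota, sum the
// space consumed across every directory a group owns and compare it against
// a limit enforced outside of HDFS.
public class GroupUsageCheck {
    public static void main(String[] args) throws Exception {
        String[] groupDirs = {"/datascience", "/hivewarehouse/datascience"};
        long combined = 0L;
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            for (String dir : groupDirs) {
                ContentSummary cs = fs.getContentSummary(new Path(dir));
                combined += cs.getSpaceConsumed();   // bytes, including replication
            }
        }
        System.out.println("Combined space consumed: " + combined + " bytes");
    }
}
{code}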

Other thoughts and ideas?






[jira] [Created] (HDFS-12601) Implement new hdfs balancer's threshold units

2017-10-05 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created HDFS-12601:


 Summary: Implement new hdfs balancer's threshold units
 Key: HDFS-12601
 URL: https://issues.apache.org/jira/browse/HDFS-12601
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover
Affects Versions: 3.0.0-alpha3, 2.7.4, 2.6.5
Reporter: Ruslan Dautkhanov


The balancer threshold unit is inappropriate in most cases for new clusters 
that have a lot of capacity and a small used%. 

For example, in one of our new clusters HDFS capacity is *2.2 Pb* and only 
*160Tb* is used (across all DNs). So for the `hdfs balancer -threshold` 
parameter, a 1% threshold equals *0.55* Tb per DN (there are 40 nodes in this 
cluster). 
Now we have some DNs that hold as little as *3.5* Tb 
and other DNs as much as *4.6* Tb. 

So the actual imbalance is more like *24%*.
The `hdfs balancer -threshold *1*` command says there is nothing to balance 
(and I can't pass a value smaller than 1).
The balancer thinks the imbalance is less than 1% (based on full capacity), 
when it's actually 24%. 

We see that the nodes with more data are actually getting more processing tasks 
(because of data locality).

It would be great to introduce a suffix for the -threshold balancer parameter:
* 10c ('c' for `c`apacity) would mean 10% of the DN's capacity (the current 
behavior; it would default to 'c' if not specified, so this change is backward 
compatible);
* 10u ('u' for `u`sed space variance across all DNs) would be measured as 
%min_used / %max_used. For the example above, the cluster would get rebalanced 
correctly, as the current imbalance is 24% (see the sketch below).
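
A small sketch of how the proposed 'u' metric could be computed, under the %min_used / %max_used interpretation above; this is not existing balancer code, and the numbers mirror the example in this description:

{code}
// Sketch of the proposed 'u' (used-space variance) suffix.
public class UsedSpaceVarianceSketch {

    // imbalance% = (1 - min(used) / max(used)) * 100 across all datanodes
    static double imbalancePercent(double[] usedTbPerDatanode) {
        double min = Double.MAX_VALUE, max = 0.0;
        for (double used : usedTbPerDatanode) {
            min = Math.min(min, used);
            max = Math.max(max, used);
        }
        return (1.0 - min / max) * 100.0;
    }

    public static void main(String[] args) {
        // DNs holding between 3.5 Tb and 4.6 Tb, as in the example above
        double imbalance = imbalancePercent(new double[]{3.5, 4.2, 4.6});
        System.out.printf("imbalance = %.0f%%, so '-threshold 10u' would trigger balancing%n",
                imbalance);
    }
}
{code}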









[jira] [Commented] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown

2017-09-26 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181149#comment-16181149
 ] 

Ruslan Dautkhanov commented on HDFS-11851:
--

thank you [~jzhuge]

> getGlobalJNIEnv() may deadlock if exception is thrown
> -
>
> Key: HDFS-11851
> URL: https://issues.apache.org/jira/browse/HDFS-11851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 3.0.0-alpha4
>Reporter: Henry Robinson
>Assignee: Sailesh Mukil
>Priority: Blocker
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch, 
> HDFS-11851.002.patch, HDFS-11851.003.patch, HDFS-11851.004.patch, 
> HDFS-11851.005.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception 
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
> {{printExceptionAndFree()}} will eventually try to acquire that lock in 
> {{setTLSExceptionStrings()}}.
> The exception might get caught from {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
>  "org/apache/hadoop/fs/FileSystem",
>  "loadFileSystems", "()V");
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL, 
> "loadFileSystems");
> }
> }
> {code}
> and here's the relevant parts of the stack trace from where I call this API 
> in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x74a8d657 in _L_lock_909 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x74a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 
> ) at ../nptl/pthread_mutex_lock.c:79
> #3  0x02f06056 in mutexLock (m=) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x02efe817 in setTLSExceptionStrings (rootCause=0x0, 
> stackTrace=0x0) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x02f065d7 in printExceptionAndFreeV (env=0x513c1e8, 
> exc=0x508a8c0, noPrintFlags=, fmt=0x34349cf "loadFileSystems", 
> ap=0x7fffb660)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x02f0683d in printExceptionAndFree (env=, 
> exc=, noPrintFlags=, fmt=)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x02eff60f in getGlobalJNIEnv () at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}






[jira] [Commented] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown

2017-09-19 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171172#comment-16171172
 ] 

Ruslan Dautkhanov commented on HDFS-11851:
--

{quote}
Please set "ulimit -c unlimited" before reproducing the issue in order to 
generate a core dump. Upload the core dump or run "gdb  " and then 
"bt" to get the stack trace.
{quote}

I already did, a few days ago, and posted the `bt` output 
[here|https://issues.apache.org/jira/browse/HDFS-11851?focusedCommentId=16166959=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16166959].

{quote}Got a similar SEGV{quote}
Thanks [~jzhuge]! Glad it wasn't something in our environment that caused this.

> getGlobalJNIEnv() may deadlock if exception is thrown
> -
>
> Key: HDFS-11851
> URL: https://issues.apache.org/jira/browse/HDFS-11851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 3.0.0-alpha4
>Reporter: Henry Robinson
>Assignee: Sailesh Mukil
>Priority: Blocker
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch, 
> HDFS-11851.002.patch, HDFS-11851.003.patch, HDFS-11851.004.patch, 
> HDFS-11851.005.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception 
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
> {{printExceptionAndFree()}} will eventually try to acquire that lock in 
> {{setTLSExceptionStrings()}}.
> The exception might get caught from {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
>  "org/apache/hadoop/fs/FileSystem",
>  "loadFileSystems", "()V");
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL, 
> "loadFileSystems");
> }
> }
> {code}
> and here's the relevant parts of the stack trace from where I call this API 
> in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x74a8d657 in _L_lock_909 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x74a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 
> ) at ../nptl/pthread_mutex_lock.c:79
> #3  0x02f06056 in mutexLock (m=) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x02efe817 in setTLSExceptionStrings (rootCause=0x0, 
> stackTrace=0x0) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x02f065d7 in printExceptionAndFreeV (env=0x513c1e8, 
> exc=0x508a8c0, noPrintFlags=, fmt=0x34349cf "loadFileSystems", 
> ap=0x7fffb660)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x02f0683d in printExceptionAndFree (env=, 
> exc=, noPrintFlags=, fmt=)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x02eff60f in getGlobalJNIEnv () at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}






[jira] [Commented] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown

2017-09-18 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171130#comment-16171130
 ] 

Ruslan Dautkhanov commented on HDFS-11851:
--

[~jzhuge], yes, I did {{export CLASSPATH=`hadoop classpath`}} beforehand, so 
{{CLASSPATH}} does have all the required jars, including {{hadoop-common.jar}}.
I think some elements of that hs_err..log file are misleading, as it shows 
internal events, including when the classes were discovered; 
for example,
{noformat}
Event: 0.067 Thread 0x00e0 Exception  
(0x000580169ab0) thrown at 
[/HUDSON3/workspace/8-2-build-linux-amd64/jdk8u141/9370/hotspot/src/share/vm/classfile/systemDictionary.cpp,
 line 199]
Event: 0.067 Thread 0x00e0 Exception  (0x00058016c1f0) thrown 
at 
[/HUDSON3/workspace/8-2-build-linux-amd64/jdk8u141/9370/hotspot/src/share/vm/classfile/systemDictionary.cpp,
 line 199]
Event: 0.068 Thread 0x00e0 Exception  (0x00058016e708) thrown 
at 
[/HUDSON3/workspace/8-2-build-linux-amd64/jdk8u141/9370/hotspot/src/share/vm/classfile/systemDictionary.cpp,
 line 199]
{noformat}

but then right down below you see 
{noformat}
Event: 0.066 loading class java/io/FileNotFoundException
Event: 0.066 loading class java/io/IOException
Event: 0.066 loading class java/io/IOException done
Event: 0.066 loading class java/io/FileNotFoundException done
Event: 0.066 loading class java/security/PrivilegedActionException
Event: 0.066 loading class java/security/PrivilegedActionException done
Event: 0.067 loading class org/apache/commons/lang/exception/ExceptionUtils
Event: 0.067 loading class org/apache/commons/lang/exception/ExceptionUtils done
Event: 0.067 loading class org/apache/commons/lang/exception/ExceptionUtils
Event: 0.067 loading class org/apache/commons/lang/exception/ExceptionUtils done
{noformat}

Notice that {{org/apache/commons/lang/exception/ExceptionUtils}}, for example, 
is listed twice in that "Internal exceptions (10 events)" section, but is then 
shown as loaded fine in the "Events (10 events)" section. So I think the 
"Internal exceptions" section is misleading? 



> getGlobalJNIEnv() may deadlock if exception is thrown
> -
>
> Key: HDFS-11851
> URL: https://issues.apache.org/jira/browse/HDFS-11851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 3.0.0-alpha4
>Reporter: Henry Robinson
>Assignee: Sailesh Mukil
>Priority: Blocker
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch, 
> HDFS-11851.002.patch, HDFS-11851.003.patch, HDFS-11851.004.patch, 
> HDFS-11851.005.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception 
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
> {{printExceptionAndFree()}} will eventually try to acquire that lock in 
> {{setTLSExceptionStrings()}}.
> The exception might get caught from {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
>  "org/apache/hadoop/fs/FileSystem",
>  "loadFileSystems", "()V");
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL, 
> "loadFileSystems");
> }
> }
> {code}
> and here's the relevant parts of the stack trace from where I call this API 
> in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x74a8d657 in _L_lock_909 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x74a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 
> ) at ../nptl/pthread_mutex_lock.c:79
> #3  0x02f06056 in mutexLock (m=) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x02efe817 in setTLSExceptionStrings (rootCause=0x0, 
> stackTrace=0x0) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x02f065d7 in printExceptionAndFreeV (env=0x513c1e8, 
> exc=0x508a8c0, noPrintFlags=, fmt=0x34349cf "loadFileSystems", 
> ap=0x7fffb660)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x02f0683d in printExceptionAndFree (env=, 
> exc=, noPrintFlags=, fmt=)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x02eff60f in getGlobalJNIEnv () at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}




[jira] [Commented] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown

2017-09-15 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168084#comment-16168084
 ] 

Ruslan Dautkhanov commented on HDFS-11851:
--

[~jzhuge], please have a look at 
https://github.com/Tagar/shared/blob/master/hs_err_pid18963.log thanks

> getGlobalJNIEnv() may deadlock if exception is thrown
> -
>
> Key: HDFS-11851
> URL: https://issues.apache.org/jira/browse/HDFS-11851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 3.0.0-alpha4
>Reporter: Henry Robinson
>Assignee: Sailesh Mukil
>Priority: Blocker
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch, 
> HDFS-11851.002.patch, HDFS-11851.003.patch, HDFS-11851.004.patch, 
> HDFS-11851.005.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception 
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
> {{printExceptionAndFree()}} will eventually try to acquire that lock in 
> {{setTLSExceptionStrings()}}.
> The exception might get caught from {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
>  "org/apache/hadoop/fs/FileSystem",
>  "loadFileSystems", "()V");
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL, 
> "loadFileSystems");
> }
> }
> {code}
> and here's the relevant parts of the stack trace from where I call this API 
> in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x74a8d657 in _L_lock_909 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x74a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 
> ) at ../nptl/pthread_mutex_lock.c:79
> #3  0x02f06056 in mutexLock (m=) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x02efe817 in setTLSExceptionStrings (rootCause=0x0, 
> stackTrace=0x0) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x02f065d7 in printExceptionAndFreeV (env=0x513c1e8, 
> exc=0x508a8c0, noPrintFlags=, fmt=0x34349cf "loadFileSystems", 
> ap=0x7fffb660)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x02f0683d in printExceptionAndFree (env=, 
> exc=, noPrintFlags=, fmt=)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x02eff60f in getGlobalJNIEnv () at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}






[jira] [Commented] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown

2017-09-14 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166959#comment-16166959
 ] 

Ruslan Dautkhanov commented on HDFS-11851:
--

After applying this patch, the program started core dumping. Here's the gdb backtrace:

{code}
(gdb) bt
#0  0x7fe78a34b1d7 in raise () from /lib64/libc.so.6
#1  0x7fe78a34c8c8 in abort () from /lib64/libc.so.6
#2  0x7fe78b212185 in os::abort(bool) () from 
/usr/java/default/jre/lib/amd64/server/libjvm.so
#3  0x7fe78b3b4593 in VMError::report_and_die() () from 
/usr/java/default/jre/lib/amd64/server/libjvm.so
#4  0x7fe78b21768f in JVM_handle_linux_signal () from 
/usr/java/default/jre/lib/amd64/server/libjvm.so
#5  0x7fe78b20dbe3 in signalHandler(int, siginfo*, void*) () from 
/usr/java/default/jre/lib/amd64/server/libjvm.so
#6  
#7  0x7fe78a6db8b0 in setTLSExceptionStrings () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#8  0x7fe78a6da52c in printExceptionAndFreeV () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#9  0x7fe78a6da6cd in printExceptionAndFree () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#10 0x7fe78a6db60b in getJNIEnv () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#11 0x7fe78a6dd034 in hdfsBuilderConnect () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#12 0x00400950 in main ()
{code}

As you can see, it happens in setTLSExceptionStrings(), so it's definitely 
related to this patch. 
I can upload an hs_err*.log file if it would be helpful.


> getGlobalJNIEnv() may deadlock if exception is thrown
> -
>
> Key: HDFS-11851
> URL: https://issues.apache.org/jira/browse/HDFS-11851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 3.0.0-alpha4
>Reporter: Henry Robinson
>Assignee: Sailesh Mukil
>Priority: Blocker
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch, 
> HDFS-11851.002.patch, HDFS-11851.003.patch, HDFS-11851.004.patch, 
> HDFS-11851.005.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception 
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
> {{printExceptionAndFree()}} will eventually try to acquire that lock in 
> {{setTLSExceptionStrings()}}.
> The exception might get caught from {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
>  "org/apache/hadoop/fs/FileSystem",
>  "loadFileSystems", "()V");
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL, 
> "loadFileSystems");
> }
> }
> {code}
> and here's the relevant parts of the stack trace from where I call this API 
> in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x74a8d657 in _L_lock_909 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x74a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 
> ) at ../nptl/pthread_mutex_lock.c:79
> #3  0x02f06056 in mutexLock (m=) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x02efe817 in setTLSExceptionStrings (rootCause=0x0, 
> stackTrace=0x0) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x02f065d7 in printExceptionAndFreeV (env=0x513c1e8, 
> exc=0x508a8c0, noPrintFlags=, fmt=0x34349cf "loadFileSystems", 
> ap=0x7fffb660)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x02f0683d in printExceptionAndFree (env=, 
> exc=, noPrintFlags=, fmt=)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x02eff60f in getGlobalJNIEnv () at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}






[jira] [Commented] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown

2017-09-11 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162260#comment-16162260
 ] 

Ruslan Dautkhanov commented on HDFS-11851:
--

[~jzhuge], I have the following exception, which matches HDFS-11851, doesn't it? 
See below:

{code}
$ gdb -p 8248
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
  /cut/   
(gdb)
(gdb) bt
#0  0x7fddd9f141bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x7fddd9f0fd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x7fddd9f0fc08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x7fddda9f3e26 in mutexLock () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#4  0x7fddda9ed6f1 in setTLSExceptionStrings () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#5  0x7fddda9ec38c in printExceptionAndFreeV () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#6  0x7fddda9ec52d in printExceptionAndFree () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#7  0x7fddda9ed46b in getJNIEnv () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#8  0x7fddda9eee94 in hdfsBuilderConnect () from 
/opt/cloudera/parcels/CDH/lib64/libhdfs.so.0.0.0
#9  0x00400950 in main ()

{code}

It is from a Hadoop 2.6-based distribution (CDH 5.10).
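
As an analogy only (this is illustrative Java, not the actual libhdfs C code): the hang above is a thread re-acquiring a non-reentrant lock it already holds, which a binary Semaphore reproduces in miniature. The names are borrowed from the libhdfs functions involved.

{code}
import java.util.concurrent.Semaphore;

// Analogy of the HDFS-11851 self-deadlock on the non-recursive jvmMutex.
public class SelfDeadlockAnalogy {
    public static void main(String[] args) throws InterruptedException {
        Semaphore jvmMutex = new Semaphore(1);  // non-reentrant, like a pthread mutex
        jvmMutex.acquire();                     // getGlobalJNIEnv() takes the lock
        System.out.println("lock held; the error path now tries to take it again...");
        jvmMutex.acquire();                     // setTLSExceptionStrings(): blocks forever
        System.out.println("never reached");
    }
}
{code}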

> getGlobalJNIEnv() may deadlock if exception is thrown
> -
>
> Key: HDFS-11851
> URL: https://issues.apache.org/jira/browse/HDFS-11851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 3.0.0-alpha4
>Reporter: Henry Robinson
>Assignee: Sailesh Mukil
>Priority: Blocker
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch, 
> HDFS-11851.002.patch, HDFS-11851.003.patch, HDFS-11851.004.patch, 
> HDFS-11851.005.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception 
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
> {{printExceptionAndFree()}} will eventually try to acquire that lock in 
> {{setTLSExceptionStrings()}}.
> The exception might get caught from {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
>  "org/apache/hadoop/fs/FileSystem",
>  "loadFileSystems", "()V");
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL, 
> "loadFileSystems");
> }
> }
> {code}
> and here's the relevant parts of the stack trace from where I call this API 
> in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x74a8d657 in _L_lock_909 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x74a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 
> ) at ../nptl/pthread_mutex_lock.c:79
> #3  0x02f06056 in mutexLock (m=) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x02efe817 in setTLSExceptionStrings (rootCause=0x0, 
> stackTrace=0x0) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x02f065d7 in printExceptionAndFreeV (env=0x513c1e8, 
> exc=0x508a8c0, noPrintFlags=, fmt=0x34349cf "loadFileSystems", 
> ap=0x7fffb660)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x02f0683d in printExceptionAndFree (env=, 
> exc=, noPrintFlags=, fmt=)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x02eff60f in getGlobalJNIEnv () at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}






[jira] [Commented] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown

2017-09-11 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162251#comment-16162251
 ] 

Ruslan Dautkhanov commented on HDFS-11851:
--

Would it be possible to backport this patch to HDFS 2.6? Thanks.

> getGlobalJNIEnv() may deadlock if exception is thrown
> -
>
> Key: HDFS-11851
> URL: https://issues.apache.org/jira/browse/HDFS-11851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 3.0.0-alpha4
>Reporter: Henry Robinson
>Assignee: Sailesh Mukil
>Priority: Blocker
> Fix For: 3.0.0-alpha4
>
> Attachments: HDFS-11851.000.patch, HDFS-11851.001.patch, 
> HDFS-11851.002.patch, HDFS-11851.003.patch, HDFS-11851.004.patch, 
> HDFS-11851.005.patch
>
>
> HDFS-11529 introduced a deadlock into {{getGlobalJNIEnv()}} if an exception 
> is thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
> {{printExceptionAndFree()}} will eventually try to acquire that lock in 
> {{setTLSExceptionStrings()}}.
> The exception might get caught from {{loadFileSystems}}:
> {code}
> jthr = invokeMethod(env, NULL, STATIC, NULL,
>  "org/apache/hadoop/fs/FileSystem",
>  "loadFileSystems", "()V");
> if (jthr) {
> printExceptionAndFree(env, jthr, PRINT_EXC_ALL, 
> "loadFileSystems");
> }
> }
> {code}
> and here's the relevant parts of the stack trace from where I call this API 
> in Impala, which uses {{libhdfs}}:
> {code}
> #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1  0x74a8d657 in _L_lock_909 () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x74a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 
> ) at ../nptl/pthread_mutex_lock.c:79
> #3  0x02f06056 in mutexLock (m=) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
> #4  0x02efe817 in setTLSExceptionStrings (rootCause=0x0, 
> stackTrace=0x0) at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
> #5  0x02f065d7 in printExceptionAndFreeV (env=0x513c1e8, 
> exc=0x508a8c0, noPrintFlags=, fmt=0x34349cf "loadFileSystems", 
> ap=0x7fffb660)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
> #6  0x02f0683d in printExceptionAndFree (env=, 
> exc=, noPrintFlags=, fmt=)
> at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
> #7  0x02eff60f in getGlobalJNIEnv () at 
> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
> {code}






[jira] [Commented] (HDFS-12339) NFS Gateway on Shutdown Gives Unregistration Failure. Does Not Unregister with rpcbind Portmapper

2017-08-30 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147443#comment-16147443
 ] 

Ruslan Dautkhanov commented on HDFS-12339:
--

We think this might be the root cause of an NFS client hanging issue we 
sometimes see when the HDFS NFS gateway stops.
Timeouts don't work because an NFS client can see that the RPC service for NFS 
is up, but there is no actual live service behind that RPC service; 
kind of a zombie RPC service, if that makes sense. Although we are not entirely 
certain that's the root cause, 
it would be great to have this fixed anyway.

> NFS Gateway on Shutdown Gives Unregistration Failure. Does Not Unregister 
> with rpcbind Portmapper
> -
>
> Key: HDFS-12339
> URL: https://issues.apache.org/jira/browse/HDFS-12339
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Sailesh Patel
>Assignee: Mukul Kumar Singh
>
> When stopping NFS Gateway the following error is thrown in the NFS gateway 
> role logs.
> 2017-08-17 18:09:16,529 ERROR org.apache.hadoop.oncrpc.RpcProgram: 
> Unregistration failure with localhost:2049, portmap entry: 
> (PortmapMapping-13:3:6:2049)
> 2017-08-17 18:09:16,531 WARN org.apache.hadoop.util.ShutdownHookManager: 
> ShutdownHook 'NfsShutdownHook' failed, java.lang.RuntimeException: 
> Unregistration failure
> java.lang.RuntimeException: Unregistration failure
> ..
> Caused by: java.net.SocketException: Socket is closed
> at java.net.DatagramSocket.send(DatagramSocket.java:641)
> at org.apache.hadoop.oncrpc.SimpleUdpClient.run(SimpleUdpClient.java:62)
> Checking rpcinfo -p : the following entry is still there:
> " 13 3 tcp 2049 nfs"






[jira] [Commented] (HDFS-12113) `hadoop fs -setrep` requires huge amount of memory on client side

2017-07-11 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081750#comment-16081750
 ] 

Ruslan Dautkhanov commented on HDFS-12113:
--

[~brahmareddy], it looks very similar. I left a comment in HADOOP-12502. Thanks for 
pointing to that JIRA.

> `hadoop fs -setrep` requires huge amount of memory on client side
> -
>
> Key: HDFS-12113
> URL: https://issues.apache.org/jira/browse/HDFS-12113
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.6.5
> Environment: Java 7
>Reporter: Ruslan Dautkhanov
>
> {code}
> $ hadoop fs -setrep -w 3 /
> {code}
> was failing with 
> {noformat}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2367)
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
> at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
> at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
> at java.lang.StringBuilder.append(StringBuilder.java:132)
> at 
> org.apache.hadoop.fs.shell.PathData.getStringForChildPath(PathData.java:305)
> at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:272)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
> at 
> org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)
> at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
> at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
> at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> {noformat}
> It kept failing until the hadoop fs CLI command's Java heap was allowed to grow to 5 Gb:
> {code}
> HADOOP_HEAPSIZE=5000 hadoop fs -setrep -w 3 /
> {code}
> Notice that this setrep change was done for the whole HDFS filesystem.
> So it looks like the amount of memory used by the `hadoop fs -setrep` command 
> depends on how many files HDFS has in total? This is not a huge HDFS 
> filesystem; I would say it is even "small" by current standards.






[jira] [Created] (HDFS-12113) `hadoop fs -setrep` requires huge amount of memory on client side

2017-07-10 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created HDFS-12113:


 Summary: `hadoop fs -setrep` requires huge amount of memory on 
client side
 Key: HDFS-12113
 URL: https://issues.apache.org/jira/browse/HDFS-12113
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.5, 2.6.0
 Environment: Java 7
Reporter: Ruslan Dautkhanov


{code}
$ hadoop fs -setrep -w 3 /
{code}

was failing with 
{noformat}
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at org.apache.hadoop.fs.shell.PathData.getStringForChildPath(PathData.java:305)
at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:272)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at 
org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
{noformat}

It kept failing until the hadoop fs CLI command's Java heap was allowed to grow to 5 Gb:
{code}
HADOOP_HEAPSIZE=5000 hadoop fs -setrep -w 3 /
{code}

Notice that this setrep change was done for the whole HDFS filesystem.

So it looks like the amount of memory used by the `hadoop fs -setrep` command 
depends on how many files HDFS has in total? This is not a huge HDFS 
filesystem; I would say it is even "small" by current standards.
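
This is not how FsShell implements -setrep internally, but as a sketch of a lower-memory client-side alternative one could try: walk the namespace with the streaming listFiles() iterator and set replication per file, instead of materializing whole directory listings.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Streaming alternative to `hadoop fs -setrep -w 3 /` (without the -w wait).
public class SetRepStreaming {
    public static void main(String[] args) throws Exception {
        short targetReplication = 3;
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("/"), true);
            while (it.hasNext()) {
                LocatedFileStatus f = it.next();
                if (f.getReplication() != targetReplication) {
                    fs.setReplication(f.getPath(), targetReplication);
                }
            }
        }
    }
}
{code}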






[jira] [Commented] (HDFS-6949) Add NFS-ACL protocol support

2017-05-09 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003342#comment-16003342
 ] 

Ruslan Dautkhanov commented on HDFS-6949:
-

NFSv4 defines ACLs explicitly in RFC 3530: 
https://tools.ietf.org/html/rfc3530#section-5.11


> Add NFS-ACL protocol support
> 
>
> Key: HDFS-6949
> URL: https://issues.apache.org/jira/browse/HDFS-6949
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: nfs
>Reporter: Brandon Li
>
> This is the umbrella JIRA to track the effort of adding NFS ACL support.
> ACL support for NFSv3 is known as NFSACL. It is a separate out of band 
> protocol (for NFSv3) to support ACL operations (GETACL and SETACL). There is 
> no formal documentation or RFC on this protocol.
> NFSACL program number is 100227 and version is 3. 
> The program listens on tcp port 38467.
> More reference:
> http://lwn.net/Articles/120338/
> http://cateee.net/lkddb/web-lkddb/NFS_V3_ACL.html






[jira] [Commented] (HDFS-8131) Implement a space balanced block placement policy

2017-04-24 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981732#comment-15981732
 ] 

Ruslan Dautkhanov commented on HDFS-8131:
-

Thanks for this great improvement! 
When using AvailableSpaceBlockPlacementPolicy, does the default logic below no 
longer apply?
{quote}
1. Place the first replica somewhere – either a random rack and node (if the 
HDFS client is outside the hadoop cluster) or on the local node (if the HDFS 
client is running on a node inside the cluster).
2. The second replica is written to a different rack from the first, chosen at 
random.
3. The third replica is written to the same rack as the second replica, but on 
a different node.
4. If there are more replicas – spread them across the rest of the racks.
{quote}
What is this logic now when it comes to rack awareness and such? 
Is placement purely by available space, so that the rack awareness logic doesn't kick in?


> Implement a space balanced block placement policy
> -
>
> Key: HDFS-8131
> URL: https://issues.apache.org/jira/browse/HDFS-8131
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
>  Labels: BlockPlacementPolicy
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: balanced.png, HDFS-8131.004.patch, HDFS-8131.005.patch, 
> HDFS-8131.006.patch, HDFS-8131-v1.diff, HDFS-8131-v2.diff, HDFS-8131-v3.diff
>
>
> The default block placement policy will choose datanodes for new blocks 
> randomly, which will result in unbalanced space used percent among datanodes 
> after an cluster expansion. The old datanodes always are in high used percent 
> of space and new added ones are in low percent.
> Through we can used the external balance tool to balance the space used rate, 
> it will cost extra network IO and it's not easy to control the balance speed.
> An easy solution is to implement an balanced block placement policy which 
> will choose low used percent datanodes for new blocks with a little high 
> possibility. In a not long term, the used percent of datanodes will trend to 
> be balanced.
> Suggestions and discussions are welcomed. Thanks






[jira] [Commented] (HDFS-8960) DFS client says "no more good datanodes being available to try" on a single drive failure

2016-01-27 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119793#comment-15119793
 ] 

Ruslan Dautkhanov commented on HDFS-8960:
-

We seem to be getting the same problem on a Hive job too:

{quote}
Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Failed 
to replace a bad datanode on the existing pipeline due to no more good 
datanodes being available to try. (Nodes: 
current=[DatanodeInfoWithStorage[10.20.32.60:1004,DS-1cc9c7cd-f1f9-4cad-b6e2-c9821d644033,DISK]],
 
original=[DatanodeInfoWithStorage[10.20.32.60:1004,DS-1cc9c7cd-f1f9-4cad-b6e2-c9821d644033,DISK]]).
 The current failed datanode replacement policy is DEFAULT, and a client may 
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' 
in its configuration. at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:265) at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444) at 
org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Failed 
to replace a bad datanode on the existing pipeline due to no more good 
datanodes being available to try. (Nodes: 
current=[DatanodeInfoWithStorage[10.20.32.60:1004,DS-1cc9c7cd-f1f9-4cad-b6e2-c9821d644033,DISK]],
 
original=[DatanodeInfoWithStorage[10.20.32.60:1004,DS-1cc9c7cd-f1f9-4cad-b6e2-c9821d644033,DISK]]).
 The current failed datanode replacement policy is DEFAULT, and a client may 
configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' 
in its configuration. at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:729)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1047)
 at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.flushHashTable(GroupByOperator.java:1015)
 at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:833)
 at 
{quote}
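
The error text itself points at a client-side knob, dfs.client.block.write.replace-datanode-on-failure.policy. A minimal sketch of setting it programmatically follows; whether relaxing the policy (e.g. to NEVER) is appropriate depends on the cluster and should be checked against hdfs-default.xml.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Sketch only: adjust the datanode-replacement policy named in the error above.
public class PipelinePolicySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("client configured against " + fs.getUri()
                    + " without requiring datanode replacement on failure");
        }
    }
}
{code}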

> DFS client says "no more good datanodes being available to try" on a single 
> drive failure
> -
>
> Key: HDFS-8960
> URL: https://issues.apache.org/jira/browse/HDFS-8960
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.7.1
> Environment: openjdk version "1.8.0_45-internal"
> OpenJDK Runtime Environment (build 1.8.0_45-internal-b14)
> OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode)
>Reporter: Benoit Sigoure
> Attachments: blk_1073817519_77099.log, r12s13-datanode.log, 
> r12s16-datanode.log
>
>
> Since we upgraded to 2.7.1 we regularly see single-drive failures cause 
> widespread problems at the HBase level (with the default 3x replication 
> target).
> Here's an example.  This HBase RegionServer is r12s16 (172.24.32.16) and is 
> writing its WAL to [172.24.32.16:10110, 172.24.32.8:10110, 
> 172.24.32.13:10110] as can be seen by the following occasional messages:
> {code}
> 2015-08-23 06:28:40,272 INFO  [sync.3] wal.FSHLog: Slow sync cost: 123 ms, 
> current pipeline: [172.24.32.16:10110, 172.24.32.8:10110, 172.24.32.13:10110]
> {code}
> A bit later, the second node in the pipeline above is going to experience an 
> HDD failure.
> {code}
> 2015-08-23 07:21:58,720 WARN  [DataStreamer for file 
> /hbase/WALs/r12s16.sjc.aristanetworks.com,9104,1439917659071/r12s16.sjc.aristanetworks.com%2C9104%2C1439917659071.default.1440314434998
>  block BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099] 
> hdfs.DFSClient: Error Recovery for block 
> BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099 in pipeline 
> 172.24.32.16:10110, 172.24.32.13:10110, 172.24.32.8:10110: bad datanode 
> 172.24.32.8:10110
> {code}
> And then HBase will go like "omg I can't write to my WAL, let me commit 
> suicide".
> {code}
> 2015-08-23 07:22:26,060 FATAL 
> [regionserver/r12s16.sjc.aristanetworks.com/172.24.32.16:9104.append-pool1-t1]
>  wal.FSHLog: Could not append. Requesting close of wal
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[172.24.32.16:10110, 172.24.32.13:10110], 
> original=[172.24.32.16:10110, 172.24.32.13:10110]). The current failed 
> datanode replacement policy is DEFAULT, and a client may configure this via 

[jira] [Commented] (HDFS-6255) fuse_dfs will not adhere to ACL permissions in some cases

2016-01-19 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107321#comment-15107321
 ] 

Ruslan Dautkhanov commented on HDFS-6255:
-

It makes sense. Thanks a lot Chris!

> fuse_dfs will not adhere to ACL permissions in some cases
> -
>
> Key: HDFS-6255
> URL: https://issues.apache.org/jira/browse/HDFS-6255
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fuse-dfs
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Stephen Chu
>Assignee: Chris Nauroth
>
> As hdfs user, I created a directory /tmp/acl_dir/ and set permissions to 700. 
> Then I set a new acl group:jenkins:rwx on /tmp/acl_dir.
> {code}
> jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -getfacl /tmp/acl_dir
> # file: /tmp/acl_dir
> # owner: hdfs
> # group: supergroup
> user::rwx
> group::---
> group:jenkins:rwx
> mask::rwx
> other::---
> {code}
> Through the FsShell, the jenkins user can list /tmp/acl_dir as well as create 
> a file and directory inside.
> {code}
> [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -touchz /tmp/acl_dir/testfile1
> [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -mkdir /tmp/acl_dir/testdir1
> hdfs dfs -ls /tmp/acl[jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -ls /tmp/acl_dir/
> Found 2 items
> drwxr-xr-x   - jenkins supergroup  0 2014-04-17 19:11 
> /tmp/acl_dir/testdir1
> -rw-r--r--   1 jenkins supergroup  0 2014-04-17 19:11 
> /tmp/acl_dir/testfile1
> [jenkins@hdfs-vanilla-1 ~]$ 
> {code}
> However, as the same jenkins user, when I try to cd into /tmp/acl_dir using a 
> fuse_dfs mount, I get permission denied. Same permission denied when I try to 
> create or list files.
> {code}
> [jenkins@hdfs-vanilla-1 tmp]$ ls -l
> total 16
> drwxrwx--- 4 hdfsnobody 4096 Apr 17 19:11 acl_dir
> drwx-- 2 hdfsnobody 4096 Apr 17 18:30 acl_dir_2
> drwxr-xr-x 3 mapred  nobody 4096 Mar 11 03:53 mapred
> drwxr-xr-x 4 jenkins nobody 4096 Apr 17 07:25 testcli
> -rwx-- 1 hdfsnobody0 Apr  7 17:18 tf1
> [jenkins@hdfs-vanilla-1 tmp]$ cd acl_dir
> bash: cd: acl_dir: Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ touch acl_dir/testfile2
> touch: cannot touch `acl_dir/testfile2': Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ mkdir acl_dir/testdir2
> mkdir: cannot create directory `acl_dir/testdir2': Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ 
> {code}
> The fuse_dfs debug output doesn't show any error for the above operations:
> {code}
> unique: 18, opcode: OPENDIR (27), nodeid: 2, insize: 48
>unique: 18, success, outsize: 32
> unique: 19, opcode: READDIR (28), nodeid: 2, insize: 80
> readdir[0] from 0
>unique: 19, success, outsize: 312
> unique: 20, opcode: GETATTR (3), nodeid: 2, insize: 56
> getattr /tmp
>unique: 20, success, outsize: 120
> unique: 21, opcode: READDIR (28), nodeid: 2, insize: 80
>unique: 21, success, outsize: 16
> unique: 22, opcode: RELEASEDIR (29), nodeid: 2, insize: 64
>unique: 22, success, outsize: 16
> unique: 23, opcode: GETATTR (3), nodeid: 2, insize: 56
> getattr /tmp
>unique: 23, success, outsize: 120
> unique: 24, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 24, success, outsize: 120
> unique: 25, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 25, success, outsize: 120
> unique: 26, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 26, success, outsize: 120
> unique: 27, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 27, success, outsize: 120
> unique: 28, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 28, success, outsize: 120
> {code}
> In other scenarios, ACL permissions are enforced successfully. For example, 
> as hdfs user I create /tmp/acl_dir_2 and set permissions to 777. I then set 
> the acl user:jenkins:--- on the directory. On the fuse mount, I am not able 
> to ls, mkdir, or touch to that directory as jenkins user.





[jira] [Commented] (HDFS-6255) fuse_dfs will not adhere to ACL permissions in some cases

2016-01-18 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105992#comment-15105992
 ] 

Ruslan Dautkhanov commented on HDFS-6255:
-

Chris, 

Thank you for the prompt response.

Yep, we have a hadoop-fuse-dfs mount:

$ grep hadoop /etc/fstab
hadoop-fuse-dfs#dfs://epsdatalake /hdfs_mount fuse usetrash,rw 0 0

It is not picking up ACLs at all.

Test 1 - doesn't work through the fuse mount:

$ ls -l /hdfs_mount/agility
ls: cannot open directory /hdfs_mount/agility: Permission denied

Test 2 - works through hadoop fs commands:

$ hadoop fs -ls /agility/
Found 6 items
. . .  /skip 6 lines/

$ hadoop fs -ls / | grep agility
dr-xr-x---+  - user1 group1   0 2016-01-14 13:25 /agility

Hadoop/HDFS 2.6 (CDH 5.5.1), but it has always been a problem for us on all older 
versions we have used.

> fuse_dfs will not adhere to ACL permissions in some cases
> -
>
> Key: HDFS-6255
> URL: https://issues.apache.org/jira/browse/HDFS-6255
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fuse-dfs
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Stephen Chu
>Assignee: Chris Nauroth
>
> As hdfs user, I created a directory /tmp/acl_dir/ and set permissions to 700. 
> Then I set a new acl group:jenkins:rwx on /tmp/acl_dir.
> {code}
> jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -getfacl /tmp/acl_dir
> # file: /tmp/acl_dir
> # owner: hdfs
> # group: supergroup
> user::rwx
> group::---
> group:jenkins:rwx
> mask::rwx
> other::---
> {code}
> Through the FsShell, the jenkins user can list /tmp/acl_dir as well as create 
> a file and directory inside.
> {code}
> [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -touchz /tmp/acl_dir/testfile1
> [jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -mkdir /tmp/acl_dir/testdir1
> hdfs dfs -ls /tmp/acl[jenkins@hdfs-vanilla-1 ~]$ hdfs dfs -ls /tmp/acl_dir/
> Found 2 items
> drwxr-xr-x   - jenkins supergroup  0 2014-04-17 19:11 
> /tmp/acl_dir/testdir1
> -rw-r--r--   1 jenkins supergroup  0 2014-04-17 19:11 
> /tmp/acl_dir/testfile1
> [jenkins@hdfs-vanilla-1 ~]$ 
> {code}
> However, as the same jenkins user, when I try to cd into /tmp/acl_dir using a 
> fuse_dfs mount, I get permission denied. Same permission denied when I try to 
> create or list files.
> {code}
> [jenkins@hdfs-vanilla-1 tmp]$ ls -l
> total 16
> drwxrwx--- 4 hdfs    nobody 4096 Apr 17 19:11 acl_dir
> drwx------ 2 hdfs    nobody 4096 Apr 17 18:30 acl_dir_2
> drwxr-xr-x 3 mapred  nobody 4096 Mar 11 03:53 mapred
> drwxr-xr-x 4 jenkins nobody 4096 Apr 17 07:25 testcli
> -rwx------ 1 hdfs    nobody    0 Apr  7 17:18 tf1
> [jenkins@hdfs-vanilla-1 tmp]$ cd acl_dir
> bash: cd: acl_dir: Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ touch acl_dir/testfile2
> touch: cannot touch `acl_dir/testfile2': Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ mkdir acl_dir/testdir2
> mkdir: cannot create directory `acl_dir/testdir2': Permission denied
> [jenkins@hdfs-vanilla-1 tmp]$ 
> {code}
> The fuse_dfs debug output doesn't show any error for the above operations:
> {code}
> unique: 18, opcode: OPENDIR (27), nodeid: 2, insize: 48
>unique: 18, success, outsize: 32
> unique: 19, opcode: READDIR (28), nodeid: 2, insize: 80
> readdir[0] from 0
>unique: 19, success, outsize: 312
> unique: 20, opcode: GETATTR (3), nodeid: 2, insize: 56
> getattr /tmp
>unique: 20, success, outsize: 120
> unique: 21, opcode: READDIR (28), nodeid: 2, insize: 80
>unique: 21, success, outsize: 16
> unique: 22, opcode: RELEASEDIR (29), nodeid: 2, insize: 64
>unique: 22, success, outsize: 16
> unique: 23, opcode: GETATTR (3), nodeid: 2, insize: 56
> getattr /tmp
>unique: 23, success, outsize: 120
> unique: 24, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 24, success, outsize: 120
> unique: 25, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 25, success, outsize: 120
> unique: 26, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 26, success, outsize: 120
> unique: 27, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 27, success, outsize: 120
> unique: 28, opcode: GETATTR (3), nodeid: 3, insize: 56
> getattr /tmp/acl_dir
>unique: 28, success, outsize: 120
> {code}
> In other scenarios, ACL permissions are enforced successfully. For example, 
> as hdfs user I create /tmp/acl_dir_2 and set permissions to 777. I then set 
> the acl user:jenkins:--- on the directory. On the fuse mount, I am not able 
> to ls, mkdir, or touch to that directory as jenkins user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6255) fuse_dfs will not adhere to ACL permissions in some cases

2016-01-18 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15106052#comment-15106052
 ] 

Ruslan Dautkhanov commented on HDFS-6255:
-

I just used Navigator to see if there are any HDFS denials... nope, it does not 
show anything. So you're right, it looks like the request is rejected directly 
at the FUSE layer. Do you know of any possible workaround to make fuse respect 
HDFS ACLs? Thank you for the quick turnaround.
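
In the meantime, the only stop-gap we can think of is to fall back to the plain 
group bits, which the mount does honor. A rough sketch of what we have in mind 
(the group name "agility_readers" is made up for illustration and would have to 
exist both on the NameNode side and on the client host running the fuse mount):

{code}
# Workaround sketch, not a fix: express the grant through the owning group
# instead of an extended ACL entry, since fuse only sees the basic mode bits.
hdfs dfs -chgrp -R agility_readers /agility
hdfs dfs -chmod -R 750 /agility
# Users who previously had a user:/group: ACL entry would instead be added to
# "agility_readers" in LDAP/OS groups on both the NameNode and the client host.
{code}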




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6255) fuse_dfs will not adhere to ACL permissions in some cases

2016-01-18 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105940#comment-15105940
 ] 

Ruslan Dautkhanov commented on HDFS-6255:
-

Hi Chris Nauroth,

> In its default configuration, fuse mounts are only accessible by one user: 
> the user who performed the mount

We have a kerberized cluster, and the hdfs fuse mounts act as whichever user 
accesses the mount; Kerberos authentication works properly.
However, we still have the problem that the hdfs fuse mounts don't honor ACLs; 
only the basic access permissions (the normal UNIX owner/group/other bits) 
count.

We still think there is a problem here, and it would be great if somebody could 
have a look at this bug.

Thank you.
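
For reference, this is roughly how we verified that the Kerberos identity flows 
through the mount (the principal and paths below are illustrative):

{code}
# kinit as a regular user, create a file through the fuse mount, then ask HDFS
# who owns it; the owner comes back as the kinit'ed user, not the mounting user.
kinit user1@EXAMPLE.COM
touch /hdfs_mount/tmp/fuse_identity_test
hdfs dfs -stat "%u %g %n" /tmp/fuse_identity_test
{code}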




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)