[jira] [Commented] (HADOOP-18019) S3AFileSystem.s3GetFileStatus() doesn't find dir markers on minio

2021-11-24 Thread Ruslan Dautkhanov (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448863#comment-17448863
 ] 

Ruslan Dautkhanov commented on HADOOP-18019:


[~y4m4] does this mean that the other deployment method worked in Hadoop 3.2? I 
don't understand where this inconsistency is coming from. 

> S3AFileSystem.s3GetFileStatus() doesn't find dir markers on minio
> -
>
> Key: HADOOP-18019
> URL: https://issues.apache.org/jira/browse/HADOOP-18019
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
> Environment: minio s3-compatible storage
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> Repro code:
> {code:java}
> val conf = new Configuration()  
> conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000;) 
> conf.set("fs.s3a.path.style.access", "true") 
> conf.set("fs.s3a.access.key", "user_access_key") 
> conf.set("fs.s3a.secret.key", "password")  
> val path = new Path("s3a://comcast-test")  
> val fs = path.getFileSystem(conf)  
> fs.mkdirs(new Path("/testdelta/_delta_log"))  
> fs.getFileStatus(new Path("/testdelta/_delta_log")){code}
> Fails with a *FileNotFoundException* on Minio. The same code works against 
> real S3.
> It also works with Minio on Hadoop 3.2 and earlier versions.
> It only fails on the 3.3 and newer Hadoop branches.
> The reason, as discovered by [~sadikovi], is actually a more fundamental one: 
> Minio does not have empty directories (sort of), see 
> [https://github.com/minio/minio/issues/2423].
> This works in Hadoop 3.2 because of the infamous "Is this necessary?" block 
> of code
> [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223]
> which was removed in Hadoop 3.3 -
> [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179]
> and that removal causes the regression.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18019) Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus()

2021-11-19 Thread Ruslan Dautkhanov (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446664#comment-17446664
 ] 

Ruslan Dautkhanov commented on HADOOP-18019:


[~ste...@apache.org] FYI, if I am not mistaken, you had a commit on 
S3AFileSystem.java last year that removed that code block. 

> Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus()
> --
>
> Key: HADOOP-18019
> URL: https://issues.apache.org/jira/browse/HADOOP-18019
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> Repro code:
> {code:java}
> val conf = new Configuration()  
> conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000;) 
> conf.set("fs.s3a.path.style.access", "true") 
> conf.set("fs.s3a.access.key", "user_access_key") 
> conf.set("fs.s3a.secret.key", "password")  
> val path = new Path("s3a://comcast-test")  
> val fs = path.getFileSystem(conf)  
> fs.mkdirs(new Path("/testdelta/_delta_log"))  
> fs.getFileStatus(new Path("/testdelta/_delta_log")){code}
> Fails with a *FileNotFoundException* on Minio. The same code works against 
> real S3.
> It also works with Minio on Hadoop 3.2 and earlier versions.
> It only fails on the 3.3 and newer Hadoop branches.
> The reason, as discovered by [~sadikovi], is actually a more fundamental one: 
> Minio does not have empty directories (sort of), see 
> [https://github.com/minio/minio/issues/2423].
> This works in Hadoop 3.2 because of the infamous "Is this necessary?" block 
> of code
> [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223]
> which was removed in Hadoop 3.3 -
> [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179]
> and that removal causes the regression.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18019) Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus()

2021-11-19 Thread Ruslan Dautkhanov (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruslan Dautkhanov updated HADOOP-18019:
---
Description: 
Repro code:
{code:java}
val conf = new Configuration()  
conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000;) 
conf.set("fs.s3a.path.style.access", "true") 
conf.set("fs.s3a.access.key", "user_access_key") 
conf.set("fs.s3a.secret.key", "password")  

val path = new Path("s3a://comcast-test")  
val fs = path.getFileSystem(conf)  
fs.mkdirs(new Path("/testdelta/_delta_log"))  
fs.getFileStatus(new Path("/testdelta/_delta_log")){code}
Fails with a *FileNotFoundException* on Minio. The same code works against real 
S3.
It also works with Minio on Hadoop 3.2 and earlier versions.

It only fails on the 3.3 and newer Hadoop branches.

The reason, as discovered by [~sadikovi], is actually a more fundamental one: 
Minio does not have empty directories (sort of), see 
[https://github.com/minio/minio/issues/2423].

This works in Hadoop 3.2 because of the infamous "Is this necessary?" block of 
code
[https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223]

which was removed in Hadoop 3.3 -
[https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179]

and that removal causes the regression.
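For reference, the removed 3.2 block essentially fell back to a LIST under the key 
prefix when neither the object nor a directory marker was found. A rough sketch of 
that idea against the raw AWS SDK follows; the bucket and prefix are just the values 
from the repro above, and this is only an illustration of the probe, not the actual 
S3A code:
{code:java}
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.ListObjectsV2Request

// If HEAD finds neither the object nor a "dir/" marker, any key returned
// under the prefix still implies that the directory exists. Minio keeps no
// empty-directory markers, so a listing-based probe like this is roughly
// what the removed Hadoop 3.2 block relied on.
val s3 = AmazonS3ClientBuilder.defaultClient()
val req = new ListObjectsV2Request()
  .withBucketName("comcast-test")        // bucket from the repro above
  .withPrefix("testdelta/_delta_log/")   // key prefix of the "directory"
  .withMaxKeys(1)
val looksLikeDir = s3.listObjectsV2(req).getKeyCount > 0
println(s"directory inferred from listing: $looksLikeDir")
{code}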

  was:
Repro code:

{code:java}
val conf = new Configuration()  
conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000;) 
conf.set("fs.s3a.path.style.access", "true") 
conf.set("fs.s3a.access.key", "user_access_key") 
conf.set("fs.s3a.secret.key", "password")  

val path = new Path("s3a://comcast-test")  
val fs = path.getFileSystem(conf)  
fs.mkdirs(new Path("/testdelta/_delta_log"))  
fs.getFileStatus(new Path("/testdelta/_delta_log")){code}
Fails with a *FileNotFoundException* on Minio. The same code works against real 
S3.
It also works with Minio on Hadoop 3.2 and earlier versions.

It only fails on the 3.3 and newer Hadoop branches.

The reason, as discovered by [~sadikovi], is actually a more fundamental one: 
Minio does not have empty directories (sort of), see 
[https://github.com/minio/minio/issues/2423].

This works in Hadoop 3.2 because of the infamous "Is this necessary?" block of 
code
[https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223]

which was removed in Hadoop 3.3 -
[https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179]

and that removal causes the regression.


> Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus()
> --
>
> Key: HADOOP-18019
> URL: https://issues.apache.org/jira/browse/HADOOP-18019
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.3.0, 3.3.1, 3.3.2
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> Repro code:
> {code:java}
> val conf = new Configuration()  
> conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000;) 
> conf.set("fs.s3a.path.style.access", "true") 
> conf.set("fs.s3a.access.key", "user_access_key") 
> conf.set("fs.s3a.secret.key", "password")  
> val path = new Path("s3a://comcast-test")  
> val fs = path.getFileSystem(conf)  
> fs.mkdirs(new Path("/testdelta/_delta_log"))  
> fs.getFileStatus(new Path("/testdelta/_delta_log")){code}
> Fails with a *FileNotFoundException* on Minio. The same code works against 
> real S3.
> It also works with Minio on Hadoop 3.2 and earlier versions.
> It only fails on the 3.3 and newer Hadoop branches.
> The reason, as discovered by [~sadikovi], is actually a more fundamental one: 
> Minio does not have empty directories (sort of), see 
> [https://github.com/minio/minio/issues/2423].
> This works in Hadoop 3.2 because of the infamous "Is this necessary?" block 
> of code
> [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223]
> which was removed in Hadoop 3.3 -
> [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179]
> and that removal causes the regression.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18019) Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus()

2021-11-19 Thread Ruslan Dautkhanov (Jira)
Ruslan Dautkhanov created HADOOP-18019:
--

 Summary: Hadoop 3.3 regression in 
hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus()
 Key: HADOOP-18019
 URL: https://issues.apache.org/jira/browse/HADOOP-18019
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.3.1, 3.3.0, 3.3.2
Reporter: Ruslan Dautkhanov


Repro code:

{code:java}
val conf = new Configuration()  
conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000;) 
conf.set("fs.s3a.path.style.access", "true") 
conf.set("fs.s3a.access.key", "user_access_key") 
conf.set("fs.s3a.secret.key", "password")  

val path = new Path("s3a://comcast-test")  
val fs = path.getFileSystem(conf)  
fs.mkdirs(new Path("/testdelta/_delta_log"))  
fs.getFileStatus(new Path("/testdelta/_delta_log")){code}
Fails with a *FileNotFoundException* on Minio. The same code works against real 
S3.
It also works with Minio on Hadoop 3.2 and earlier versions.

It only fails on the 3.3 and newer Hadoop branches.

The reason, as discovered by [~sadikovi], is actually a more fundamental one: 
Minio does not have empty directories (sort of), see 
[https://github.com/minio/minio/issues/2423].

This works in Hadoop 3.2 because of the infamous "Is this necessary?" block of 
code
[https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223]

which was removed in Hadoop 3.3 -
[https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179]

and that removal causes the regression.
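Until this is addressed in S3A, one possible application-level workaround (a sketch, 
not a fix: it assumes an extra placeholder object is acceptable, and the .keep file 
name is arbitrary) is to back the directory with a real zero-byte object so 
getFileStatus has something to find:
{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
// endpoint, path-style access and credentials set as in the repro above

val root = new Path("s3a://comcast-test")
val fs = root.getFileSystem(conf)
val dir = new Path("/testdelta/_delta_log")

fs.mkdirs(dir)
// Arbitrary placeholder so the prefix is backed by a real key on Minio,
// which keeps no empty-directory markers.
fs.create(new Path(dir, ".keep")).close()

println(fs.getFileStatus(dir)) // should now resolve instead of throwing FileNotFoundException
{code}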



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17231) empty getDefaultExtension() is ignored

2020-08-26 Thread Ruslan Dautkhanov (Jira)
Ruslan Dautkhanov created HADOOP-17231:
--

 Summary: empty getDefaultExtension() is ignored
 Key: HADOOP-17231
 URL: https://issues.apache.org/jira/browse/HADOOP-17231
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.1.3, 3.2.0
Reporter: Ruslan Dautkhanov


Use case: the source files are gz-compressed but have no file extension.

Attempt to auto-decompress them through 
{code:java}
package com.my.codec.test

import org.apache.hadoop.io.compress.GzipCodec

class GZCodec extends GzipCodec {
  override def getDefaultExtension(): String = ""
}
{code}
 (notice the empty getDefaultExtension) and then setting *io.compression.codecs* 
to com.my.codec.test.GZCodec has no effect. 

Similar tests with a one-character extension make it work, so only the 
empty-string getDefaultExtension case is broken. 

I guess the issue is somewhere in 
[https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CompressionCodecFactory.java#L109]
 

but it's not obvious. 

Folks have built some workarounds using custom readers, for example:
 # [https://daynebatten.com/2015/11/override-hadoop-compression-codec-file-extension/]
 # [https://stackoverflow.com/questions/52011697/how-to-read-a-compressed-gzip-file-without-extension-in-spark?rq=1]
 

Hopefully it would be an easy fix to support an empty getDefaultExtension? 
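In the meantime, a minimal sketch of a workaround, assuming the files are known to 
be gzip regardless of their name: instantiate the codec explicitly and wrap the 
input stream, so CompressionCodecFactory's extension lookup is never consulted (the 
path below is illustrative):
{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.hadoop.util.ReflectionUtils
import scala.io.Source

val conf = new Configuration()
val path = new Path("/data/gz-file-without-extension")  // illustrative path
val fs = path.getFileSystem(conf)

// Build the codec directly instead of asking CompressionCodecFactory,
// so the (missing) file extension never matters.
val codec = ReflectionUtils.newInstance(classOf[GzipCodec], conf)
val in = codec.createInputStream(fs.open(path))
try {
  Source.fromInputStream(in).getLines().take(5).foreach(println)
} finally {
  in.close()
}
{code}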

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16120) Lazily allocate KMS delegation tokens

2019-02-20 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773161#comment-16773161
 ] 

Ruslan Dautkhanov commented on HADOOP-16120:


Thanks for explaining, guys, that it's not possible with the current Hadoop DT 
architecture. 



> Lazily allocate KMS delegation tokens
> -
>
> Key: HADOOP-16120
> URL: https://issues.apache.org/jira/browse/HADOOP-16120
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms, security
>Affects Versions: 2.8.5, 3.1.2
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> We noticed that HDFS clients talk to KMS even when they access unencrypted 
> databases. Is there a way to make HDFS clients talk to KMS servers *only* 
> when they need access to encrypted data? Since we will be encrypting only 
> one database (and 50+ other, much more critical production databases will 
> not be encrypted), if KMS is down for maintenance or some other reason, we 
> want to limit the outage to encrypted data only.
> In other words, it would be great if KMS delegation tokens were allocated 
> lazily - on the first request to encrypted data.
> This could be a non-default option to lazily allocate KMS delegation tokens, 
> to improve availability of non-encrypted data.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16120) Lazily allocate KMS delegation tokens

2019-02-19 Thread Ruslan Dautkhanov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruslan Dautkhanov updated HADOOP-16120:
---
Description: 
We noticed that HDFS clients talk to KMS even when they access unencrypted 
databases. Is there a way to make HDFS clients talk to KMS servers *only* when 
they need access to encrypted data? Since we will be encrypting only one 
database (and 50+ other, much more critical production databases will not be 
encrypted), if KMS is down for maintenance or some other reason, we want to 
limit the outage to encrypted data only.

In other words, it would be great if KMS delegation tokens were allocated 
lazily - on the first request to encrypted data.

This could be a non-default option to lazily allocate KMS delegation tokens, to 
improve availability of non-encrypted data.

 

  was:
We noticed that HDFS clients talk to KMS even when they access unencrypted 
databases. Is there a way to make HDFS clients talk to KMS servers *only* when 
they need access to encrypted data? Since we will be encrypting only one 
database (and 50 other databases will not be encrypted), if KMS is down for 
maintenance or some other reason, we want to limit the outage to encrypted 
data only.

In other words, it would be great if KMS delegation tokens were allocated 
lazily - on the first request to encrypted data.


> Lazily allocate KMS delegation tokens
> -
>
> Key: HADOOP-16120
> URL: https://issues.apache.org/jira/browse/HADOOP-16120
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: kms, security
>Affects Versions: 2.8.5, 3.1.2
>Reporter: Ruslan Dautkhanov
>Priority: Major
>
> We noticed that HDFS clients talk to KMS even when they access unencrypted 
> databases. Is there a way to make HDFS clients talk to KMS servers *only* 
> when they need access to encrypted data? Since we will be encrypting only 
> one database (and 50+ other, much more critical production databases will 
> not be encrypted), if KMS is down for maintenance or some other reason, we 
> want to limit the outage to encrypted data only.
> In other words, it would be great if KMS delegation tokens were allocated 
> lazily - on the first request to encrypted data.
> This could be a non-default option to lazily allocate KMS delegation tokens, 
> to improve availability of non-encrypted data.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-16120) Lazily allocate KMS delegation tokens

2019-02-18 Thread Ruslan Dautkhanov (JIRA)
Ruslan Dautkhanov created HADOOP-16120:
--

 Summary: Lazily allocate KMS delegation tokens
 Key: HADOOP-16120
 URL: https://issues.apache.org/jira/browse/HADOOP-16120
 Project: Hadoop Common
  Issue Type: Improvement
  Components: kms, security
Affects Versions: 3.1.2, 2.8.5
Reporter: Ruslan Dautkhanov


We noticed that HDFS clients talk to KMS even when they access unencrypted 
databases. Is there a way to make HDFS clients talk to KMS servers *only* when 
they need access to encrypted data? Since we will be encrypting only one 
database (and 50 other databases will not be encrypted), if KMS is down for 
maintenance or some other reason, we want to limit the outage to encrypted 
data only.

In other words, it would be great if KMS delegation tokens were allocated 
lazily - on the first request to encrypted data.
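For context, the eager behaviour shows up when the client gathers delegation tokens 
up front, typically at job submission; a minimal sketch of that call, with an 
example renewer principal (the renewer name is just a placeholder):
{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

val conf = new Configuration()
val fs = FileSystem.get(conf)
val creds = new Credentials()

// As described above, this also ends up fetching a KMS delegation token
// whenever a key provider is configured, even if the job will only ever read
// unencrypted data - which is why a KMS outage can affect those jobs too.
fs.addDelegationTokens("yarn", creds)  // "yarn" is an example renewer
val it = creds.getAllTokens.iterator()
while (it.hasNext) println(it.next().getKind)
{code}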



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13340) Compress Hadoop Archive output

2018-01-11 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322927#comment-16322927
 ] 

Ruslan Dautkhanov commented on HADOOP-13340:


I'd say the former approach (transparent compression) would be much more 
useful. And yes, compressing multiple files together would give much better 
compression, especially when those are tiny files. I just thought that 
compressing individual files would be easier to implement. 

> Compress Hadoop Archive output
> --
>
> Key: HADOOP-13340
> URL: https://issues.apache.org/jira/browse/HADOOP-13340
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.5.0
>Reporter: Duc Le Tu
>  Labels: features, performance
>
> Why can't the Hadoop Archive tool compress its output like other map-reduce 
> jobs? 
> I used some options like -D mapreduce.output.fileoutputformat.compress=true 
> -D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
>  but it doesn't work. Did I do something wrong?
> If not, please support an option to compress the output of the Hadoop Archive 
> tool; it's very necessary for data retention for everyone (the small-files 
> problem plus compressed data).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13340) Compress Hadoop Archive output

2018-01-11 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16322663#comment-16322663
 ] 

Ruslan Dautkhanov commented on HADOOP-13340:


[~jlowe] A workaround might be to compress only the files for which compression 
makes sense. For example, it doesn't make a lot of sense to compress tiny 
files; it may make sense to compress files that are over a few KB. Not sure 
whether a hard-coded source-file size threshold would do: if a file is over 
the threshold, that one file gets compressed.
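A rough sketch of that threshold idea (the 4 KB cutoff and the helper name are 
arbitrary; the archive tool would have to apply something like this per file during 
its copy phase):
{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: compress a file inside the archive only when it is
// large enough for the codec overhead to pay off.
def worthCompressing(fs: FileSystem, file: Path, thresholdBytes: Long = 4096L): Boolean =
  fs.getFileStatus(file).getLen >= thresholdBytes

val conf = new Configuration()
val fs = FileSystem.get(conf)
println(worthCompressing(fs, new Path("/var/log/example.log")))  // example path
{code}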

> Compress Hadoop Archive output
> --
>
> Key: HADOOP-13340
> URL: https://issues.apache.org/jira/browse/HADOOP-13340
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.5.0
>Reporter: Duc Le Tu
>  Labels: features, performance
>
> Why can't the Hadoop Archive tool compress its output like other map-reduce 
> jobs? 
> I used some options like -D mapreduce.output.fileoutputformat.compress=true 
> -D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
>  but it doesn't work. Did I do something wrong?
> If not, please support an option to compress the output of the Hadoop Archive 
> tool; it's very necessary for data retention for everyone (the small-files 
> problem plus compressed data).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-10388) Pure native hadoop client

2017-09-07 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158022#comment-16158022
 ] 

Ruslan Dautkhanov commented on HADOOP-10388:


I think HDFS-6994 supersedes this one?

> Pure native hadoop client
> -
>
> Key: HADOOP-10388
> URL: https://issues.apache.org/jira/browse/HADOOP-10388
> Project: Hadoop Common
>  Issue Type: New Feature
>Affects Versions: HADOOP-10388
>Reporter: Binglin Chang
>Assignee: Colin P. McCabe
> Attachments: 2014-06-13_HADOOP-10388_design.pdf
>
>
> A pure native hadoop client has the following use cases/advantages:
> 1.  writing YARN applications in C++
> 2.  direct access to HDFS, without extra proxy overhead, compared to the 
> web/NFS interface
> 3.  wrapping the native library to support more languages, e.g. Python
> 4.  lightweight, with a small footprint compared to the several hundred MB of 
> JDK and Hadoop libraries with various dependencies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-12502) SetReplication OutOfMemoryError

2017-07-11 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081747#comment-16081747
 ] 

Ruslan Dautkhanov edited comment on HADOOP-12502 at 7/11/17 3:02 PM:
-

Is HDFS-12113 a duplicate of this jira? Very similar but has a bit different 
exception stack.


was (Author: tagar):
Is HDFS-12113 and duplicate of this jira? Very similar but has a bit different 
exception stack.

> SetReplication OutOfMemoryError
> ---
>
> Key: HADOOP-12502
> URL: https://issues.apache.org/jira/browse/HADOOP-12502
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Philipp Schuegerl
>Assignee: Vinayakumar B
> Attachments: HADOOP-12502-01.patch, HADOOP-12502-02.patch, 
> HADOOP-12502-03.patch, HADOOP-12502-04.patch, HADOOP-12502-05.patch, 
> HADOOP-12502-06.patch
>
>
> Setting the replication of an HDFS folder recursively can run out of memory. 
> E.g. with a large /var/log directory:
> hdfs dfs -setrep -R -w 1 /var/log
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit 
> exceeded
>   at java.util.Arrays.copyOfRange(Arrays.java:2694)
>   at java.lang.String.<init>(String.java:203)
>   at java.lang.String.substring(String.java:1913)
>   at java.net.URI$Parser.substring(URI.java:2850)
>   at java.net.URI$Parser.parse(URI.java:3046)
>   at java.net.URI.<init>(URI.java:753)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:116)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:94)
>   at 
> org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222)
>   at 
> org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
>   at 
> org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
>   at 
> org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-12502) SetReplication OutOfMemoryError

2017-07-11 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081747#comment-16081747
 ] 

Ruslan Dautkhanov commented on HADOOP-12502:


Is HDFS-12113 a duplicate of this jira? Very similar but has a bit different 
exception stack.

> SetReplication OutOfMemoryError
> ---
>
> Key: HADOOP-12502
> URL: https://issues.apache.org/jira/browse/HADOOP-12502
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Philipp Schuegerl
>Assignee: Vinayakumar B
> Attachments: HADOOP-12502-01.patch, HADOOP-12502-02.patch, 
> HADOOP-12502-03.patch, HADOOP-12502-04.patch, HADOOP-12502-05.patch, 
> HADOOP-12502-06.patch
>
>
> Setting the replication of an HDFS folder recursively can run out of memory. 
> E.g. with a large /var/log directory:
> hdfs dfs -setrep -R -w 1 /var/log
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit 
> exceeded
>   at java.util.Arrays.copyOfRange(Arrays.java:2694)
>   at java.lang.String.<init>(String.java:203)
>   at java.lang.String.substring(String.java:1913)
>   at java.net.URI$Parser.substring(URI.java:2850)
>   at java.net.URI$Parser.parse(URI.java:3046)
>   at java.net.URI.<init>(URI.java:753)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:203)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:116)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:94)
>   at 
> org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222)
>   at 
> org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
>   at 
> org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
>   at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
>   at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
>   at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
>   at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
>   at 
> org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11004) NFS gateway doesn't respect HDFS extended ACLs

2017-05-09 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003345#comment-16003345
 ] 

Ruslan Dautkhanov commented on HADOOP-11004:


https://tools.ietf.org/html/rfc3530#section-5.11

NFS v4 defines ACLs explicitly in RFC 3530.

> NFS gateway doesn't respect HDFS extended ACLs
> --
>
> Key: HADOOP-11004
> URL: https://issues.apache.org/jira/browse/HADOOP-11004
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: nfs, security
>Affects Versions: 2.4.0
> Environment: HDP 2.1
>Reporter: Hari Sekhon
>
> I'm aware that the NFS gateway to HDFS doesn't work with secondary groups 
> until Hadoop 2.5 (HADOOP-10701), but I've also found that when setting 
> extended ACLs to allow the primary group of my regular user account, I'm 
> still unable to access that directory in HDFS via the NFS gateway's mount 
> point, although I can via hadoop fs commands, indicating the NFS gateway 
> isn't respecting HDFS extended ACLs. Nor does the existence of extended ACLs 
> show up as a plus sign after the rwx bits in the NFS directory listing, as 
> it does in the hadoop fs listing and for regular Linux extended ACLs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-10051) winutil.exe is not included in 2.2.0 bin tarball

2015-09-14 Thread Ruslan Dautkhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743785#comment-14743785
 ] 

Ruslan Dautkhanov commented on HADOOP-10051:


Not fixed in 2.6

> winutil.exe is not included in 2.2.0 bin tarball
> 
>
> Key: HADOOP-10051
> URL: https://issues.apache.org/jira/browse/HADOOP-10051
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: bin
>Affects Versions: 2.2.0, 2.4.0, 2.5.0
>Reporter: Tsuyoshi Ozawa
>
> I don't have a Windows environment, but one user who tried the 2.2.0 release
> on Windows reported that the released tarball doesn't contain
> "winutil.exe" and so cannot run any commands. I confirmed that winutil.exe is 
> indeed not included in the 2.2.0 bin tarball.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)