[jira] [Commented] (HADOOP-18019) S3AFileSystem.s3GetFileStatus() doesn't find dir markers on minio
[ https://issues.apache.org/jira/browse/HADOOP-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448863#comment-17448863 ] Ruslan Dautkhanov commented on HADOOP-18019: [~y4m4] does this mean that the other deployment method worked in Hadoop 3.2? I don't understand where this inconsistency is coming from. > S3AFileSystem.s3GetFileStatus() doesn't find dir markers on minio > - > > Key: HADOOP-18019 > URL: https://issues.apache.org/jira/browse/HADOOP-18019 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.3.0, 3.3.1, 3.3.2 > Environment: minio s3-compatible storage >Reporter: Ruslan Dautkhanov >Priority: Major > > Repro code: > {code:java} > val conf = new Configuration() > conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000") > conf.set("fs.s3a.path.style.access", "true") > conf.set("fs.s3a.access.key", "user_access_key") > conf.set("fs.s3a.secret.key", "password") > val path = new Path("s3a://comcast-test") > val fs = path.getFileSystem(conf) > fs.mkdirs(new Path("/testdelta/_delta_log")) > fs.getFileStatus(new Path("/testdelta/_delta_log")){code} > Fails with *FileNotFoundException* on Minio. The same code works in > real S3. > It also works in Hadoop 3.2 with Minio and earlier versions. > Only fails on 3.3 and newer Hadoop branches. > The reason, as discovered by [~sadikovi], is actually a more fundamental one - > Minio does not have empty directories (sort of), see > [https://github.com/minio/minio/issues/2423]. > This works in Hadoop 3.2 because of this infamous "Is this necessary?" 
block > of code > [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223] > that was removed in Hadoop 3.3 - > [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179] > and this causes the regression -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
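The failure mode discussed in this thread can be modeled in a few lines. This is an illustrative sketch, not the real S3AFileSystem Java code: `FakeStore`, the two `get_file_status_*` functions, and the key names are hypothetical stand-ins. It only shows why the branch-3.2 LIST fallback infers a "directory" from its children on a store like Minio that keeps no empty directory marker, while a HEAD-only probe (the 3.3 behavior) raises FileNotFoundException.

```python
# Toy model of S3A-style directory probing (NOT the actual Hadoop code).
class FakeStore:
    """A toy object store keyed by object name, like an S3 bucket."""
    def __init__(self, objects):
        self.objects = objects  # dict: key -> bytes

    def head(self, key):
        # Models a HEAD request: does this exact object exist?
        return key in self.objects

    def list_prefix(self, prefix):
        # Models a LIST request under a key prefix.
        return [k for k in self.objects if k.startswith(prefix)]

def get_file_status_3_3(store, path):
    # HEAD the object itself, then HEAD the "path/" directory marker only.
    if store.head(path) or store.head(path + "/"):
        return "found"
    raise FileNotFoundError(path)

def get_file_status_3_2(store, path):
    # Same probes, plus the "Is this necessary?" fallback: LIST under the
    # prefix, so a directory is inferred from its children even when the
    # marker object itself is missing.
    if store.head(path) or store.head(path + "/"):
        return "found"
    if store.list_prefix(path + "/"):
        return "found (inferred from listing)"
    raise FileNotFoundError(path)

# Minio-like state: a child object exists but no "testdelta/_delta_log/" marker.
store = FakeStore({"testdelta/_delta_log/000.json": b"{}"})
```

With this store, the 3.2-style probe succeeds via the listing fallback and the 3.3-style probe raises, matching the regression described above.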
[jira] [Commented] (HADOOP-18019) Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus()
[ https://issues.apache.org/jira/browse/HADOOP-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446664#comment-17446664 ] Ruslan Dautkhanov commented on HADOOP-18019: [~ste...@apache.org] FYI, if I am not mistaken, you had a commit on S3AFileSystem.java last year that removed that code block. > Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus() > -- > > Key: HADOOP-18019 > URL: https://issues.apache.org/jira/browse/HADOOP-18019 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.3.0, 3.3.1, 3.3.2 >Reporter: Ruslan Dautkhanov >Priority: Major > > Repro code: > {code:java} > val conf = new Configuration() > conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000") > conf.set("fs.s3a.path.style.access", "true") > conf.set("fs.s3a.access.key", "user_access_key") > conf.set("fs.s3a.secret.key", "password") > val path = new Path("s3a://comcast-test") > val fs = path.getFileSystem(conf) > fs.mkdirs(new Path("/testdelta/_delta_log")) > fs.getFileStatus(new Path("/testdelta/_delta_log")){code} > Fails with *FileNotFoundException* on Minio. The same code works in > real S3. > It also works in Hadoop 3.2 with Minio and earlier versions. > Only fails on 3.3 and newer Hadoop branches. > The reason, as discovered by [~sadikovi], is actually a more fundamental one - > Minio does not have empty directories (sort of), see > [https://github.com/minio/minio/issues/2423]. > This works in Hadoop 3.2 because of this infamous "Is this necessary?" 
block > of code > [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223] > that was removed in Hadoop 3.3 - > [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179] > and this causes the regression
[jira] [Updated] (HADOOP-18019) Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus()
[ https://issues.apache.org/jira/browse/HADOOP-18019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruslan Dautkhanov updated HADOOP-18019: --- Description: Repro code: {code:java} val conf = new Configuration() conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000") conf.set("fs.s3a.path.style.access", "true") conf.set("fs.s3a.access.key", "user_access_key") conf.set("fs.s3a.secret.key", "password") val path = new Path("s3a://comcast-test") val fs = path.getFileSystem(conf) fs.mkdirs(new Path("/testdelta/_delta_log")) fs.getFileStatus(new Path("/testdelta/_delta_log")){code} Fails with *FileNotFoundException* on Minio. The same code works in real S3. It also works in Hadoop 3.2 with Minio and earlier versions. Only fails on 3.3 and newer Hadoop branches. The reason, as discovered by [~sadikovi], is actually a more fundamental one - Minio does not have empty directories (sort of), see [https://github.com/minio/minio/issues/2423]. This works in Hadoop 3.2 because of this infamous "Is this necessary?" block of code [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223] that was removed in Hadoop 3.3 - [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179] and this causes the regression was: Repro code: {{}} {code:java} val conf = new Configuration() conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000") conf.set("fs.s3a.path.style.access", "true") conf.set("fs.s3a.access.key", "user_access_key") conf.set("fs.s3a.secret.key", "password") val path = new Path("s3a://comcast-test") val fs = path.getFileSystem(conf) fs.mkdirs(new Path("/testdelta/_delta_log")) fs.getFileStatus(new Path("/testdelta/_delta_log")){code} {{}} Fails with *FileNotFoundException* on Minio. The same code works in real S3. It also works in Hadoop 3.2 with Minio and earlier versions. 
Only fails on 3.3 and newer Hadoop branches. The reason, as discovered by [~sadikovi], is actually a more fundamental one - Minio does not have empty directories (sort of), see [https://github.com/minio/minio/issues/2423]. This works in Hadoop 3.2 because of this infamous "Is this necessary?" block of code [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223] that was removed in Hadoop 3.3 - [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179] and this causes the regression > Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus() > -- > > Key: HADOOP-18019 > URL: https://issues.apache.org/jira/browse/HADOOP-18019 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 3.3.0, 3.3.1, 3.3.2 >Reporter: Ruslan Dautkhanov >Priority: Major > > Repro code: > {code:java} > val conf = new Configuration() > conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000") > conf.set("fs.s3a.path.style.access", "true") > conf.set("fs.s3a.access.key", "user_access_key") > conf.set("fs.s3a.secret.key", "password") > val path = new Path("s3a://comcast-test") > val fs = path.getFileSystem(conf) > fs.mkdirs(new Path("/testdelta/_delta_log")) > fs.getFileStatus(new Path("/testdelta/_delta_log")){code} > Fails with *FileNotFoundException* on Minio. The same code works in > real S3. > It also works in Hadoop 3.2 with Minio and earlier versions. > Only fails on 3.3 and newer Hadoop branches. > The reason, as discovered by [~sadikovi], is actually a more fundamental one - > Minio does not have empty directories (sort of), see > [https://github.com/minio/minio/issues/2423]. > This works in Hadoop 3.2 because of this infamous "Is this necessary?" 
block > of code > [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223] > that was removed in Hadoop 3.3 - > [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179] > and this causes the regression
[jira] [Created] (HADOOP-18019) Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus()
Ruslan Dautkhanov created HADOOP-18019: -- Summary: Hadoop 3.3 regression in hadoop/fs/s3a/S3AFileSystem.s3GetFileStatus() Key: HADOOP-18019 URL: https://issues.apache.org/jira/browse/HADOOP-18019 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 3.3.1, 3.3.0, 3.3.2 Reporter: Ruslan Dautkhanov Repro code: {code:java} val conf = new Configuration() conf.set("fs.s3a.endpoint", "http://127.0.0.1:9000") conf.set("fs.s3a.path.style.access", "true") conf.set("fs.s3a.access.key", "user_access_key") conf.set("fs.s3a.secret.key", "password") val path = new Path("s3a://comcast-test") val fs = path.getFileSystem(conf) fs.mkdirs(new Path("/testdelta/_delta_log")) fs.getFileStatus(new Path("/testdelta/_delta_log")){code} Fails with *FileNotFoundException* on Minio. The same code works in real S3. It also works in Hadoop 3.2 with Minio and earlier versions. Only fails on 3.3 and newer Hadoop branches. The reason, as discovered by [~sadikovi], is actually a more fundamental one - Minio does not have empty directories (sort of), see [https://github.com/minio/minio/issues/2423]. This works in Hadoop 3.2 because of this infamous "Is this necessary?" block of code [https://github.com/apache/hadoop/blob/branch-3.2.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2204-L2223] that was removed in Hadoop 3.3 - [https://github.com/apache/hadoop/blob/branch-3.3.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2179] and this causes the regression
[jira] [Created] (HADOOP-17231) empty getDefaultExtension() is ignored
Ruslan Dautkhanov created HADOOP-17231: -- Summary: empty getDefaultExtension() is ignored Key: HADOOP-17231 URL: https://issues.apache.org/jira/browse/HADOOP-17231 Project: Hadoop Common Issue Type: Bug Affects Versions: 3.1.3, 3.2.0 Reporter: Ruslan Dautkhanov Use case - source files are gz-compressed but have no extensions. Attempt to auto-decompress them through {code:java} package com.my.codec.test import org.apache.hadoop.io.compress.GzipCodec class GZCodec extends GzipCodec { override def getDefaultExtension(): String = "" } {code} (notice the empty getDefaultExtension) and then setting *io.compression.codecs* to com.my.codec.test.GZCodec has no effect. Similar tests with a one-character extension make it work. So only the empty-string getDefaultExtension case is broken. I guess the issue is somewhere in [https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CompressionCodecFactory.java#L109] but it's not obvious. Folks have built some workarounds using custom readers, for example, # [https://daynebatten.com/2015/11/override-hadoop-compression-codec-file-extension/] # [https://stackoverflow.com/questions/52011697/how-to-read-a-compressed-gzip-file-without-extension-in-spark?rq=1] Hopefully it would be an easy fix to support an empty getDefaultExtension?
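The bug report above hinges on how codecs are matched by filename suffix. This is a hypothetical sketch, not the real CompressionCodecFactory code: the `codecs` table, `find_codec`, and the codec names are made up. It shows how a longest-suffix-wins lookup *could* support an empty default extension as a last-resort match (since every filename ends with the empty string), which is the behavior the reporter expected but says Hadoop ignores.

```python
# Hypothetical extension-suffix codec lookup (NOT Hadoop's actual factory).
# A codec registered under the empty extension "" matches any filename,
# so it can only ever win when no real extension matches.
codecs = {".gz": "GzipCodec", ".bz2": "BZip2Codec", "": "ExtensionlessGzipCodec"}

def find_codec(filename):
    # Longest matching registered suffix wins; the empty suffix matches
    # everything, so it acts as a fallback for extensionless files.
    matches = [ext for ext in codecs if filename.endswith(ext)]
    return codecs[max(matches, key=len)] if matches else None
```

Under this scheme an extensionless file like `part-00000` would fall through to the empty-extension codec, while `logs.gz` still resolves to the gzip codec.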
[jira] [Commented] (HADOOP-16120) Lazily allocate KMS delegation tokens
[ https://issues.apache.org/jira/browse/HADOOP-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773161#comment-16773161 ] Ruslan Dautkhanov commented on HADOOP-16120: Thanks, guys, for explaining that it's not possible with the current Hadoop DT architecture. > Lazily allocate KMS delegation tokens > - > > Key: HADOOP-16120 > URL: https://issues.apache.org/jira/browse/HADOOP-16120 > Project: Hadoop Common > Issue Type: Improvement > Components: kms, security >Affects Versions: 2.8.5, 3.1.2 >Reporter: Ruslan Dautkhanov >Priority: Major > > We noticed that HDFS clients talk to KMS even when they try to access > non-encrypted databases. Is there a way to make HDFS clients talk to KMS > servers *only* when they need access to encrypted data? Since we will be > encrypting only one database (and 50+ other much more critical production > databases will not be encrypted), in case KMS is down for maintenance or > for some other reason, we want to limit the outage to encrypted data. > In other words, it would be great if KMS delegation tokens would be allocated > lazily - on first request to encrypted data. > This could be a non-default option to lazily allocate KMS delegation tokens, > to improve availability of non-encrypted data. > 
[jira] [Updated] (HADOOP-16120) Lazily allocate KMS delegation tokens
[ https://issues.apache.org/jira/browse/HADOOP-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruslan Dautkhanov updated HADOOP-16120: --- Description: We noticed that HDFS clients talk to KMS even when they try to access non-encrypted databases. Is there a way to make HDFS clients talk to KMS servers *only* when they need access to encrypted data? Since we will be encrypting only one database (and 50+ other much more critical production databases will not be encrypted), in case KMS is down for maintenance or for some other reason, we want to limit the outage to encrypted data. In other words, it would be great if KMS delegation tokens would be allocated lazily - on first request to encrypted data. This could be a non-default option to lazily allocate KMS delegation tokens, to improve availability of non-encrypted data. was: We noticed that HDFS clients talk to KMS even when they try to access non-encrypted databases. Is there a way to make HDFS clients talk to KMS servers *only* when they need access to encrypted data? Since we will be encrypting only one database (and 50 other databases will not be encrypted), in case KMS is down for maintenance or for some other reason, we want to limit the outage to encrypted data. In other words, it would be great if KMS delegation tokens would be allocated lazily - on first request to encrypted data. > Lazily allocate KMS delegation tokens > - > > Key: HADOOP-16120 > URL: https://issues.apache.org/jira/browse/HADOOP-16120 > Project: Hadoop Common > Issue Type: Improvement > Components: kms, security >Affects Versions: 2.8.5, 3.1.2 >Reporter: Ruslan Dautkhanov >Priority: Major > > We noticed that HDFS clients talk to KMS even when they try to access > non-encrypted databases. Is there a way to make HDFS clients talk to KMS > servers *only* when they need access to encrypted data? 
Since we will be > encrypting only one database (and 50+ other much more critical production > databases will not be encrypted), in case KMS is down for maintenance or > for some other reason, we want to limit the outage to encrypted data. > In other words, it would be great if KMS delegation tokens would be allocated > lazily - on first request to encrypted data. > This could be a non-default option to lazily allocate KMS delegation tokens, > to improve availability of non-encrypted data. > 
[jira] [Created] (HADOOP-16120) Lazily allocate KMS delegation tokens
Ruslan Dautkhanov created HADOOP-16120: -- Summary: Lazily allocate KMS delegation tokens Key: HADOOP-16120 URL: https://issues.apache.org/jira/browse/HADOOP-16120 Project: Hadoop Common Issue Type: Improvement Components: kms, security Affects Versions: 3.1.2, 2.8.5 Reporter: Ruslan Dautkhanov We noticed that HDFS clients talk to KMS even when they try to access non-encrypted databases. Is there a way to make HDFS clients talk to KMS servers *only* when they need access to encrypted data? Since we will be encrypting only one database (and 50 other databases will not be encrypted), in case KMS is down for maintenance or for some other reason, we want to limit the outage to encrypted data. In other words, it would be great if KMS delegation tokens would be allocated lazily - on first request to encrypted data.
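The lazy-allocation pattern requested in this issue is a standard lazy-initialization idea. Below is a hypothetical sketch, not Hadoop's delegation-token code: `LazyKMSClient`, `fetch_token`, and the paths are illustrative stand-ins. It shows the desired property that reads of unencrypted data never touch the KMS, and the token is fetched once, on the first encrypted access.

```python
class LazyKMSClient:
    """Sketch of lazily fetching a delegation token on first encrypted
    access. fetch_token stands in for the KMS round trip."""
    def __init__(self, fetch_token):
        self._fetch = fetch_token
        self._token = None

    def read(self, path, encrypted):
        # Only the first access to encrypted data triggers the KMS call;
        # plain reads never depend on KMS availability.
        if encrypted and self._token is None:
            self._token = self._fetch()
        return self._token

calls = []
client = LazyKMSClient(lambda: calls.append("kms") or "token-1")
```

With this shape, a KMS outage would only affect the encrypted database, which is exactly the availability goal stated in the description.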
[jira] [Commented] (HADOOP-13340) Compress Hadoop Archive output
[ https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322927#comment-16322927 ] Ruslan Dautkhanov commented on HADOOP-13340: I'd say the former approach (transparent compression) would be much more useful. And yes, compressing multiple files together would give much better compression, especially when those are tiny files. I just thought that compressing individual files is easier to implement. > Compress Hadoop Archive output > -- > > Key: HADOOP-13340 > URL: https://issues.apache.org/jira/browse/HADOOP-13340 > Project: Hadoop Common > Issue Type: New Feature > Components: tools >Affects Versions: 2.5.0 >Reporter: Duc Le Tu > Labels: features, performance > > Why can't the Hadoop Archive tool compress output like other map-reduce jobs? > I used some options like -D mapreduce.output.fileoutputformat.compress=true > -D > mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec > but it doesn't work. Did I go wrong somewhere? > If not, please support an option to compress the output of the Hadoop Archive tool; > it's very necessary for data retention for everyone (the small-files problem and > compressed data).
[jira] [Commented] (HADOOP-13340) Compress Hadoop Archive output
[ https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322663#comment-16322663 ] Ruslan Dautkhanov commented on HADOOP-13340: [~jlowe] A workaround might be to compress only files for which compression makes sense? For example, it doesn't make a lot of sense to compress tiny files. It may make sense to compress when files are over a few Kb. Not sure if a hard-coded source file size threshold would do. If a file is over the threshold, that one file will be compressed. > Compress Hadoop Archive output > -- > > Key: HADOOP-13340 > URL: https://issues.apache.org/jira/browse/HADOOP-13340 > Project: Hadoop Common > Issue Type: New Feature > Components: tools >Affects Versions: 2.5.0 >Reporter: Duc Le Tu > Labels: features, performance > > Why can't the Hadoop Archive tool compress output like other map-reduce jobs? > I used some options like -D mapreduce.output.fileoutputformat.compress=true > -D > mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec > but it doesn't work. Did I go wrong somewhere? > If not, please support an option to compress the output of the Hadoop Archive tool; > it's very necessary for data retention for everyone (the small-files problem and > compressed data).
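The size-threshold workaround proposed in this comment is easy to sketch. This is an illustrative model, not the Hadoop Archive tool: `maybe_compress` and the 4 KiB cutoff are assumptions, showing only the decision rule of compressing an entry when it is large enough for gzip to plausibly pay off.

```python
import gzip

# Hypothetical cutoff: tiny files gain little from gzip and may even grow.
COMPRESS_THRESHOLD = 4096

def maybe_compress(name, data, threshold=COMPRESS_THRESHOLD):
    """Return (entry_name, payload), compressing only entries at or above
    the size threshold; compressed entries get a .gz suffix."""
    if len(data) >= threshold:
        return name + ".gz", gzip.compress(data)
    return name, data
```

A small entry passes through untouched, while a large one is stored compressed and remains recoverable via `gzip.decompress`.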
[jira] [Commented] (HADOOP-10388) Pure native hadoop client
[ https://issues.apache.org/jira/browse/HADOOP-10388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158022#comment-16158022 ] Ruslan Dautkhanov commented on HADOOP-10388: I think HDFS-6994 supersedes this one? > Pure native hadoop client > - > > Key: HADOOP-10388 > URL: https://issues.apache.org/jira/browse/HADOOP-10388 > Project: Hadoop Common > Issue Type: New Feature >Affects Versions: HADOOP-10388 >Reporter: Binglin Chang >Assignee: Colin P. McCabe > Attachments: 2014-06-13_HADOOP-10388_design.pdf > > > A pure native hadoop client has following use case/advantages: > 1. writing Yarn applications using c++ > 2. direct access to HDFS, without extra proxy overhead, comparing to web/nfs > interface. > 3. wrap native library to support more languages, e.g. python > 4. lightweight, small footprint compare to several hundred MB of JDK and > hadoop library with various dependencies.
[jira] [Comment Edited] (HADOOP-12502) SetReplication OutOfMemoryError
[ https://issues.apache.org/jira/browse/HADOOP-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081747#comment-16081747 ] Ruslan Dautkhanov edited comment on HADOOP-12502 at 7/11/17 3:02 PM: - Is HDFS-12113 a duplicate of this jira? Very similar but has a bit different exception stack. was (Author: tagar): Is HDFS-12113 and duplicate of this jira? Very similar but has a bit different exception stack. > SetReplication OutOfMemoryError > --- > > Key: HADOOP-12502 > URL: https://issues.apache.org/jira/browse/HADOOP-12502 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Philipp Schuegerl >Assignee: Vinayakumar B > Attachments: HADOOP-12502-01.patch, HADOOP-12502-02.patch, > HADOOP-12502-03.patch, HADOOP-12502-04.patch, HADOOP-12502-05.patch, > HADOOP-12502-06.patch > > > Setting the replication of a HDFS folder recursively can run out of memory. > E.g. with a large /var/log directory: > hdfs dfs -setrep -R -w 1 /var/log > Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit > exceeded > at java.util.Arrays.copyOfRange(Arrays.java:2694) > at java.lang.String.<init>(String.java:203) > at java.lang.String.substring(String.java:1913) > at java.net.URI$Parser.substring(URI.java:2850) > at java.net.URI$Parser.parse(URI.java:3046) > at java.net.URI.<init>(URI.java:753) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > at org.apache.hadoop.fs.Path.<init>(Path.java:116) > at org.apache.hadoop.fs.Path.<init>(Path.java:94) > at > org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222) > at > org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712) > at > 
org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708) > at > org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at > org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278) > at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260) > at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244) > at > org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)
[jira] [Commented] (HADOOP-12502) SetReplication OutOfMemoryError
[ https://issues.apache.org/jira/browse/HADOOP-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081747#comment-16081747 ] Ruslan Dautkhanov commented on HADOOP-12502: Is HDFS-12113 a duplicate of this jira? Very similar but has a bit different exception stack. > SetReplication OutOfMemoryError > --- > > Key: HADOOP-12502 > URL: https://issues.apache.org/jira/browse/HADOOP-12502 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.3.0 >Reporter: Philipp Schuegerl >Assignee: Vinayakumar B > Attachments: HADOOP-12502-01.patch, HADOOP-12502-02.patch, > HADOOP-12502-03.patch, HADOOP-12502-04.patch, HADOOP-12502-05.patch, > HADOOP-12502-06.patch > > > Setting the replication of a HDFS folder recursively can run out of memory. > E.g. with a large /var/log directory: > hdfs dfs -setrep -R -w 1 /var/log > Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit > exceeded > at java.util.Arrays.copyOfRange(Arrays.java:2694) > at java.lang.String.<init>(String.java:203) > at java.lang.String.substring(String.java:1913) > at java.net.URI$Parser.substring(URI.java:2850) > at java.net.URI$Parser.parse(URI.java:3046) > at java.net.URI.<init>(URI.java:753) > at org.apache.hadoop.fs.Path.initialize(Path.java:203) > at org.apache.hadoop.fs.Path.<init>(Path.java:116) > at org.apache.hadoop.fs.Path.<init>(Path.java:94) > at > org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222) > at > org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708) > at > 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708) > at > org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308) > at > org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278) > at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260) > at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244) > at > org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)
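The stack trace above shows a deeply recursive recursePath/processPaths chain accumulating Path objects. As an illustrative contrast (not the FsShell code, and the tree, `walk`, and `list_children` below are all hypothetical), an iterative generator-based traversal yields one path at a time, so memory stays bounded by the pending stack rather than the whole subtree:

```python
def walk(list_children, root):
    """Depth-first traversal yielding one path at a time.
    list_children(path) is a stand-in for a directory listing call and
    returns the immediate children (empty for files/leaf dirs)."""
    stack = [root]
    while stack:
        path = stack.pop()
        yield path            # process (e.g. setrep) before descending further
        stack.extend(list_children(path))

# Tiny fake filesystem for demonstration.
tree = {
    "/var/log": ["/var/log/app", "/var/log/a.log"],
    "/var/log/app": ["/var/log/app/b.log"],
}
def list_children(path):
    return tree.get(path, [])
```

The design point is that nothing about a recursive-descent operation requires materializing entire listings before acting on them; each entry can be processed and discarded as it streams past.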
[jira] [Commented] (HADOOP-11004) NFS gateway doesn't respect HDFS extended ACLs
[ https://issues.apache.org/jira/browse/HADOOP-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003345#comment-16003345 ] Ruslan Dautkhanov commented on HADOOP-11004: https://tools.ietf.org/html/rfc3530#section-5.11 NFS v4 defines ACLs explicitly in RFC 3530. > NFS gateway doesn't respect HDFS extended ACLs > -- > > Key: HADOOP-11004 > URL: https://issues.apache.org/jira/browse/HADOOP-11004 > Project: Hadoop Common > Issue Type: Bug > Components: nfs, security >Affects Versions: 2.4.0 > Environment: HDP 2.1 >Reporter: Hari Sekhon > > I'm aware that the NFS gateway to HDFS doesn't work with secondary groups > until Hadoop 2.5 (HADOOP-10701), but I've also found that when setting > extended ACLs to allow the primary group of my regular user account, I'm still > unable to access that directory in HDFS via the NFS gateway's mount point, > although I can via hadoop fs commands, indicating the NFS gateway isn't > respecting HDFS extended ACLs. Nor does the existence of extended ACLs > show up via a plus sign after the rwx bits in the NFS directory listing, as > it does in a hadoop fs listing or for regular Linux extended ACLs.
[jira] [Commented] (HADOOP-10051) winutil.exe is not included in 2.2.0 bin tarball
[ https://issues.apache.org/jira/browse/HADOOP-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743785#comment-14743785 ] Ruslan Dautkhanov commented on HADOOP-10051: Not fixed in 2.6 > winutil.exe is not included in 2.2.0 bin tarball > > > Key: HADOOP-10051 > URL: https://issues.apache.org/jira/browse/HADOOP-10051 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Affects Versions: 2.2.0, 2.4.0, 2.5.0 >Reporter: Tsuyoshi Ozawa > > I don't have a Windows environment, but one user who tried the 2.2.0 release > on Windows reported that the released tarball doesn't contain > "winutil.exe" and cannot run any commands. I confirmed that winutil.exe is > indeed not included in the 2.2.0 bin tarball.