[jira] [Created] (HADOOP-18447) Vectored IO: Threadpool should be closed on interrupts or during close calls
Rajesh Balamohan created HADOOP-18447: - Summary: Vectored IO: Threadpool should be closed on interrupts or during close calls Key: HADOOP-18447 URL: https://issues.apache.org/jira/browse/HADOOP-18447 Project: Hadoop Common Issue Type: Sub-task Components: common, fs, fs/adl, fs/s3 Reporter: Rajesh Balamohan Attachments: Screenshot 2022-09-08 at 9.22.07 AM.png The vectored IO threadpool should be closed on any interrupts or during S3AFileSystem/S3AInputStream close() calls. E.g. a query that got cancelled in the middle of its run: in the background (e.g. in LLAP), the vectored IO threads continued to run. !Screenshot 2022-09-08 at 9.22.07 AM.png|width=537,height=164! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
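The pattern being requested can be sketched as follows. This is a minimal, hypothetical stand-in (class and method names are illustrative, not the real S3AInputStream): the stream owns its executor and shuts it down in close(), interrupting in-flight range reads instead of letting them keep running in the background.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a stream with a vectored-IO pool; not Hadoop code. */
public class RangeReadingStream implements AutoCloseable {
    private final ExecutorService vectoredIoPool = Executors.newFixedThreadPool(4);

    /** Submit one range read to the stream's pool. */
    public Future<?> submitRangeRead(Runnable read) {
        return vectoredIoPool.submit(read);
    }

    /** Cancel pending and in-flight reads instead of leaking the pool. */
    @Override
    public void close() {
        vectoredIoPool.shutdownNow(); // interrupts worker threads
        try {
            vectoredIoPool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public boolean isPoolShutdown() {
        return vectoredIoPool.isShutdown();
    }

    public static void main(String[] args) {
        RangeReadingStream s = new RangeReadingStream();
        s.submitRangeRead(() -> { });
        s.close();
        System.out.println("pool shutdown: " + s.isPoolShutdown());
    }
}
```

shutdownNow() both drains tasks that never started and interrupts running workers, which is the behavior the report asks for on query cancellation.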
[jira] [Created] (HADOOP-18347) Restrict vectoredIO threadpool to reduce memory pressure
Rajesh Balamohan created HADOOP-18347: - Summary: Restrict vectoredIO threadpool to reduce memory pressure Key: HADOOP-18347 URL: https://issues.apache.org/jira/browse/HADOOP-18347 Project: Hadoop Common Issue Type: Sub-task Components: common, fs, fs/adl, fs/s3 Reporter: Rajesh Balamohan https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L964-L967 Currently, it fetches all the ranges with an unbounded threadpool. This will not cause memory pressure with standard benchmarks like TPCDS. However, when a large number of ranges is present in large files, this could potentially spike up the memory usage of the task. Limiting the threadpool size could reduce the memory usage.
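One way to bound both the worker count and the backlog is sketched below; this is a generic JDK pattern, not the actual fix adopted in hadoop-aws. A bounded queue plus CallerRunsPolicy throttles submitters when the pool is saturated, instead of buffering an unbounded number of pending range fetches.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: cap worker threads and queued range fetches. */
public class BoundedRangePool {
    public static ThreadPoolExecutor create(int maxThreads, int maxQueuedRanges) {
        return new ThreadPoolExecutor(
                maxThreads, maxThreads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(maxQueuedRanges),
                // When the queue is full, the submitting thread runs the
                // fetch itself, so submission is throttled rather than
                // growing an unbounded backlog in memory.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(4, 16);
        System.out.println("max threads = " + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```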
[jira] [Commented] (HADOOP-18106) Handle memory fragmentation in S3 Vectored IO implementation.
[ https://issues.apache.org/jira/browse/HADOOP-18106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550781#comment-17550781 ] Rajesh Balamohan commented on HADOOP-18106: --- This will be applicable mainly for direct byte buffers. Otherwise, the memory will be released automatically at GC time. > Handle memory fragmentation in S3 Vectored IO implementation. > - > > Key: HADOOP-18106 > URL: https://issues.apache.org/jira/browse/HADOOP-18106 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > > As we have implemented merging of ranges in the S3AInputStream implementation > of vectored IO api, it can lead to memory fragmentation. Let me explain by > example. > > Suppose the client requests 3 ranges: > 0-500, 700-1000 and 1200-1500. > Now because of merging, all the above ranges will get merged into one and we > will allocate a big byte buffer of 0-1500 size but return sliced byte buffers > for the desired ranges. > Now once the client is done reading all the ranges, it will only be able to > free the memory for the requested ranges, and the memory of the gaps will never be > released, e.g. here 500-700 and 1000-1200. >
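The retention problem described in the issue can be demonstrated with plain JDK buffers (this is an illustration, not Hadoop code): slices are views over the merged allocation, so while any slice is reachable the whole backing buffer, gaps included, stays allocated.

```java
import java.nio.ByteBuffer;

/** Demonstrates why gap memory in a merged buffer cannot be reclaimed. */
public class MergedRangeDemo {
    public static void main(String[] args) {
        // Ranges 0-500, 700-1000 and 1200-1500 merged into one 1500-byte buffer.
        ByteBuffer merged = ByteBuffer.allocateDirect(1500);

        ByteBuffer r1 = sliceRange(merged, 0, 500);
        ByteBuffer r2 = sliceRange(merged, 700, 300);
        ByteBuffer r3 = sliceRange(merged, 1200, 300);

        // Each slice shares merged's backing memory: as long as any slice is
        // reachable, all 1500 bytes stay allocated, including the 500-700
        // and 1000-1200 gaps the client never asked for.
        System.out.println((r1.remaining() + r2.remaining() + r3.remaining())
                + " of 1500 bytes actually requested");
    }

    /** Returns a view over [offset, offset+length) of buf. */
    static ByteBuffer sliceRange(ByteBuffer buf, int offset, int length) {
        ByteBuffer dup = buf.duplicate();
        dup.position(offset);
        dup.limit(offset + length);
        return dup.slice();
    }
}
```

For direct buffers this is exactly the case the comment calls out: the native memory is only freed when the last view becomes unreachable and is collected.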
[jira] [Created] (HADOOP-18115) EvaluatingStatisticsMap::entrySet may not need parallelstream
Rajesh Balamohan created HADOOP-18115: - Summary: EvaluatingStatisticsMap::entrySet may not need parallelstream Key: HADOOP-18115 URL: https://issues.apache.org/jira/browse/HADOOP-18115 Project: Hadoop Common Issue Type: Improvement Components: fs Reporter: Rajesh Balamohan Attachments: Screenshot 2022-02-04 at 11.10.39 AM.png When a large number of S3AInputStreams is opened, this ends up showing in the profile, as parallelStream internally makes use of fork/join. If parallelism is not mandatory here, the code can be refactored to get rid of parallelStream. Here is the relevant profile output for reference. !Screenshot 2022-02-04 at 11.10.39 AM.png|width=632,height=429!
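The suggested refactor is small; a hypothetical sketch (not the actual EvaluatingStatisticsMap class) of the before/after. For small maps the fork/join handoff of parallelStream() costs far more than the copy itself, so a sequential stream avoids contending on the common ForkJoinPool.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical sketch of the suggested change. */
public class SnapshotDemo {
    /** Before: parallelStream() routes the copy through the shared ForkJoinPool. */
    static Map<String, Long> snapshotParallel(Map<String, Long> source) {
        return source.entrySet().parallelStream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    /** After: a plain sequential stream; no fork/join involvement. */
    static Map<String, Long> snapshotSequential(Map<String, Long> source) {
        return source.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, Long> stats = new TreeMap<>();
        stats.put("stream_opened", 42L);
        System.out.println(snapshotSequential(stats));
    }
}
```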
[jira] [Commented] (HADOOP-17531) DistCp: Reduce memory usage on copying huge directories
[ https://issues.apache.org/jira/browse/HADOOP-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286328#comment-17286328 ] Rajesh Balamohan commented on HADOOP-17531: --- [~ayushtkn]: Was the test tried out with HDFS or the local fs? Doing that listing in S3 can give very different results. HADOOP-11827 tried to fix speed, compromising memory usage. > DistCp: Reduce memory usage on copying huge directories > --- > > Key: HADOOP-17531 > URL: https://issues.apache.org/jira/browse/HADOOP-17531 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Ayush Saxena >Priority: Critical > Attachments: MoveToStackIterator.patch, gc-NewD-512M-3.8ML.log > > > Presently distCp uses a producer-consumer kind of setup while building the > listing; the input queue and output queue are both unbounded, thus the > listStatus grows quite huge. > Rel Code Part : > https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L635 > This does a breadth-first kind of traversal (uses a queue instead of the > earlier stack), so if you have files at a lower depth, it will likely open up the > entire tree and then start processing
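A bounded version of the producer-consumer setup described above can be sketched as follows. This is a generic illustration, not SimpleCopyListing: with a fixed-capacity BlockingQueue, the listing producer blocks once the consumer falls behind, so memory stays bounded regardless of directory size.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: bound the listing queue so producers block
 *  instead of buffering every listed path in memory. */
public class BoundedListingQueue {
    private final BlockingQueue<String> paths;

    public BoundedListingQueue(int capacity) {
        paths = new ArrayBlockingQueue<>(capacity);
    }

    /** Producer side: blocks once `capacity` entries are pending. */
    public void offerPath(String path) {
        try {
            paths.put(path);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Consumer side: drains entries as they are written out. */
    public String takePath() {
        try {
            return paths.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public int pending() {
        return paths.size();
    }
}
```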
[jira] [Commented] (HADOOP-17347) ABFS: Read optimizations
[ https://issues.apache.org/jira/browse/HADOOP-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257885#comment-17257885 ] Rajesh Balamohan commented on HADOOP-17347: --- >> If the read is for the last 8 bytes, read the full file. Can you please share details on this? Does this mean that it is going to load 4 MB (or buffer size) worth of data during footer reads? If so, it would be expensive for short jobs that rely on footer reads. > ABFS: Read optimizations > > > Key: HADOOP-17347 > URL: https://issues.apache.org/jira/browse/HADOOP-17347 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Bilahari T H >Assignee: Bilahari T H >Priority: Major > Labels: pull-request-available > Time Spent: 12h 10m > Remaining Estimate: 0h > > Optimize read performance for the following scenarios > # Read small files completely > Files that are of size smaller than the read buffer size can be considered > as small files. In case of such files it would be better to read the full > file into the AbfsInputStream buffer. > # Read last block if the read is for footer > If the read is for the last 8 bytes, read the full file. > This will optimize reads for parquet files. [Parquet file > format|https://www.ellicium.com/parquet-file-format-structure/] > Both these optimizations will be present under configs as follows > # fs.azure.read.smallfilescompletely > # fs.azure.read.optimizefooterread
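The decision under discussion can be made concrete with a small sketch. Names and sizes here are illustrative, not the ABFS implementation: a read is treated as a footer read when it touches the last few bytes, and the question raised in the comment is how much to fetch speculatively in that case.

```java
/** Hypothetical sketch of the footer-read decision; not ABFS code. */
public class FooterReadPolicy {
    static final long FOOTER_SIZE = 8; // e.g. a format's magic + footer length

    /** True when a read touches the last FOOTER_SIZE bytes of the file. */
    static boolean isFooterRead(long position, long fileLength) {
        return position >= fileLength - FOOTER_SIZE;
    }

    /** For a footer read, fetch a larger tail in one remote call, capped at
     *  bufferSize so a huge file is not pulled down to serve 8 bytes; the
     *  concern in the comment is that even this cap (e.g. 4 MB) may be
     *  expensive for short footer-only jobs. */
    static long bytesToFetch(long position, long fileLength, long bufferSize) {
        if (!isFooterRead(position, fileLength)) {
            return Math.min(bufferSize, fileLength - position);
        }
        return Math.min(fileLength, bufferSize);
    }
}
```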
[jira] [Updated] (HADOOP-17156) Clear readahead requests on stream close
[ https://issues.apache.org/jira/browse/HADOOP-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-17156: -- Priority: Minor (was: Major) > Clear readahead requests on stream close > > > Key: HADOOP-17156 > URL: https://issues.apache.org/jira/browse/HADOOP-17156 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.0 >Reporter: Rajesh Balamohan >Priority: Minor > > It would be good to close/clear pending read ahead requests on stream close().
[jira] [Created] (HADOOP-17156) Clear readahead requests on stream close
Rajesh Balamohan created HADOOP-17156: - Summary: Clear readahead requests on stream close Key: HADOOP-17156 URL: https://issues.apache.org/jira/browse/HADOOP-17156 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Affects Versions: 3.3.0 Reporter: Rajesh Balamohan It would be good to close/clear pending read ahead requests on stream close().
[jira] [Comment Edited] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
[ https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102474#comment-17102474 ] Rajesh Balamohan edited comment on HADOOP-17020 at 5/8/20, 11:09 AM: - Sure. Thanks [~ste...@apache.org] . Shared changes related to the blocksize sync issue. PR: [https://github.com/apache/hadoop/pull/2002] was (Author: rajesh.balamohan): Sure. Thanks [~ste...@apache.org] . PR: https://github.com/apache/hadoop/pull/2002 > RawFileSystem could localize default block size to avoid sync bottleneck in > config > -- > > Key: HADOOP-17020 > URL: https://issues.apache.org/jira/browse/HADOOP-17020 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 > PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png > > > RawLocalFileSystem could localize default block size to avoid sync bottleneck > with Configuration object. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
[jira] [Commented] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
[ https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102474#comment-17102474 ] Rajesh Balamohan commented on HADOOP-17020: --- Sure. Thanks [~ste...@apache.org] . PR: https://github.com/apache/hadoop/pull/2002 > RawFileSystem could localize default block size to avoid sync bottleneck in > config > -- > > Key: HADOOP-17020 > URL: https://issues.apache.org/jira/browse/HADOOP-17020 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 > PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png > > > RawLocalFileSystem could localize default block size to avoid sync bottleneck > with Configuration object. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
[ https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-17020: -- Status: Open (was: Patch Available) > RawFileSystem could localize default block size to avoid sync bottleneck in > config > -- > > Key: HADOOP-17020 > URL: https://issues.apache.org/jira/browse/HADOOP-17020 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 > PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png > > > RawLocalFileSystem could localize default block size to avoid sync bottleneck > with Configuration object. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
[ https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-17020: -- Status: Patch Available (was: Open) > RawFileSystem could localize default block size to avoid sync bottleneck in > config > -- > > Key: HADOOP-17020 > URL: https://issues.apache.org/jira/browse/HADOOP-17020 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 > PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png > > > RawLocalFileSystem could localize default block size to avoid sync bottleneck > with Configuration object. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
[ https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-17020: -- Attachment: HADOOP-17020.1.patch > RawFileSystem could localize default block size to avoid sync bottleneck in > config > -- > > Key: HADOOP-17020 > URL: https://issues.apache.org/jira/browse/HADOOP-17020 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 > PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png > > > RawLocalFileSystem could localize default block size to avoid sync bottleneck > with Configuration object. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
[jira] [Commented] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
[ https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097127#comment-17097127 ] Rajesh Balamohan commented on HADOOP-17020: --- Also found a similar kind of issue in mkdirs. !Screenshot 2020-05-01 at 7.12.06 AM.png! > RawFileSystem could localize default block size to avoid sync bottleneck in > config > -- > > Key: HADOOP-17020 > URL: https://issues.apache.org/jira/browse/HADOOP-17020 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png, Screenshot > 2020-05-01 at 7.12.06 AM.png > > > RawLocalFileSystem could localize default block size to avoid sync bottleneck > with Configuration object. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
[jira] [Comment Edited] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
[ https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097127#comment-17097127 ] Rajesh Balamohan edited comment on HADOOP-17020 at 5/1/20, 2:17 AM: Also found a similar kind of issue in mkdirs. !Screenshot 2020-05-01 at 7.12.06 AM.png|width=481,height=178! was (Author: rajesh.balamohan): Also found a similar kind of issue in mkdirs. !Screenshot 2020-05-01 at 7.12.06 AM.png! > RawFileSystem could localize default block size to avoid sync bottleneck in > config > -- > > Key: HADOOP-17020 > URL: https://issues.apache.org/jira/browse/HADOOP-17020 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png, Screenshot > 2020-05-01 at 7.12.06 AM.png > > > RawLocalFileSystem could localize default block size to avoid sync bottleneck > with Configuration object. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
[ https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-17020: -- Attachment: Screenshot 2020-05-01 at 7.12.06 AM.png > RawFileSystem could localize default block size to avoid sync bottleneck in > config > -- > > Key: HADOOP-17020 > URL: https://issues.apache.org/jira/browse/HADOOP-17020 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png, Screenshot > 2020-05-01 at 7.12.06 AM.png > > > RawLocalFileSystem could localize default block size to avoid sync bottleneck > with Configuration object. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
[ https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-17020: -- Description: RawLocalFileSystem could localize default block size to avoid sync bottleneck with Configuration object. https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666 was: RawLocalFileSystem could localize default block size to avoid sync bottleneck with Configuration object. https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666 !Screenshot 2020-04-29 at 5.24.53 PM.png > RawFileSystem could localize default block size to avoid sync bottleneck in > config > -- > > Key: HADOOP-17020 > URL: https://issues.apache.org/jira/browse/HADOOP-17020 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png > > > RawLocalFileSystem could localize default block size to avoid sync bottleneck > with Configuration object. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
[jira] [Created] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config
Rajesh Balamohan created HADOOP-17020: - Summary: RawFileSystem could localize default block size to avoid sync bottleneck in config Key: HADOOP-17020 URL: https://issues.apache.org/jira/browse/HADOOP-17020 Project: Hadoop Common Issue Type: Improvement Components: fs Reporter: Rajesh Balamohan Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png RawLocalFileSystem could localize default block size to avoid sync bottleneck with Configuration object. https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666 !Screenshot 2020-04-29 at 5.24.53 PM.png!
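The fix being proposed across the HADOOP-17020 messages above, localizing a value read from the (synchronized) Configuration into an instance field, can be sketched like this. Properties stands in for Hadoop's Configuration, and the key name is illustrative:

```java
import java.util.Properties;

/** Hypothetical sketch: read the configured value once at construction
 *  and serve it from a field, so the hot path never re-enters the
 *  synchronized config lookup. */
public class LocalizedBlockSize {
    private static final String KEY = "fs.local.block.size"; // illustrative key
    private static final long FALLBACK = 32L * 1024 * 1024;

    private final long defaultBlockSize; // localized at construction time

    public LocalizedBlockSize(Properties conf) {
        String v = conf.getProperty(KEY);
        this.defaultBlockSize = (v == null) ? FALLBACK : Long.parseLong(v);
    }

    /** Hot path: a plain field read, no shared-lock contention. */
    public long getDefaultBlockSize() {
        return defaultBlockSize;
    }
}
```

The same localization applies to the mkdirs case mentioned in the comments: any per-call config lookup on a hot path can be hoisted into a field when the value cannot change after construction.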
[jira] [Created] (HADOOP-16751) DurationInfo text parsing/formatting should be moved out of hotpath
Rajesh Balamohan created HADOOP-16751: - Summary: DurationInfo text parsing/formatting should be moved out of hotpath Key: HADOOP-16751 URL: https://issues.apache.org/jira/browse/HADOOP-16751 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Rajesh Balamohan Attachments: Screenshot 2019-12-09 at 10.32.33 AM.png, image-2019-12-09-10-45-17-351.png {color:#172b4d}It would be good to lazily evaluate the text on a need basis.{color} {color:#172b4d}[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DurationInfo.java#L68]{color} {color:#172b4d}All the pink areas in the following diagram are from this codepath.{color} {color:#172b4d}!Screenshot 2019-12-09 at 10.32.33 AM.png|width=1008,height=920!{color} {color:#172b4d}!image-2019-12-09-10-45-17-351.png|width=571,height=373!{color}
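Lazy evaluation of the duration text can be sketched as below. This is a hypothetical stand-in, not the real DurationInfo class: by taking a Supplier instead of a pre-formatted String, no formatting work happens unless the message is actually emitted.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of lazily-built duration text; not Hadoop code. */
public class LazyDuration implements AutoCloseable {
    private final Supplier<String> text; // evaluated only on demand
    private final long start = System.nanoTime();

    public LazyDuration(Supplier<String> text) {
        this.text = text;
    }

    @Override
    public void close() {
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (logEnabled()) { // format only when someone will read it
            System.out.println(text.get() + ": " + elapsedMs + " ms");
        }
    }

    boolean logEnabled() {
        // Stand-in for a log-level check; when false, the supplier is
        // never invoked and no string formatting happens on the hot path.
        return false;
    }
}
```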
[jira] [Commented] (HADOOP-16711) With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs init()
[ https://issues.apache.org/jira/browse/HADOOP-16711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976966#comment-16976966 ] Rajesh Balamohan commented on HADOOP-16711: --- ORC-570 is specific to lazy init on the ORC side, and was tried out after this fix. For S3, it would be nice to combine #1 and #2. As of now, genuine callers are also impacted by the verifyBuckets call. This still does not cover {{FileSystem::get()}} spinning up lots of FS inits with simultaneous threads accessing it at the same time (e.g. LLAP). > With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs > init() > - > > Key: HADOOP-16711 > URL: https://issues.apache.org/jira/browse/HADOOP-16711 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Labels: performance > Attachments: HADOOP-16711.prelim.1.patch > > > When authoritative mode is enabled with s3guard, it would be good to skip > verifyBuckets call during S3A filesystem init(). This would save call to S3 > during init method.
[jira] [Updated] (HADOOP-16711) With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs init()
[ https://issues.apache.org/jira/browse/HADOOP-16711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-16711: -- Attachment: HADOOP-16711.prelim.1.patch > With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs > init() > - > > Key: HADOOP-16711 > URL: https://issues.apache.org/jira/browse/HADOOP-16711 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Labels: performance > Attachments: HADOOP-16711.prelim.1.patch > > > When authoritative mode is enabled with s3guard, it would be good to skip > verifyBuckets call during S3A filesystem init(). This would save call to S3 > during init method.
[jira] [Created] (HADOOP-16711) With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs init()
Rajesh Balamohan created HADOOP-16711: - Summary: With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs init() Key: HADOOP-16711 URL: https://issues.apache.org/jira/browse/HADOOP-16711 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Rajesh Balamohan When authoritative mode is enabled with s3guard, it would be good to skip verifyBuckets call during S3A filesystem init(). This would save call to S3 during init method.
[jira] [Created] (HADOOP-16709) Consider having the ability to turn off TTL in S3Guard + Authoritative mode
Rajesh Balamohan created HADOOP-16709: - Summary: Consider having the ability to turn off TTL in S3Guard + Authoritative mode Key: HADOOP-16709 URL: https://issues.apache.org/jira/browse/HADOOP-16709 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Reporter: Rajesh Balamohan Authoritative mode has a TTL which is set to 15 minutes by default. However, there are cases when we know for sure that the data won't be changed/updated. In certain cases, the AppMaster ends up spending a good amount of time in getSplits due to TTL expiry. It would be great to have an option to disable the TTL (or specify -1 when the TTL shouldn't be checked).
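The requested knob can be sketched as follows; this is a hypothetical illustration of the -1 semantics, not the S3Guard code (class and method names are made up):

```java
/** Hypothetical sketch: treat a TTL of -1 as "never expire". */
public class TtlPolicy {
    private final long ttlMillis; // -1 disables expiry checks

    public TtlPolicy(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public boolean isExpired(long entryAgeMillis) {
        if (ttlMillis < 0) {
            return false; // TTL disabled: authoritative entries never expire
        }
        return entryAgeMillis > ttlMillis;
    }
}
```

With such a policy, a getSplits call against an authoritative listing never pays the re-validation cost, at the price of never noticing out-of-band changes, which is exactly the trade-off the report says is acceptable for immutable data.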
[jira] [Resolved] (HADOOP-16648) HDFS Native Client does not build correctly
[ https://issues.apache.org/jira/browse/HADOOP-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan resolved HADOOP-16648. --- Resolution: Duplicate Marking this as dup of HDFS-14900 > HDFS Native Client does not build correctly > --- > > Key: HADOOP-16648 > URL: https://issues.apache.org/jira/browse/HADOOP-16648 > Project: Hadoop Common > Issue Type: Sub-task > Components: native >Affects Versions: 3.3.0 >Reporter: Rajesh Balamohan >Priority: Blocker > > Builds are failing in PR with following exception in native client. > {noformat} > [WARNING] make[2]: Leaving directory > '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target' > [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report > /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles > 2 3 4 5 6 7 8 9 10 11 > [WARNING] [ 28%] Built target common_obj > [WARNING] make[2]: Leaving directory > '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target' > [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report > /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles > 31 > [WARNING] [ 28%] Built target gmock_main_obj > [WARNING] make[1]: Leaving directory > '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target' > [WARNING] Makefile:127: recipe for target 'all' failed > [WARNING] make[2]: *** No rule to make target > '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto/PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND', > needed by 'main/native/libhdfspp/lib/proto/ClientNamenodeProtocol.hrpc.inl'. > Stop.
> [WARNING] make[1]: *** > [main/native/libhdfspp/lib/proto/CMakeFiles/proto_obj.dir/all] Error 2 > [WARNING] make[1]: *** Waiting for unfinished jobs > [WARNING] make: *** [all] Error 2 > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Hadoop Main . SUCCESS [ 0.301 > s] > [INFO] Apache Hadoop Build Tools .. SUCCESS [ 1.348 > s] > [INFO] Apache Hadoop Project POM .. SUCCESS [ 0.501 > s] > [INFO] Apache Hadoop Annotations .. SUCCESS [ 1.391 > s] > [INFO] Apache Hadoop Project Dist POM . SUCCESS [ 0.115 > s] > [INFO] Apache Hadoop Assemblies ... SUCCESS [ 0.168 > s] > [INFO] Apache Hadoop Maven Plugins SUCCESS [ 4.490 > s] > [INFO] Apache Hadoop MiniKDC .. SUCCESS [ 2.773 > s] > [INFO] Apache Hadoop Auth . SUCCESS [ 7.922 > s] > [INFO] Apache Hadoop Auth Examples SUCCESS [ 1.381 > s] > [INFO] Apache Hadoop Common ... SUCCESS [ 34.562 > s] > [INFO] Apache Hadoop NFS .. SUCCESS [ 5.583 > s] > [INFO] Apache Hadoop KMS .. SUCCESS [ 5.931 > s] > [INFO] Apache Hadoop Registry . SUCCESS [ 5.816 > s] > [INFO] Apache Hadoop Common Project ... SUCCESS [ 0.056 > s] > [INFO] Apache Hadoop HDFS Client .. SUCCESS [ 27.104 > s] > [INFO] Apache Hadoop HDFS . SUCCESS [ 42.065 > s] > [INFO] Apache Hadoop HDFS Native Client ... FAILURE [ 19.349 > s] > {noformat} > Creating this ticket, as couple of pull requests had the same issue. > e.g > https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/2/artifact/out/patch-compile-root.txt > https://builds.apache.org/job/hadoop-multibranch/job/PR-1614/1/artifact/out/patch-compile-root.txt
[jira] [Commented] (HADOOP-16648) HDFS Native Client does not build correctly
[ https://issues.apache.org/jira/browse/HADOOP-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949175#comment-16949175 ] Rajesh Balamohan commented on HADOOP-16648: --- Closing this ticket as HDFS-14900 fixes the issue. Thanks [~ayushtkn], [~ste...@apache.org] > HDFS Native Client does not build correctly > --- > > Key: HADOOP-16648 > URL: https://issues.apache.org/jira/browse/HADOOP-16648 > Project: Hadoop Common > Issue Type: Sub-task > Components: native >Affects Versions: 3.3.0 >Reporter: Rajesh Balamohan >Priority: Blocker > > Builds are failing in PR with following exception in native client. > {noformat} > [WARNING] make[2]: Leaving directory > '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target' > [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report > /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles > 2 3 4 5 6 7 8 9 10 11 > [WARNING] [ 28%] Built target common_obj > [WARNING] make[2]: Leaving directory > '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target' > [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report > /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles > 31 > [WARNING] [ 28%] Built target gmock_main_obj > [WARNING] make[1]: Leaving directory > '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target' > [WARNING] Makefile:127: recipe for target 'all' failed > [WARNING] make[2]: *** No rule to make target > '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto/PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND', > needed by
'main/native/libhdfspp/lib/proto/ClientNamenodeProtocol.hrpc.inl'. > Stop. > [WARNING] make[1]: *** > [main/native/libhdfspp/lib/proto/CMakeFiles/proto_obj.dir/all] Error 2 > [WARNING] make[1]: *** Waiting for unfinished jobs > [WARNING] make: *** [all] Error 2 > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Hadoop Main . SUCCESS [ 0.301 > s] > [INFO] Apache Hadoop Build Tools .. SUCCESS [ 1.348 > s] > [INFO] Apache Hadoop Project POM .. SUCCESS [ 0.501 > s] > [INFO] Apache Hadoop Annotations .. SUCCESS [ 1.391 > s] > [INFO] Apache Hadoop Project Dist POM . SUCCESS [ 0.115 > s] > [INFO] Apache Hadoop Assemblies ... SUCCESS [ 0.168 > s] > [INFO] Apache Hadoop Maven Plugins SUCCESS [ 4.490 > s] > [INFO] Apache Hadoop MiniKDC .. SUCCESS [ 2.773 > s] > [INFO] Apache Hadoop Auth . SUCCESS [ 7.922 > s] > [INFO] Apache Hadoop Auth Examples SUCCESS [ 1.381 > s] > [INFO] Apache Hadoop Common ... SUCCESS [ 34.562 > s] > [INFO] Apache Hadoop NFS .. SUCCESS [ 5.583 > s] > [INFO] Apache Hadoop KMS .. SUCCESS [ 5.931 > s] > [INFO] Apache Hadoop Registry . SUCCESS [ 5.816 > s] > [INFO] Apache Hadoop Common Project ... SUCCESS [ 0.056 > s] > [INFO] Apache Hadoop HDFS Client .. SUCCESS [ 27.104 > s] > [INFO] Apache Hadoop HDFS . SUCCESS [ 42.065 > s] > [INFO] Apache Hadoop HDFS Native Client ... FAILURE [ 19.349 > s] > {noformat} > Creating this ticket, as couple of pull requests had the same issue. > e.g > https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/2/artifact/out/patch-compile-root.txt > https://builds.apache.org/job/hadoop-multibranch/job/PR-1614/1/artifact/out/patch-compile-root.txt -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16648) HDFS Native Client does not build correctly
[ https://issues.apache.org/jira/browse/HADOOP-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948984#comment-16948984 ] Rajesh Balamohan commented on HADOOP-16648: --- Sure, thanks [~ayushtkn], [~ste...@apache.org]. I will check if the HDFS patch solves the issue.
[jira] [Updated] (HADOOP-16648) HDFS Native Client does not build correctly
[ https://issues.apache.org/jira/browse/HADOOP-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-16648: -- Affects Version/s: 3.3.0
[jira] [Created] (HADOOP-16648) HDFS Native Client does not build correctly
Rajesh Balamohan created HADOOP-16648: - Summary: HDFS Native Client does not build correctly Key: HADOOP-16648 URL: https://issues.apache.org/jira/browse/HADOOP-16648 Project: Hadoop Common Issue Type: Sub-task Components: native Reporter: Rajesh Balamohan Builds are failing in PR with following exception in native client. {noformat} [WARNING] make[2]: Leaving directory '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target' [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles 2 3 4 5 6 7 8 9 10 11 [WARNING] [ 28%] Built target common_obj [WARNING] make[2]: Leaving directory '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target' [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles 31 [WARNING] [ 28%] Built target gmock_main_obj [WARNING] make[1]: Leaving directory '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target' [WARNING] Makefile:127: recipe for target 'all' failed [WARNING] make[2]: *** No rule to make target '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto/PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND', needed by 'main/native/libhdfspp/lib/proto/ClientNamenodeProtocol.hrpc.inl'. Stop. [WARNING] make[1]: *** [main/native/libhdfspp/lib/proto/CMakeFiles/proto_obj.dir/all] Error 2 [WARNING] make[1]: *** Waiting for unfinished jobs [WARNING] make: *** [all] Error 2 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop Main . SUCCESS [ 0.301 s] [INFO] Apache Hadoop Build Tools .. 
SUCCESS [ 1.348 s] [INFO] Apache Hadoop Project POM .. SUCCESS [ 0.501 s] [INFO] Apache Hadoop Annotations .. SUCCESS [ 1.391 s] [INFO] Apache Hadoop Project Dist POM . SUCCESS [ 0.115 s] [INFO] Apache Hadoop Assemblies ... SUCCESS [ 0.168 s] [INFO] Apache Hadoop Maven Plugins SUCCESS [ 4.490 s] [INFO] Apache Hadoop MiniKDC .. SUCCESS [ 2.773 s] [INFO] Apache Hadoop Auth . SUCCESS [ 7.922 s] [INFO] Apache Hadoop Auth Examples SUCCESS [ 1.381 s] [INFO] Apache Hadoop Common ... SUCCESS [ 34.562 s] [INFO] Apache Hadoop NFS .. SUCCESS [ 5.583 s] [INFO] Apache Hadoop KMS .. SUCCESS [ 5.931 s] [INFO] Apache Hadoop Registry . SUCCESS [ 5.816 s] [INFO] Apache Hadoop Common Project ... SUCCESS [ 0.056 s] [INFO] Apache Hadoop HDFS Client .. SUCCESS [ 27.104 s] [INFO] Apache Hadoop HDFS . SUCCESS [ 42.065 s] [INFO] Apache Hadoop HDFS Native Client ... FAILURE [ 19.349 s] {noformat} Creating this ticket, as couple of pull requests had the same issue. e.g https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/2/artifact/out/patch-compile-root.txt https://builds.apache.org/job/hadoop-multibranch/job/PR-1614/1/artifact/out/patch-compile-root.txt -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16604) Provide copy functionality for cloud native applications
[ https://issues.apache.org/jira/browse/HADOOP-16604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944262#comment-16944262 ] Rajesh Balamohan commented on HADOOP-16604: --- Thanks for sharing the details [~ste...@apache.org] . Initial step is to enable copyFile(URI, URI) for S3AFileSystem. I have created a subtask for this. This is for copying single file to destination and higher level apps can make use of the API to parallelize copy. In next iterations, we can provide parallel copy option within FS itself. > Provide copy functionality for cloud native applications > > > Key: HADOOP-16604 > URL: https://issues.apache.org/jira/browse/HADOOP-16604 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/azure, fs/s3 >Affects Versions: 3.2.1 >Reporter: Rajesh Balamohan >Priority: Major > > Lot of cloud native systems provide out of the box and optimized copy > functionality within their system. They avoid bringing data over to the > client and write back to the destination. > It would be good to have a cloud native interface, which can be implemented > by the cloud connectors to provide (e.g {{copy(URI srcFile, URI destFile)}}) > This would be helpful for applications which make use of these connectors and > enhance copy performance within cloud. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16629) support copyFile in s3afilesystem
[ https://issues.apache.org/jira/browse/HADOOP-16629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-16629: -- Affects Version/s: 3.2.1 > support copyFile in s3afilesystem > - > > Key: HADOOP-16629 > URL: https://issues.apache.org/jira/browse/HADOOP-16629 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.2.1 >Reporter: Rajesh Balamohan >Priority: Minor >
[jira] [Created] (HADOOP-16629) support copyFile in s3afilesystem
Rajesh Balamohan created HADOOP-16629: - Summary: support copyFile in s3afilesystem Key: HADOOP-16629 URL: https://issues.apache.org/jira/browse/HADOOP-16629 Project: Hadoop Common Issue Type: Sub-task Reporter: Rajesh Balamohan
[jira] [Updated] (HADOOP-16629) support copyFile in s3afilesystem
[ https://issues.apache.org/jira/browse/HADOOP-16629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-16629: -- Component/s: fs/s3 > support copyFile in s3afilesystem > - > > Key: HADOOP-16629 > URL: https://issues.apache.org/jira/browse/HADOOP-16629 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor >
[jira] [Created] (HADOOP-16604) Provide copy functionality for cloud native applications
Rajesh Balamohan created HADOOP-16604: - Summary: Provide copy functionality for cloud native applications Key: HADOOP-16604 URL: https://issues.apache.org/jira/browse/HADOOP-16604 Project: Hadoop Common Issue Type: Improvement Components: fs Reporter: Rajesh Balamohan A lot of cloud native systems provide out-of-the-box, optimized copy functionality within their system. They avoid bringing data over to the client and writing it back to the destination. It would be good to have a cloud native interface (e.g. {{copy(URI srcFile, URI destFile)}}) which the cloud connectors can implement. This would help applications which make use of these connectors and would enhance copy performance within the cloud.
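For concreteness, the kind of contract being proposed might look like the sketch below. The class and method names are illustrative assumptions, not a committed Hadoop API; the body shown falls back to a plain client-side copy for {{file:}} URIs purely to make the contract concrete, whereas a cloud connector would implement it as a server-side copy.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the proposed copy API. A cloud connector would
// override copyFile() with a server-side copy that never routes bytes
// through the client; this default performs a naive client-side copy so
// the single-file contract is concrete. Higher-level applications can
// parallelize calls to copyFile() across many files.
public class NativeCopySketch {
  public void copyFile(URI srcFile, URI destFile) throws IOException {
    Path src = Paths.get(srcFile);
    Path dst = Paths.get(destFile);
    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
  }
}
```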
[jira] [Updated] (HADOOP-15042) Azure::PageBlobInputStream::skip can return -ve value when numberOfPagesRemaining is 0
[ https://issues.apache.org/jira/browse/HADOOP-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-15042: -- Assignee: Rajesh Balamohan Status: Patch Available (was: Open) > Azure::PageBlobInputStream::skip can return -ve value when > numberOfPagesRemaining is 0 > -- > > Key: HADOOP-15042 > URL: https://issues.apache.org/jira/browse/HADOOP-15042 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-15042.001.patch > > > {{PageBlobInputStream::skip-->skipImpl}} returns negative values when > {{numberOfPagesRemaining=0}}. This can cause wrong position to be set in > NativeAzureFileSystem::seek() and can lead to errors. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15042) Azure::PageBlobInputStream::skip can return -ve value when numberOfPagesRemaining is 0
[ https://issues.apache.org/jira/browse/HADOOP-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254718#comment-16254718 ] Rajesh Balamohan edited comment on HADOOP-15042 at 11/16/17 4:04 AM: - {noformat} from hadoop-azure dir. mvn test -Dtest=Test\*,ITest\* Results : Tests run: 843, Failures: 0, Errors: 0, Skipped: 117 {noformat} WASB Region for the test: East Asia was (Author: rajesh.balamohan): {noformat} mvn test -Dtest=Test\*,ITest\* Results : Tests run: 843, Failures: 0, Errors: 0, Skipped: 117 {noformat} WASB Region for the test: East Asia
[jira] [Updated] (HADOOP-15042) Azure::PageBlobInputStream::skip can return -ve value when numberOfPagesRemaining is 0
[ https://issues.apache.org/jira/browse/HADOOP-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-15042: -- Attachment: HADOOP-15042.001.patch {noformat} mvn test -Dtest=Test\*,ITest\* Results : Tests run: 843, Failures: 0, Errors: 0, Skipped: 117 {noformat} WASB Region for the test: East Asia
[jira] [Created] (HADOOP-15042) Azure::PageBlobInputStream::skip can return -ve value when numberOfPagesRemaining is 0
Rajesh Balamohan created HADOOP-15042: - Summary: Azure::PageBlobInputStream::skip can return -ve value when numberOfPagesRemaining is 0 Key: HADOOP-15042 URL: https://issues.apache.org/jira/browse/HADOOP-15042 Project: Hadoop Common Issue Type: Bug Components: fs/azure Reporter: Rajesh Balamohan Priority: Minor {{PageBlobInputStream::skip-->skipImpl}} returns negative values when {{numberOfPagesRemaining=0}}. This can cause wrong position to be set in NativeAzureFileSystem::seek() and can lead to errors.
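The fix being described amounts to guarding the skip path so it never reports a negative delta. A minimal, self-contained sketch (field names follow the issue text, but this is not the actual hadoop-azure code and the byte accounting is deliberately simplified):

```java
// Illustrative sketch of the guard described in HADOOP-15042.
public class PageBlobSkipSketch {
  private long numberOfPagesRemaining;
  private long bytesRemainingInCurrentPage;

  PageBlobSkipSketch(long pagesRemaining, long bytesInPage) {
    this.numberOfPagesRemaining = pagesRemaining;
    this.bytesRemainingInCurrentPage = bytesInPage;
  }

  // skip() must never report a negative number of skipped bytes: when no
  // pages remain there is nothing to skip, so return 0 instead of a
  // negative delta that would corrupt the position tracked by seek().
  public long skip(long n) {
    if (numberOfPagesRemaining == 0 || n <= 0) {
      return 0;
    }
    long available = bytesRemainingInCurrentPage; // simplified accounting
    long skipped = Math.min(n, available);
    bytesRemainingInCurrentPage -= skipped;
    return skipped;
  }
}
```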
[jira] [Created] (HADOOP-14988) WASB: Expose WASB status metrics as counters in Hadoop
Rajesh Balamohan created HADOOP-14988: - Summary: WASB: Expose WASB status metrics as counters in Hadoop Key: HADOOP-14988 URL: https://issues.apache.org/jira/browse/HADOOP-14988 Project: Hadoop Common Issue Type: Bug Components: fs/azure Reporter: Rajesh Balamohan Priority: Minor It would be good to expose WASB status metrics (e.g 503) as Hadoop counters. Here is an example from a spark job, where it ends up spending large amount of time in retries. Adding hadoop counters would help in analyzing and tuning long running tasks. {noformat} 2017-10-23 23:07:20,876 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest: threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0 2017-10-23 23:07:20,877 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=1, ETAG=null, contentLength=198, requestMethod=GET 2017-10-23 23:07:21,877 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest: threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0 2017-10-23 23:07:21,879 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=2, ETAG=null, contentLength=198, requestMethod=GET 2017-10-23 23:07:24,070 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest: threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0 2017-10-23 23:07:24,073 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: q:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=3, ETAG=null, contentLength=198, requestMethod=GET 2017-10-23 23:07:27,917 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: 
SelfThrottlingIntercept:: SendingRequest: threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0 2017-10-23 23:07:27,920 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=2, ETAG=null, contentLength=198, requestMethod=GET 2017-10-23 23:07:36,879 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest: threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0 2017-10-23 23:07:36,881 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=1, ETAG=null, contentLength=198, requestMethod=GET 2017-10-23 23:07:54,786 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest: threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0 2017-10-23 23:07:54,789 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=3, ETAG=null, contentLength=198, requestMethod=GET 2017-10-23 23:08:24,790 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest: threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0 2017-10-23 23:08:24,794 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, Elapsed(ms)=4, ETAG=null, contentLength=198, requestMethod=GET 2017-10-23 23:08:54,794 DEBUG [Executor task launch worker for task 2463] azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest: threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0 {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - 
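The counter idea can be sketched independently of the WASB internals: a small accumulator fed from the response hook, whose values would then be published as Hadoop counters. Class and method names below are illustrative assumptions, not the hadoop-azure API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hedged sketch: count HTTP status codes observed by the response
// interceptor (e.g. the 503 throttling responses in the log above) so
// that long-running tasks can be analyzed without DEBUG logging.
public class WasbStatusCounters {
  private final AtomicLong throttled = new AtomicLong(); // HTTP 503s
  private final AtomicLong total = new AtomicLong();

  // Would be invoked from the ResponseReceived hook shown in the log.
  public void responseReceived(int httpStatus) {
    total.incrementAndGet();
    if (httpStatus == 503) {
      throttled.incrementAndGet();
    }
  }

  public long getThrottledCount() { return throttled.get(); }
  public long getTotalCount() { return total.get(); }
}
```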
[jira] [Commented] (HADOOP-14965) s3a input stream "normal" fadvise mode to be adaptive
[ https://issues.apache.org/jira/browse/HADOOP-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214478#comment-16214478 ] Rajesh Balamohan commented on HADOOP-14965: --- It would be good to tune itself based on the seek access patterns. > s3a input stream "normal" fadvise mode to be adaptive > - > > Key: HADOOP-14965 > URL: https://issues.apache.org/jira/browse/HADOOP-14965 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Steve Loughran > > HADOOP-14535 added seek optimisation to wasb, but rather than require the > caller to declare sequential vs random, it works out for itself. > # defaults to sequential, lazy seek > # if the caller ever seeks backwards, switches to random IO. > This means that on the use pattern of columnar stores: of go to end of file, > read summary, then go to columns and work forwards, will switch to random IO > after that first seek back (cost: one aborted HTTP connection)/. > Where this should benefit the most is in downstream apps where you are > working with different data sources in the same object store/running of the > same app config, but have different read patterns. I'm seeing exactly this in > some of my spark tests, where it's near impossible to set things up so that > .gz files are read sequentially, but ORC data is read in random IO > I propose the "normal" fadvise => adaptive, sequential==sequential always, > random => random from the outset. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
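The adaptive policy proposed here ("normal" becomes adaptive, while "sequential" and "random" stay fixed from the outset) can be sketched as a small state machine. The class below is illustrative, not the actual S3AInputStream code:

```java
// Sketch of adaptive fadvise: start sequential, switch permanently to
// random IO on the first backward seek, mirroring the wasb behaviour
// described in the issue (HADOOP-14535).
public class AdaptiveReadPolicy {
  public enum Policy { SEQUENTIAL, RANDOM }

  private Policy policy;
  private final boolean adaptive;
  private long lastPos = 0;

  // "normal" => adaptive; "sequential"/"random" => fixed.
  public AdaptiveReadPolicy(String fadvise) {
    switch (fadvise) {
      case "random":     policy = Policy.RANDOM;     adaptive = false; break;
      case "sequential": policy = Policy.SEQUENTIAL; adaptive = false; break;
      default:           policy = Policy.SEQUENTIAL; adaptive = true;  break;
    }
  }

  // Returns the policy to use for the read following this seek.
  public Policy onSeek(long targetPos) {
    if (adaptive && targetPos < lastPos) {
      policy = Policy.RANDOM; // cost: one aborted HTTP connection
    }
    lastPos = targetPos;
    return policy;
  }
}
```

This matches the columnar-store pattern from the issue: the first seek back to read the footer flips the stream to random IO for the rest of its life.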
[jira] [Commented] (HADOOP-14680) Azure: IndexOutOfBoundsException in BlockBlobInputStream
[ https://issues.apache.org/jira/browse/HADOOP-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100923#comment-16100923 ] Rajesh Balamohan commented on HADOOP-14680: --- Thanks for the patch [~tmarquardt]. Patch lgtm (non-binding). I tried out the patch on multi-node cluster and it works fine. > Azure: IndexOutOfBoundsException in BlockBlobInputStream > > > Key: HADOOP-14680 > URL: https://issues.apache.org/jira/browse/HADOOP-14680 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan >Assignee: Thomas Marquardt >Priority: Minor > Attachments: HADOOP-14680-001.patch > > > https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/BlockBlobInputStream.java#L361 > On certain conditions, BlockBlobInputStream can throw > IndexOutOfBoundsException. Following is an example > {{length:297898, offset:4194304, buf.len:4492202, writePos:4194304}} : > In this case, {{MemoryOutputStream::capacity()}} would end up returning > negative value and can cause {{IndexOutOfBoundsException}} > It should be {{return buffer.length - offset;}} to determine current capacity. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14680) Azure: IndexOutOfBoundsException in BlockBlobInputStream
Rajesh Balamohan created HADOOP-14680: - Summary: Azure: IndexOutOfBoundsException in BlockBlobInputStream Key: HADOOP-14680 URL: https://issues.apache.org/jira/browse/HADOOP-14680 Project: Hadoop Common Issue Type: Bug Components: fs/azure Reporter: Rajesh Balamohan Priority: Minor https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/BlockBlobInputStream.java#L361 On certain conditions, BlockBlobInputStream can throw IndexOutOfBoundsException. Following is an example {{length:297898, offset:4194304, buf.len:4492202, writePos:4194304}} : In this case, {{MemoryOutputStream::capacity()}} would end up returning negative value and can cause {{IndexOutOfBoundsException}} It should be {{return buffer.length - offset;}} to determine current capacity. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
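The arithmetic in the report can be checked in isolation: with buf.len=4492202 and offset=4194304, capacity computed against the buffer length is 297898, while computing it against the 297898-byte stream length goes negative. The sketch below is illustrative (the "buggy" variant is a plausible reconstruction, not the literal BlockBlobInputStream code):

```java
// Capacity must be measured against the backing buffer, not the
// remaining stream length, when offset indexes into a larger buffer.
public class CapacitySketch {
  // Plausible buggy variant: negative whenever offset > length.
  public static int buggyCapacity(int length, int offset) {
    return length - offset;
  }

  // Fixed variant, as the issue suggests: remaining room in the buffer.
  public static int fixedCapacity(byte[] buffer, int offset) {
    return buffer.length - offset;
  }
}
```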
[jira] [Assigned] (HADOOP-11572) s3a delete() operation fails during a concurrent delete of child entries
[ https://issues.apache.org/jira/browse/HADOOP-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned HADOOP-11572: - Assignee: Steve Loughran (was: Rajesh Balamohan) > s3a delete() operation fails during a concurrent delete of child entries > > > Key: HADOOP-11572 > URL: https://issues.apache.org/jira/browse/HADOOP-11572 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > Fix For: 2.9.0, 3.0.0-alpha4 > > Attachments: HADOOP-11572-001.patch, HADOOP-11572-branch-2-002.patch, > HADOOP-11572-branch-2-003.patch > > > Reviewing the code, s3a has the problem raised in HADOOP-6688: deletion of a > child entry during a recursive directory delete is propagated as an > exception, rather than ignored as a detail which idempotent operations should > just ignore. > the exception should be caught and, if a file not found problem, logged > rather than propagated -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-11572) s3a delete() operation fails during a concurrent delete of child entries
[ https://issues.apache.org/jira/browse/HADOOP-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned HADOOP-11572: - Assignee: Rajesh Balamohan (was: Steve Loughran)
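The idempotency fix the issue asks for — catch the not-found failure for a child entry and continue, rather than propagate it — can be sketched as follows. {{deleteOne()}} is a hypothetical stand-in for the per-entry delete call, not the S3A code:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Sketch of an idempotent recursive delete: a child that disappears
// mid-delete (e.g. a concurrent delete by another client) is treated as
// success for that entry, logged, and skipped.
public class IdempotentDeleteSketch {
  public interface EntryDeleter {
    void deleteOne(String key) throws IOException;
  }

  // Returns the number of entries this caller actually deleted.
  public static int deleteAll(Iterable<String> keys, EntryDeleter d)
      throws IOException {
    int deleted = 0;
    for (String key : keys) {
      try {
        d.deleteOne(key);
        deleted++;
      } catch (FileNotFoundException e) {
        // Already gone: the desired end state is reached, so log rather
        // than propagate, as the issue description recommends.
        System.err.println("child already deleted: " + key);
      }
    }
    return deleted;
  }
}
```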
[jira] [Created] (HADOOP-14612) Reduce memory copy in BlobOutputStreamInternal::dispatchWrite
Rajesh Balamohan created HADOOP-14612: - Summary: Reduce memory copy in BlobOutputStreamInternal::dispatchWrite Key: HADOOP-14612 URL: https://issues.apache.org/jira/browse/HADOOP-14612 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Reporter: Rajesh Balamohan Priority: Minor Currently in {{BlobOutputStreamInternal::dispatchWrite}}, the buffer is copied internally on every write; during large uploads the copy can be around 4 MB. This could be avoided with an internal class that extends ByteArrayOutputStream and adds a "ByteArrayInputStream getInputStream()" method. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
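The helper suggested in the report can be sketched as below; the class name is hypothetical. It relies on ByteArrayOutputStream's protected buf/count fields to hand out a read view without the defensive array copy that toByteArray() performs:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

public class NoCopyByteArrayOutputStream extends ByteArrayOutputStream {
    public NoCopyByteArrayOutputStream(int initialSize) {
        super(initialSize);
    }

    // View of the bytes written so far; wraps the internal buffer
    // directly, so no array copy is made.
    public synchronized ByteArrayInputStream getInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}
```

One caveat of the design: the view and the stream share the same array, so the caller must not write more data while the view is being consumed.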
[jira] [Commented] (HADOOP-14596) AWS SDK 1.11+ aborts() on close() if > 0 bytes in stream; logs error
[ https://issues.apache.org/jira/browse/HADOOP-14596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066377#comment-16066377 ] Rajesh Balamohan commented on HADOOP-14596: --- Thanks for sharing the patch [~ste...@apache.org]. {{skip()}} may not always advance to the intended position; should we check its return value and repeat until {{remaining}} bytes have been skipped? > AWS SDK 1.11+ aborts() on close() if > 0 bytes in stream; logs error > > > Key: HADOOP-14596 > URL: https://issues.apache.org/jira/browse/HADOOP-14596 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Steve Loughran >Priority: Minor > Attachments: HADOOP-14596-001.patch, testlog.txt > > > The latest SDK now tells us off when we do a seek() by aborting the TCP stream > {code} > - Not all bytes were read from the S3ObjectInputStream, aborting HTTP > connection. This is likely an error and may result in sub-optimal behavior. > Request only the bytes you need via a ranged GET or drain the input stream > after use. > 2017-06-27 15:47:35,789 [ScalaTest-main-running-S3ACSVReadSuite] WARN > internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - > Not all bytes were read from the S3ObjectInputStream, aborting HTTP > connection. This is likely an error and may result in sub-optimal behavior. > Request only the bytes you need via a ranged GET or drain the input stream > after use. > 2017-06-27 15:47:37,409 [ScalaTest-main-running-S3ACSVReadSuite] WARN > internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - > Not all bytes were read from the S3ObjectInputStream, aborting HTTP > connection. This is likely an error and may result in sub-optimal behavior. > Request only the bytes you need via a ranged GET or drain the input stream > after use. 
> 2017-06-27 15:47:39,003 [ScalaTest-main-running-S3ACSVReadSuite] WARN > internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - > Not all bytes were read from the S3ObjectInputStream, aborting HTTP > connection. This is likely an error and may result in sub-optimal behavior. > Request only the bytes you need via a ranged GET or drain the input stream > after use. > 2017-06-27 15:47:40,627 [ScalaTest-main-running-S3ACSVReadSuite] WARN > internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - > Not all bytes were read from the S3ObjectInputStream, aborting HTTP > connection. This is likely an error and may result in sub-optimal behavior. > Request only the bytes you need via a ranged GET or drain the input stream > after use. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
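The skip-and-repeat pattern the comment asks about can be sketched as a small helper (the name drainRemaining is hypothetical). InputStream.skip() may return fewer bytes than requested, or 0, without having hit end-of-stream, so a loop is needed:

```java
import java.io.IOException;
import java.io.InputStream;

public class DrainExample {
    // Skip until `remaining` bytes are consumed, looping because skip()
    // may make partial or zero progress. Falls back to a single-byte
    // read() to distinguish "no progress yet" from end-of-stream.
    // Returns the bytes actually drained (less than `remaining` at EOF).
    public static long drainRemaining(InputStream in, long remaining)
            throws IOException {
        long drained = 0;
        while (drained < remaining) {
            long skipped = in.skip(remaining - drained);
            if (skipped > 0) {
                drained += skipped;
            } else if (in.read() < 0) {
                break;  // end of stream reached before `remaining`
            } else {
                drained++;  // consumed one byte via read()
            }
        }
        return drained;
    }
}
```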
[jira] [Commented] (HADOOP-14500) Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails
[ https://issues.apache.org/jira/browse/HADOOP-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041527#comment-16041527 ] Rajesh Balamohan commented on HADOOP-14500: --- It was simulation related, I believe. When the earlier patch results were submitted, “fs.contract.test.fs.wasb” was not set, IIRC, so some of the tests were being skipped. For the results posted here, I set all 3 parameters: “fs.azure.test.account.name”, “fs.azure.account.key.{ACCOUNTNAME}.blob.core.windows.net” and “fs.contract.test.fs.wasb”. Before HADOOP-14478, seek() was explicitly closing the inputstream and re-opening it. With HADOOP-14478 it no longer does, so seek() need not throw FNFE; the error would instead surface on the subsequent read. > Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails > - > > Key: HADOOP-14500 > URL: https://issues.apache.org/jira/browse/HADOOP-14500 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure, test >Reporter: Mingliang Liu >Assignee: Rajesh Balamohan > Attachments: HADOOP-14500-001.patch > > > The following test fails: > {code} > TestFileSystemOperationExceptionHandling.testSingleThreadBlockBlobSeekScenario > Expected exception: java.io.FileNotFoundException > TestFileSystemOperationsExceptionHandlingMultiThreaded.testMultiThreadBlockBlobSeekScenario > Expected exception: java.io.FileNotFoundException > {code} > I did early analysis and found [HADOOP-14478] maybe the reason. I think we > can fix the test itself here. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
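For reference, the three test settings named in the comment go into the hadoop-azure test configuration; a sketch, with account, key and container values as placeholders:

```xml
<configuration>
  <property>
    <name>fs.azure.test.account.name</name>
    <value>ACCOUNTNAME</value>
  </property>
  <property>
    <name>fs.azure.account.key.ACCOUNTNAME.blob.core.windows.net</name>
    <value>ACCESS_KEY</value>
  </property>
  <!-- Without this, the WASB contract tests are silently skipped. -->
  <property>
    <name>fs.contract.test.fs.wasb</name>
    <value>wasb://CONTAINER@ACCOUNTNAME.blob.core.windows.net</value>
  </property>
</configuration>
```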
[jira] [Updated] (HADOOP-14500) Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails
[ https://issues.apache.org/jira/browse/HADOOP-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14500: -- Status: Patch Available (was: Open) > Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails > - > > Key: HADOOP-14500 > URL: https://issues.apache.org/jira/browse/HADOOP-14500 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure, test >Reporter: Mingliang Liu >Assignee: Rajesh Balamohan > Attachments: HADOOP-14500-001.patch > > > The following test fails: > {code} > TestFileSystemOperationExceptionHandling.testSingleThreadBlockBlobSeekScenario > Expected exception: java.io.FileNotFoundException > TestFileSystemOperationsExceptionHandlingMultiThreaded.testMultiThreadBlockBlobSeekScenario > Expected exception: java.io.FileNotFoundException > {code} > I did early analysis and found [HADOOP-14478] maybe the reason. I think we > can fix the test itself here. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14500) Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails
[ https://issues.apache.org/jira/browse/HADOOP-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14500: -- Attachment: HADOOP-14500-001.patch Tests were executed against WASB/Japan region. {noformat} Running org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.838 sec - in org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo Running org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.786 sec - in org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions Results : Failed tests: TestNativeAzureFileSystemMocked>NativeAzureFileSystemBaseTest.testFolderLastModifiedTime:649 null Tests run: 703, Failures: 1, Errors: 0, Skipped: 119 {noformat} Test case failure is not related to the patch. {noformat} testFolderLastModifiedTime(org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked) Time elapsed: 15.023 sec <<< FAILURE! 
java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.fs.azure.NativeAzureFileSystemBaseTest.testFolderLastModifiedTime(NativeAzureFileSystemBaseTest.java:649) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {noformat} > Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails > - > > Key: HADOOP-14500 > URL: https://issues.apache.org/jira/browse/HADOOP-14500 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure, test >Reporter: Mingliang Liu >Assignee: Rajesh Balamohan > Attachments: HADOOP-14500-001.patch > > > The following test fails: > {code} > TestFileSystemOperationExceptionHandling.testSingleThreadBlockBlobSeekScenario > Expected exception: java.io.FileNotFoundException > TestFileSystemOperationsExceptionHandlingMultiThreaded.testMultiThreadBlockBlobSeekScenario > Expected exception: java.io.FileNotFoundException > {code} > I did early analysis and found [HADOOP-14478] maybe the reason. I think we > can fix the test itself here. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-14500) Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails
[ https://issues.apache.org/jira/browse/HADOOP-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned HADOOP-14500: - Assignee: Rajesh Balamohan > Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails > - > > Key: HADOOP-14500 > URL: https://issues.apache.org/jira/browse/HADOOP-14500 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure, test >Reporter: Mingliang Liu >Assignee: Rajesh Balamohan > > The following test fails: > {code} > TestFileSystemOperationExceptionHandling.testSingleThreadBlockBlobSeekScenario > Expected exception: java.io.FileNotFoundException > TestFileSystemOperationsExceptionHandlingMultiThreaded.testMultiThreadBlockBlobSeekScenario > Expected exception: java.io.FileNotFoundException > {code} > I did early analysis and found [HADOOP-14478] maybe the reason. I think we > can fix the test itself here. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads
[ https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037914#comment-16037914 ] Rajesh Balamohan commented on HADOOP-14478: --- [~liuml07] - Perf improvement would be observed when {{BlobInputStream}} is fixed. Thanks for creating HADOOP-14490. > Optimize NativeAzureFsInputStream for positional reads > -- > > Key: HADOOP-14478 > URL: https://issues.apache.org/jira/browse/HADOOP-14478 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Fix For: 2.9.0, 3.0.0-alpha4 > > Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch, > HADOOP-14478.003.patch > > > Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of > the data length requested for. This would be beneficial for sequential reads. > However, for positional reads (seek to specific location, read x number of > bytes, seek back to original location) this may not be beneficial and might > even download lot more data which are not used later. > It would be good to override {{readFully(long position, byte[] buffer, int > offset, int length)}} for {{NativeAzureFsInputStream}} and make use of > {{mark(readLimit)}} as a hint to Azure's BlobInputStream. > BlobInputStream reference: > https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 > BlobInputStream can consider this as a hint later to determine the amount of > data to be read ahead. Changes to BlobInputStream would not be addressed in > this JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14473) Optimize NativeAzureFileSystem::seek for forward seeks
[ https://issues.apache.org/jira/browse/HADOOP-14473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14473: -- Resolution: Resolved Status: Resolved (was: Patch Available) Closing this ticket, since HADOOP-14478 takes care of this. > Optimize NativeAzureFileSystem::seek for forward seeks > -- > > Key: HADOOP-14473 > URL: https://issues.apache.org/jira/browse/HADOOP-14473 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: HADOOP-14473-001.patch > > > {{NativeAzureFileSystem::seek()}} closes and re-opens the inputstream > irrespective of forward/backward seek. It would be beneficial to re-open the > stream on backward seek. > https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads
[ https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037890#comment-16037890 ] Rajesh Balamohan commented on HADOOP-14478: --- Thanks [~liuml07], [~ste...@apache.org]. > Optimize NativeAzureFsInputStream for positional reads > -- > > Key: HADOOP-14478 > URL: https://issues.apache.org/jira/browse/HADOOP-14478 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Fix For: 2.9.0, 3.0.0-alpha4 > > Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch, > HADOOP-14478.003.patch > > > Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of > the data length requested for. This would be beneficial for sequential reads. > However, for positional reads (seek to specific location, read x number of > bytes, seek back to original location) this may not be beneficial and might > even download lot more data which are not used later. > It would be good to override {{readFully(long position, byte[] buffer, int > offset, int length)}} for {{NativeAzureFsInputStream}} and make use of > {{mark(readLimit)}} as a hint to Azure's BlobInputStream. > BlobInputStream reference: > https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 > BlobInputStream can consider this as a hint later to determine the amount of > data to be read ahead. Changes to BlobInputStream would not be addressed in > this JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads
[ https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035724#comment-16035724 ] Rajesh Balamohan commented on HADOOP-14478: --- Thanks [~liuml07] > Optimize NativeAzureFsInputStream for positional reads > -- > > Key: HADOOP-14478 > URL: https://issues.apache.org/jira/browse/HADOOP-14478 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch, > HADOOP-14478.003.patch > > > Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of > the data length requested for. This would be beneficial for sequential reads. > However, for positional reads (seek to specific location, read x number of > bytes, seek back to original location) this may not be beneficial and might > even download lot more data which are not used later. > It would be good to override {{readFully(long position, byte[] buffer, int > offset, int length)}} for {{NativeAzureFsInputStream}} and make use of > {{mark(readLimit)}} as a hint to Azure's BlobInputStream. > BlobInputStream reference: > https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 > BlobInputStream can consider this as a hint later to determine the amount of > data to be read ahead. Changes to BlobInputStream would not be addressed in > this JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads
[ https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14478: -- Attachment: HADOOP-14478.003.patch Attaching .3 patch to address checkstyle issue (removed unused import statement) > Optimize NativeAzureFsInputStream for positional reads > -- > > Key: HADOOP-14478 > URL: https://issues.apache.org/jira/browse/HADOOP-14478 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch, > HADOOP-14478.003.patch > > > Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of > the data length requested for. This would be beneficial for sequential reads. > However, for positional reads (seek to specific location, read x number of > bytes, seek back to original location) this may not be beneficial and might > even download lot more data which are not used later. > It would be good to override {{readFully(long position, byte[] buffer, int > offset, int length)}} for {{NativeAzureFsInputStream}} and make use of > {{mark(readLimit)}} as a hint to Azure's BlobInputStream. > BlobInputStream reference: > https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 > BlobInputStream can consider this as a hint later to determine the amount of > data to be read ahead. Changes to BlobInputStream would not be addressed in > this JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14473) Optimize NativeAzureFileSystem::seek for forward seeks
[ https://issues.apache.org/jira/browse/HADOOP-14473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034683#comment-16034683 ] Rajesh Balamohan commented on HADOOP-14473: --- Since it was easier to combine this patch with HADOOP-14478, I have merged it and posted the revised patch there. In the revised patch I have fixed an issue in seek() and shared the test results as well. Tests were run against the "Japan West region" endpoint. {{BlobInputStream::skip()}} is essentially a no-op call; the issue was with closing the stream and opening it again via {{store.retrieve()}}, which ends up creating a new {{BlobInputStream}} and internally needs an additional HTTP call to download the blob attributes. This has been avoided in the patch. I completely agree that it would be good to get instrumentation similar to s3a's; it has been very useful. Please let me know if this could be done in incremental tickets. > Optimize NativeAzureFileSystem::seek for forward seeks > -- > > Key: HADOOP-14473 > URL: https://issues.apache.org/jira/browse/HADOOP-14473 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: HADOOP-14473-001.patch > > > {{NativeAzureFileSystem::seek()}} closes and re-opens the inputstream > irrespective of forward/backward seek. It would be beneficial to re-open the > stream on backward seek. > https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
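The optimization described in the comment can be modelled with a toy class (not the actual NativeAzureFsInputStream): forward seeks skip on the live stream, and only backward seeks pay the re-open cost, which in the real store also means an extra HTTP call for blob attributes. A reopen counter makes the saving visible:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class SeekDemo {
    private final byte[] blob;          // stand-in for the remote blob
    private ByteArrayInputStream in;
    private long pos;
    public int reopenCount;             // expensive re-opens performed

    public SeekDemo(byte[] blob) {
        this.blob = blob;
        reopen(0);
    }

    private void reopen(long at) {
        in = new ByteArrayInputStream(blob);
        in.skip(at);
        pos = at;
        reopenCount++;
    }

    public void seek(long targetPos) throws IOException {
        if (targetPos >= pos) {
            // Forward seek: skip on the existing stream, no re-open.
            long toSkip = targetPos - pos;
            while (toSkip > 0) {
                long skipped = in.skip(toSkip);
                if (skipped <= 0) {
                    throw new IOException("EOF during seek to " + targetPos);
                }
                toSkip -= skipped;
            }
            pos = targetPos;
        } else {
            reopen(targetPos);  // backward seek: pay the re-open cost
        }
    }

    public int read() {
        int b = in.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }
}
```

The same split — cheap forward skip, expensive backward reopen — is what the merged patch applies to the Azure stream.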
[jira] [Assigned] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads
[ https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned HADOOP-14478: - Assignee: Rajesh Balamohan > Optimize NativeAzureFsInputStream for positional reads > -- > > Key: HADOOP-14478 > URL: https://issues.apache.org/jira/browse/HADOOP-14478 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch > > > Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of > the data length requested for. This would be beneficial for sequential reads. > However, for positional reads (seek to specific location, read x number of > bytes, seek back to original location) this may not be beneficial and might > even download lot more data which are not used later. > It would be good to override {{readFully(long position, byte[] buffer, int > offset, int length)}} for {{NativeAzureFsInputStream}} and make use of > {{mark(readLimit)}} as a hint to Azure's BlobInputStream. > BlobInputStream reference: > https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 > BlobInputStream can consider this as a hint later to determine the amount of > data to be read ahead. Changes to BlobInputStream would not be addressed in > this JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads
[ https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14478: -- Attachment: HADOOP-14478.002.patch Attaching .2 version with fixes in seek(). Also attaching test results from hadoop-azure module. My azure machine and endpoints are hosted in "Japan West region" {noformat} hdiuser@hn0:~/hadoop/hadoop-tools/hadoop-azure⟫ mvn test ... .. Tests run: 16, Failures: 0, Errors: 0, Skipped: 16, Time elapsed: 0.421 sec - in org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionHandling Running org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.361 sec - in org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo Running org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0.939 sec - in org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions Results : Tests run: 703, Failures: 0, Errors: 0, Skipped: 436 [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 01:50 min [INFO] Finished at: 2017-06-02T13:08:42+00:00 [INFO] Final Memory: 29M/1574M [INFO] {noformat} > Optimize NativeAzureFsInputStream for positional reads > -- > > Key: HADOOP-14478 > URL: https://issues.apache.org/jira/browse/HADOOP-14478 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan > Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch > > > Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of > the data length requested for. This would be beneficial for sequential reads. > However, for positional reads (seek to specific location, read x number of > bytes, seek back to original location) this may not be beneficial and might > even download lot more data which are not used later. 
> It would be good to override {{readFully(long position, byte[] buffer, int > offset, int length)}} for {{NativeAzureFsInputStream}} and make use of > {{mark(readLimit)}} as a hint to Azure's BlobInputStream. > BlobInputStream reference: > https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 > BlobInputStream can consider this as a hint later to determine the amount of > data to be read ahead. Changes to BlobInputStream would not be addressed in > this JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads
[ https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14478: -- Description: Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of the data length requested for. This would be beneficial for sequential reads. However, for positional reads (seek to specific location, read x number of bytes, seek back to original location) this may not be beneficial and might even download lot more data which are not used later. It would be good to override {{readFully(long position, byte[] buffer, int offset, int length)}} for {{NativeAzureFsInputStream}} and make use of {{mark(readLimit)}} as a hint to Azure's BlobInputStream. BlobInputStream reference: https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 BlobInputStream can consider this as a hint later to determine the amount of data to be read ahead. Changes to BlobInputStream would not be addressed in this JIRA. was: Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of the data length requested for. This would be beneficial for sequential reads. However, for positional reads (seek to specific location, read x number of bytes, seek back to original location) this may not be beneficial and might even download lot more data which are not used later. It would be good to override {{readFully(long position, byte[] buffer, int offset, int length)}} for {{NativeAzureFsInputStream}} and make use of {{mark(readLimit)}} as a hint to Azure's BlobInputStream. BlobInputStream reference: https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 BlobInputStream can consider this as a hint later to determine the amount of data to be read ahead. Changes to BlobInputStream would not be apart of this JIRA. 
> Optimize NativeAzureFsInputStream for positional reads > -- > > Key: HADOOP-14478 > URL: https://issues.apache.org/jira/browse/HADOOP-14478 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan > Attachments: HADOOP-14478.001.patch > > > Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of > the data length requested for. This would be beneficial for sequential reads. > However, for positional reads (seek to specific location, read x number of > bytes, seek back to original location) this may not be beneficial and might > even download lot more data which are not used later. > It would be good to override {{readFully(long position, byte[] buffer, int > offset, int length)}} for {{NativeAzureFsInputStream}} and make use of > {{mark(readLimit)}} as a hint to Azure's BlobInputStream. > BlobInputStream reference: > https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 > BlobInputStream can consider this as a hint later to determine the amount of > data to be read ahead. Changes to BlobInputStream would not be addressed in > this JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads
[ https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14478: -- Attachment: HADOOP-14478.001.patch Attaching .1 patch for review. This includes changes related to HADOOP-14473 as well. > Optimize NativeAzureFsInputStream for positional reads > -- > > Key: HADOOP-14478 > URL: https://issues.apache.org/jira/browse/HADOOP-14478 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan > Attachments: HADOOP-14478.001.patch > > > Azure's {{BlobbInputStream}} internally buffers 4 MB of data irrespective of > the data length requested for. This would be beneficial for sequential reads. > However, for positional reads (seek to specific location, read x number of > bytes, seek back to original location) this may not be beneficial and might > even download lot more data which are not used later. > It would be good to override {{readFully(long position, byte[] buffer, int > offset, int length)}} for {{NativeAzureFsInputStream}} and make use of > {{mark(readLimit)}} as a hint to Azure's BlobInputStream. > BlobInputStream reference: > https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 > BlobInputStream can consider this as a hint later to determine the amount of > data to be read ahead. Changes to BlobInputStream would not be apart of this > JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads
Rajesh Balamohan created HADOOP-14478: - Summary: Optimize NativeAzureFsInputStream for positional reads Key: HADOOP-14478 URL: https://issues.apache.org/jira/browse/HADOOP-14478 Project: Hadoop Common Issue Type: Bug Components: fs/azure Reporter: Rajesh Balamohan Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of the data length requested. This is beneficial for sequential reads. However, for positional reads (seek to a specific location, read x number of bytes, seek back to the original location) this may not be beneficial and might even download a lot more data that is never used. It would be good to override {{readFully(long position, byte[] buffer, int offset, int length)}} for {{NativeAzureFsInputStream}} and make use of {{mark(readLimit)}} as a hint to Azure's BlobInputStream. BlobInputStream reference: https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448 BlobInputStream can consider this as a hint later to determine the amount of data to be read ahead. Changes to BlobInputStream would not be a part of this JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
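The override suggested above could look roughly like this. This is a minimal sketch, not the actual NativeAzureFsInputStream code: a {{ByteArrayInputStream}} stands in for the remote blob stream, and {{position}} is treated as relative to the current offset for simplicity.

```java
import java.io.ByteArrayInputStream;

// Sketch only: a positional read that restores the stream position afterwards
// via mark()/reset() instead of close-and-reopen. ByteArrayInputStream stands
// in for the remote blob stream; 'position' is relative to the current offset.
public class PositionalRead {
    static int readFully(ByteArrayInputStream in, int position,
                         byte[] buffer, int offset, int length) {
        // The read limit passed to mark() doubles as the kind of read-ahead
        // hint the description proposes forwarding to BlobInputStream.
        in.mark(position + length);
        long skipped = in.skip(position);
        if (skipped < position) {
            return -1;              // stream too short to reach 'position'
        }
        int n = in.read(buffer, offset, length);
        in.reset();                 // seek back to the original location
        return n;
    }
}
```

Because only the mark/reset bookkeeping moves, no extra 4 MB buffer is pulled down for data outside the requested range.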
[jira] [Updated] (HADOOP-14473) Optimize NativeAzureFileSystem::seek for forward seeks
[ https://issues.apache.org/jira/browse/HADOOP-14473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14473: -- Attachment: HADOOP-14473-001.patch > Optimize NativeAzureFileSystem::seek for forward seeks > -- > > Key: HADOOP-14473 > URL: https://issues.apache.org/jira/browse/HADOOP-14473 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Rajesh Balamohan > Attachments: HADOOP-14473-001.patch > > > {{NativeAzureFileSystem::seek()}} closes and re-opens the input stream > irrespective of forward/backward seek. It would be beneficial to re-open the > stream only on backward seeks; forward seeks can skip ahead on the already-open stream. > https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14473) Optimize NativeAzureFileSystem::seek for forward seeks
Rajesh Balamohan created HADOOP-14473: - Summary: Optimize NativeAzureFileSystem::seek for forward seeks Key: HADOOP-14473 URL: https://issues.apache.org/jira/browse/HADOOP-14473 Project: Hadoop Common Issue Type: Bug Components: fs/azure Reporter: Rajesh Balamohan Priority: Minor {{NativeAzureFileSystem::seek()}} closes and re-opens the input stream irrespective of forward/backward seek. It would be beneficial to re-open the stream only on backward seeks; forward seeks can skip ahead on the already-open stream. https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
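The optimization can be sketched as follows. Names are illustrative, not the Hadoop implementation: a byte[] stands in for the remote blob, and a real implementation would close the old stream before reopening.

```java
import java.io.ByteArrayInputStream;

// Sketch: a forward seek just skip()s on the already-open stream; only a
// backward seek pays the cost of reopening from the beginning.
public class SkipForwardSeek {
    private final byte[] data;          // stand-in for the remote blob
    private ByteArrayInputStream in;
    private long pos;
    int reopens;                        // how often we had to reopen

    SkipForwardSeek(byte[] data) {
        this.data = data;
        this.in = new ByteArrayInputStream(data);
    }

    void seek(long target) {
        if (target >= pos) {
            in.skip(target - pos);      // forward: cheap skip, no reopen
        } else {
            in = new ByteArrayInputStream(data);  // backward: reopen from 0
            in.skip(target);
            reopens++;
        }
        pos = target;
    }

    int read() {
        int b = in.read();
        if (b >= 0) pos++;
        return b;
    }
}
```

For read patterns that mostly move forward (e.g. sequential scans with small gaps), this eliminates almost all reopen round trips.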
[jira] [Commented] (HADOOP-13926) S3Guard: S3AFileSystem::listLocatedStatus() to employ MetadataStore
[ https://issues.apache.org/jira/browse/HADOOP-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954481#comment-15954481 ] Rajesh Balamohan commented on HADOOP-13926: --- Thanks for the patch [~liuml07]. Patch LGTM. Very minor comment: {{Listing::createFileStatusListingIterator}} may need to have {{providedStatus}} in its javadoc. > S3Guard: S3AFileSystem::listLocatedStatus() to employ MetadataStore > --- > > Key: HADOOP-13926 > URL: https://issues.apache.org/jira/browse/HADOOP-13926 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Assignee: Mingliang Liu > Attachments: HADOOP-13926-HADOOP-13345.001.patch, > HADOOP-13926-HADOOP-13345.002.patch, HADOOP-13926-HADOOP-13345.003.patch, > HADOOP-13926-HADOOP-13345.004.patch, > HADOOP-13926.wip.proto.branch-13345.1.patch > > > Need to check if {{listLocatedStatus}} can make use of metastore's > listChildren feature. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
[ https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14154: -- Status: Open (was: Patch Available) > Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore > -- > > Key: HADOOP-14154 > URL: https://issues.apache.org/jira/browse/HADOOP-14154 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-14154-HADOOP-13345.001.patch, > HADOOP-14154-HADOOP-13345.002.patch > > > Currently {{DynamoDBMetaStore::listChildren}} does not populate > {{isAuthoritative}} flag when creating {{DirListingMetadata}}. > This causes additional S3 lookups even when users have enabled > {{fs.s3a.metadatastore.authoritative}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
[ https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904013#comment-15904013 ] Rajesh Balamohan commented on HADOOP-14154: --- Thanks for the clarification [~fabbri]. Would that isAuthoritative flag have to be set up by higher-level applications like Pig/Hive/MR? > Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore > -- > > Key: HADOOP-14154 > URL: https://issues.apache.org/jira/browse/HADOOP-14154 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-14154-HADOOP-13345.001.patch, > HADOOP-14154-HADOOP-13345.002.patch > > > Currently {{DynamoDBMetaStore::listChildren}} does not populate > {{isAuthoritative}} flag when creating {{DirListingMetadata}}. > This causes additional S3 lookups even when users have enabled > {{fs.s3a.metadatastore.authoritative}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14165) Add S3Guard.dirListingUnion in S3AFileSystem#listFiles, listLocatedStatus
Rajesh Balamohan created HADOOP-14165: - Summary: Add S3Guard.dirListingUnion in S3AFileSystem#listFiles, listLocatedStatus Key: HADOOP-14165 URL: https://issues.apache.org/jira/browse/HADOOP-14165 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Rajesh Balamohan Priority: Minor {{S3Guard::dirListingUnion}} merges information from the backing store and DDB to create a consistent view. This needs to be added in {{S3AFileSystem::listFiles}} and {{S3AFileSystem::listLocatedStatus}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
[ https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14154: -- Attachment: HADOOP-14154-HADOOP-13345.002.patch Fixing checkstyle issues. > Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore > -- > > Key: HADOOP-14154 > URL: https://issues.apache.org/jira/browse/HADOOP-14154 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-14154-HADOOP-13345.001.patch, > HADOOP-14154-HADOOP-13345.002.patch > > > Currently {{DynamoDBMetaStore::listChildren}} does not populate > {{isAuthoritative}} flag when creating {{DirListingMetadata}}. > This causes additional S3 lookups even when users have enabled > {{fs.s3a.metadatastore.authoritative}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
[ https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14154: -- Status: Patch Available (was: Open) > Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore > -- > > Key: HADOOP-14154 > URL: https://issues.apache.org/jira/browse/HADOOP-14154 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-14154-HADOOP-13345.001.patch > > > Currently {{DynamoDBMetaStore::listChildren}} does not populate > {{isAuthoritative}} flag when creating {{DirListingMetadata}}. > This causes additional S3 lookups even when users have enabled > {{fs.s3a.metadatastore.authoritative}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
[ https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14154: -- Attachment: HADOOP-14154-HADOOP-13345.001.patch > Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore > -- > > Key: HADOOP-14154 > URL: https://issues.apache.org/jira/browse/HADOOP-14154 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-14154-HADOOP-13345.001.patch > > > Currently {{DynamoDBMetaStore::listChildren}} does not populate > {{isAuthoritative}} flag when creating {{DirListingMetadata}}. > This causes additional S3 lookups even when users have enabled > {{fs.s3a.metadatastore.authoritative}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
Rajesh Balamohan created HADOOP-14154: - Summary: Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore Key: HADOOP-14154 URL: https://issues.apache.org/jira/browse/HADOOP-14154 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Rajesh Balamohan Priority: Minor Currently {{DynamoDBMetaStore::listChildren}} does not populate the {{isAuthoritative}} flag when creating {{DirListingMetadata}}. This causes additional S3 lookups even when users have enabled {{fs.s3a.metadatastore.authoritative}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
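A minimal illustration of why the flag matters, using plain Java stand-ins rather than the actual Hadoop classes: when a listing is marked authoritative and the authoritative mode is enabled, the caller can treat the cached listing as complete and skip the confirming S3 LIST.

```java
import java.util.List;

// Stand-in for DirListingMetadata (hypothetical, for illustration only).
public class DirListing {
    final List<String> children;
    final boolean isAuthoritative;   // true: listing is known to be complete

    DirListing(List<String> children, boolean isAuthoritative) {
        this.children = children;
        this.isAuthoritative = isAuthoritative;
    }

    // Returns true when a backing-store (S3) lookup is still required.
    // Only when both the config flag and the per-listing flag are set can
    // the extra round trip be skipped.
    static boolean needsS3Lookup(DirListing fromMetaStore,
                                 boolean authoritativeModeEnabled) {
        return !(authoritativeModeEnabled && fromMetaStore.isAuthoritative);
    }
}
```

With {{listChildren}} never setting the flag, {{needsS3Lookup}} is always true, which is exactly the extra-lookup behavior the issue describes.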
[jira] [Commented] (HADOOP-13914) s3guard: improve S3AFileStatus#isEmptyDirectory handling
[ https://issues.apache.org/jira/browse/HADOOP-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898949#comment-15898949 ] Rajesh Balamohan commented on HADOOP-13914: --- {noformat} S3AFileStatus innerGetFileStatus(final Path f, boolean needEmptyDirectoryFlag) throws IOException { .. // Check MetadataStore, if any. .. PathMetadata pm = metadataStore.get(path, true); .. {noformat} Should {{needEmptyDirectoryFlag}} be passed on to {{MetadataStore}}? This would avoid an additional {{QuerySpec}}. This is similar to [~liuml07]'s comment #1. > s3guard: improve S3AFileStatus#isEmptyDirectory handling > > > Key: HADOOP-13914 > URL: https://issues.apache.org/jira/browse/HADOOP-13914 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: HADOOP-13345 >Reporter: Aaron Fabbri >Assignee: Aaron Fabbri > Attachments: HADOOP-13914-HADOOP-13345.000.patch, > HADOOP-13914-HADOOP-13345.002.patch, HADOOP-13914-HADOOP-13345.003.patch, > HADOOP-13914-HADOOP-13345.004.patch, HADOOP-13914-HADOOP-13345.005.patch, > s3guard-empty-dirs.md, test-only-HADOOP-13914.patch > > > As discussed in HADOOP-13449, proper support for the isEmptyDirectory() flag > stored in S3AFileStatus is missing from DynamoDBMetadataStore. > The approach taken by LocalMetadataStore is not suitable for the DynamoDB > implementation, and also sacrifices good code separation to minimize > S3AFileSystem changes pre-merge to trunk. > I will attach a design doc that attempts to clearly explain the problem and > preferred solution. I suggest we do this work after merging the HADOOP-13345 > branch to trunk, but am open to suggestions. > I can also attach a patch of an integration test that exercises the missing > case and demonstrates a failure with DynamoDBMetadataStore. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
[ https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875140#comment-15875140 ] Rajesh Balamohan commented on HADOOP-14081: --- Thanks [~ste...@apache.org]. I ran with "mvn test -Dtest=ITestS\* -Dscale". I should have used the scale test param for the huge file size upload. > S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock) > -- > > Key: HADOOP-14081 > URL: https://issues.apache.org/jira/browse/HADOOP-14081 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 2.8.0 > > Attachments: HADOOP-14081.001.patch > > > In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} > is called. It might be possible to directly access the byte[] array from > ByteArrayOutputStream. > Might have to extend ByteArrayOutputStream and create a method like > getInputStream() which can return ByteArrayInputStream. This would avoid > expensive array copy during large upload. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
[ https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871711#comment-15871711 ] Rajesh Balamohan commented on HADOOP-14081: --- Thanks [~ste...@apache.org]. Here are the test results (region: S3 bucket in U.S. East; tests were run from my laptop). Errors are due to socket timeouts (180 seconds). Checked ITestS3AContractGetFileStatus.teardown, which again failed due to a socket timeout. {noformat} Results : Tests in error: ITestS3ContractOpen>AbstractFSContractTestBase.setup:193->AbstractFSContractTestBase.mkdirs:338 » SocketTimeout ITestS3AContractGetFileStatus.teardown:40->AbstractFSContractTestBase.teardown:204->AbstractFSContractTestBase.deleteTestDirInTeardown:213 » ITestS3AContractRootDir>AbstractContractRootDirectoryTest.testRmEmptyRootDirNonRecursive:116 » PathIO ITestS3NContractOpen>AbstractFSContractTestBase.setup:193->AbstractFSContractTestBase.mkdirs:338 » SocketTimeout Tests run: 454, Failures: 0, Errors: 4, Skipped: 56 .. .. [INFO] Total time: 02:11 h {noformat} > S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock) > -- > > Key: HADOOP-14081 > URL: https://issues.apache.org/jira/browse/HADOOP-14081 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-14081.001.patch > > > In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} > is called. It might be possible to directly access the byte[] array from > ByteArrayOutputStream. > Might have to extend ByteArrayOutputStream and create a method like > getInputStream() which can return ByteArrayInputStream. This would avoid > expensive array copy during large upload. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
[ https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14081: -- Status: Patch Available (was: Open) > S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock) > -- > > Key: HADOOP-14081 > URL: https://issues.apache.org/jira/browse/HADOOP-14081 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-14081.001.patch > > > In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} > is called. It might be possible to directly access the byte[] array from > ByteArrayOutputStream. > Might have to extend ByteArrayOutputStream and create a method like > getInputStream() which can return ByteArrayInputStream. This would avoid > expensive array copy during large upload. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
[ https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14081: -- Attachment: HADOOP-14081.001.patch > S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock) > -- > > Key: HADOOP-14081 > URL: https://issues.apache.org/jira/browse/HADOOP-14081 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-14081.001.patch > > > In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} > is called. It might be possible to directly access the byte[] array from > ByteArrayOutputStream. > Might have to extend ByteArrayOutputStream and create a method like > getInputStream() which can return ByteArrayInputStream. This would avoid > expensive array copy during large upload. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
Rajesh Balamohan created HADOOP-14081: - Summary: S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock) Key: HADOOP-14081 URL: https://issues.apache.org/jira/browse/HADOOP-14081 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Reporter: Rajesh Balamohan Priority: Minor In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} is called. It might be possible to directly access the byte[] array from ByteArrayOutputStream. We might have to extend ByteArrayOutputStream and create a method like getInputStream() which can return a ByteArrayInputStream. This would avoid an expensive array copy during large uploads. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
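A sketch of the idea using only the JDK (the class name is hypothetical, not the Hadoop patch): {{ByteArrayOutputStream.toByteArray()}} copies the whole buffer, but its {{buf}}/{{count}} fields are protected, so a subclass can wrap the backing array directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Expose the internal buffer without the defensive copy toByteArray() makes.
// Caveat: the caller must not write to this stream while the returned
// InputStream is in use, since both share the same backing array.
public class DirectByteArrayOutputStream extends ByteArrayOutputStream {
    public ByteArrayInputStream getInputStream() {
        // 'buf' and 'count' are protected fields of ByteArrayOutputStream,
        // so the backing array is wrapped in place -- no copy.
        return new ByteArrayInputStream(buf, 0, count);
    }
}
```

For a multi-megabyte block this saves one full array allocation and copy per upload start.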
[jira] [Updated] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
[ https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-14081: -- Issue Type: Sub-task (was: Improvement) Parent: HADOOP-13204 > S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock) > -- > > Key: HADOOP-14081 > URL: https://issues.apache.org/jira/browse/HADOOP-14081 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > > In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} > is called. It might be possible to directly access the byte[] array from > ByteArrayOutputStream. > Might have to extend ByteArrayOutputStream and create a method like > getInputStream() which can return ByteArrayInputStream. This would avoid > expensive array copy during large upload. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13926) S3Guard: Improve listLocatedStatus
[ https://issues.apache.org/jira/browse/HADOOP-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816583#comment-15816583 ] Rajesh Balamohan commented on HADOOP-13926: --- Agreed. That would need a change in s3guard to support a few million entries in {{DirListingMetadata listChildren(Path path)}}, or changes in the API to support a larger set of entries in the listing. > S3Guard: Improve listLocatedStatus > -- > > Key: HADOOP-13926 > URL: https://issues.apache.org/jira/browse/HADOOP-13926 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13926.wip.proto.branch-13345.1.patch > > > Need to check if {{listLocatedStatus}} can make use of metastore's > listChildren feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13926) S3Guard: Improve listLocatedStatus
[ https://issues.apache.org/jira/browse/HADOOP-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13926: -- Attachment: HADOOP-13926.wip.proto.branch-13345.1.patch > S3Guard: Improve listLocatedStatus > -- > > Key: HADOOP-13926 > URL: https://issues.apache.org/jira/browse/HADOOP-13926 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13926.wip.proto.branch-13345.1.patch > > > Need to check if {{listLocatedStatus}} can make use of metastore's > listChildren feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-13936) S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation
Rajesh Balamohan created HADOOP-13936: - Summary: S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation Key: HADOOP-13936 URL: https://issues.apache.org/jira/browse/HADOOP-13936 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Rajesh Balamohan Priority: Minor As part of the {{S3AFileSystem.delete}} operation, {{innerDelete}} is invoked, which deletes keys from S3 in batches (default is 1000). But DynamoDB is updated only at the end of this operation. This can cause issues when deleting a large number of keys. E.g., it is possible to get an exception after deleting 1000 keys, and in such cases DynamoDB would not be updated. This can cause DynamoDB to go out of sync. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
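The fix implied above is to update the metadata store after every successfully deleted batch rather than once at the end. A hypothetical sketch, with plain in-memory sets standing in for S3 and the DynamoDB table:

```java
import java.util.List;
import java.util.Set;

// Illustration only (not the Hadoop code): sync the metadata store per batch,
// so a failure part-way through a large delete leaves both sides consistent
// for the keys that were already removed.
public class BatchedDelete {
    static final int BATCH_SIZE = 1000;   // matches the S3 batch-delete default

    static void delete(List<String> keys, Set<String> s3, Set<String> metaStore) {
        for (int i = 0; i < keys.size(); i += BATCH_SIZE) {
            List<String> batch =
                keys.subList(i, Math.min(i + BATCH_SIZE, keys.size()));
            batch.forEach(s3::remove);        // this call may fail part-way...
            batch.forEach(metaStore::remove); // ...but completed batches stay in sync
        }
    }
}
```

An exception between batches then leaves the metadata store agreeing with S3 for everything deleted so far, instead of diverging by up to 1000 keys per batch.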
[jira] [Updated] (HADOOP-13934) S3Guard: DynamoDBMetaStore::move could be throwing exception due to BatchWriteItem limits
[ https://issues.apache.org/jira/browse/HADOOP-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13934: -- Summary: S3Guard: DynamoDBMetaStore::move could be throwing exception due to BatchWriteItem limits (was: S3Guard: DynamoDBMetaStore::move throws exception with limited info) > S3Guard: DynamoDBMetaStore::move could be throwing exception due to > BatchWriteItem limits > - > > Key: HADOOP-13934 > URL: https://issues.apache.org/jira/browse/HADOOP-13934 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Fix For: 2.9.0 > > > When using {{DynamoDBMetadataStore}} with a insert heavy hive app , it > started throwing exceptions in {{DynamoDBMetadataStore::move}}. But just with > the following exception, it is relatively hard to debug on the real issue in > DynamoDB side. > {noformat} > Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: 1 > validation error detected: Value > '{ddb-table-name-334=[com.amazonaws.dynamodb.v20120810.WriteRequest@ca1da583, > com.amazonaws.dynamodb.v20120810.WriteRequest@ca1fc7cd, > com.amazonaws.dynamodb.v20120810.WriteRequest@ca4244e6, > com.amazonaws.dynamodb.v20120810.WriteRequest@ca2f58a9, > com.amazonaws.dynamodb.v20120810.WriteRequest@ca3525f8, > ... > ... 
> at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1529) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1167) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573) > at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:1722) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1698) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:668) > at > com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111) > at > com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItem(BatchWriteItemImpl.java:52) > at > com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItem(DynamoDB.java:178) > at > org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:351) > ... 28 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13934) S3Guard: DynamoDBMetaStore::move throws exception with limited info
[ https://issues.apache.org/jira/browse/HADOOP-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769320#comment-15769320 ] Rajesh Balamohan commented on HADOOP-13934: --- Suspecting the API limit in {{batchWriteItem}} (http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html). If the number of items is > 25, it could end up throwing this exception. It might be good to invoke this in micro-batches? > S3Guard: DynamoDBMetaStore::move throws exception with limited info > --- > > Key: HADOOP-13934 > URL: https://issues.apache.org/jira/browse/HADOOP-13934 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Priority: Minor > Fix For: 2.9.0 > > > When using {{DynamoDBMetadataStore}} with an insert-heavy Hive app, it > started throwing exceptions in {{DynamoDBMetadataStore::move}}. But with just > the following exception, it is relatively hard to debug the real issue on the > DynamoDB side. > {noformat} > Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: 1 > validation error detected: Value > '{ddb-table-name-334=[com.amazonaws.dynamodb.v20120810.WriteRequest@ca1da583, > com.amazonaws.dynamodb.v20120810.WriteRequest@ca1fc7cd, > com.amazonaws.dynamodb.v20120810.WriteRequest@ca4244e6, > com.amazonaws.dynamodb.v20120810.WriteRequest@ca2f58a9, > com.amazonaws.dynamodb.v20120810.WriteRequest@ca3525f8, > ... > ...
> at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1529) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1167) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586) > at > com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573) > at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:1722) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1698) > at > com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:668) > at > com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111) > at > com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItem(BatchWriteItemImpl.java:52) > at > com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItem(DynamoDB.java:178) > at > org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:351) > ... 28 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
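The micro-batching suggested in the comment above can be sketched with a generic helper (not the actual DynamoDBMetadataStore code): BatchWriteItem accepts at most 25 write requests per call, so a large move() must be split before each call.

```java
import java.util.ArrayList;
import java.util.List;

// Split a large list of write requests into chunks DynamoDB will accept.
public class MicroBatch {
    static final int DDB_BATCH_WRITE_LIMIT = 25;  // BatchWriteItem hard limit

    static <T> List<List<T>> partition(List<T> items) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += DDB_BATCH_WRITE_LIMIT) {
            // subList is a view; each batch would be sent in its own
            // batchWriteItem call.
            batches.add(items.subList(i,
                Math.min(i + DDB_BATCH_WRITE_LIMIT, items.size())));
        }
        return batches;
    }
}
```

A real caller would also retry any unprocessed items returned by each BatchWriteItem response before moving to the next batch.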
[jira] [Created] (HADOOP-13934) S3Guard: DynamoDBMetaStore::move throws exception with limited info
Rajesh Balamohan created HADOOP-13934: - Summary: S3Guard: DynamoDBMetaStore::move throws exception with limited info Key: HADOOP-13934 URL: https://issues.apache.org/jira/browse/HADOOP-13934 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Rajesh Balamohan Priority: Minor Fix For: 2.9.0 When using {{DynamoDBMetadataStore}} with an insert-heavy Hive app, it started throwing exceptions in {{DynamoDBMetadataStore::move}}. But with just the following exception, it is relatively hard to debug the real issue on the DynamoDB side. {noformat} Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: 1 validation error detected: Value '{ddb-table-name-334=[com.amazonaws.dynamodb.v20120810.WriteRequest@ca1da583, com.amazonaws.dynamodb.v20120810.WriteRequest@ca1fc7cd, com.amazonaws.dynamodb.v20120810.WriteRequest@ca4244e6, com.amazonaws.dynamodb.v20120810.WriteRequest@ca2f58a9, com.amazonaws.dynamodb.v20120810.WriteRequest@ca3525f8, ... ... at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1529) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1167) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618) at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586) at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445) at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:1722) at 
com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1698) at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:668) at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111) at com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItem(BatchWriteItemImpl.java:52) at com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItem(DynamoDB.java:178) at org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:351) ... 28 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
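One way to surface more diagnostic context is to catch the SDK exception at the call site and rethrow it with the table name and pending request count attached. The sketch below illustrates the pattern with hypothetical stand-in names; it is not the actual Hadoop fix or the AWS SDK API.

```java
import java.io.IOException;
import java.util.List;

// Sketch: wrap a DynamoDB batch-write failure with enough context
// (table name, number of pending write requests) to make a server-side
// validation error debuggable. Class and method names are hypothetical.
public class BatchWriteContext {

    /** Runs the batch write, rethrowing any failure with added context. */
    public static void writeBatch(String tableName, List<String> items,
                                  Runnable doWrite) throws IOException {
        try {
            doWrite.run();
        } catch (RuntimeException e) {
            throw new IOException("Batch write to table '" + tableName
                    + "' failed with " + items.size()
                    + " pending write request(s)", e);
        }
    }
}
```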
[jira] [Created] (HADOOP-13931) S3AGuard: Use BatchWriteItem in DynamoDBMetadataStore::put(DirListingMetadata)
Rajesh Balamohan created HADOOP-13931: - Summary: S3AGuard: Use BatchWriteItem in DynamoDBMetadataStore::put(DirListingMetadata) Key: HADOOP-13931 URL: https://issues.apache.org/jira/browse/HADOOP-13931 Project: Hadoop Common Issue Type: Sub-task Reporter: Rajesh Balamohan Priority: Minor Using {{batchWriteItem}} might be more performant in {{DynamoDBMetadataStore::put(DirListingMetadata meta)}}.
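A batchWriteItem-based put() would need a chunking step, since DynamoDB's BatchWriteItem accepts at most 25 write requests per call. Below is a generic partition helper sketching that step; it is illustrative only, not the actual Hadoop change.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a directory listing into chunks small enough for
// DynamoDB BatchWriteItem (max 25 write requests per call).
public class Batches {

    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(
                    items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }
}
```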
[jira] [Resolved] (HADOOP-13925) S3Guard: NPE when table is already populated in dynamodb and user specifies "fs.s3a.s3guard.ddb.table.create=false"
[ https://issues.apache.org/jira/browse/HADOOP-13925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan resolved HADOOP-13925. --- Resolution: Duplicate > S3Guard: NPE when table is already populated in dynamodb and user specifies > "fs.s3a.s3guard.ddb.table.create=false" > --- > > Key: HADOOP-13925 > URL: https://issues.apache.org/jira/browse/HADOOP-13925 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Assignee: Mingliang Liu >Priority: Minor > > When table is present dynamodb store and already populated, it is possible > that users can specify {{fs.s3a.s3guard.ddb.table.create=false}}. In such > cases, {{DynamoDBMetadataStore.get}} would end up throwing NPE as {{table}} > object may not be initialized.
[jira] [Created] (HADOOP-13926) S3Guard: Improve listLocatedStatus
Rajesh Balamohan created HADOOP-13926: - Summary: S3Guard: Improve listLocatedStatus Key: HADOOP-13926 URL: https://issues.apache.org/jira/browse/HADOOP-13926 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Rajesh Balamohan Priority: Minor Need to check if {{listLocatedStatus}} can make use of the metastore's listChildren feature.
[jira] [Created] (HADOOP-13925) S3Guard: NPE when table is already populated in dynamodb and user specifies "fs.s3a.s3guard.ddb.table.create=false"
Rajesh Balamohan created HADOOP-13925: - Summary: S3Guard: NPE when table is already populated in dynamodb and user specifies "fs.s3a.s3guard.ddb.table.create=false" Key: HADOOP-13925 URL: https://issues.apache.org/jira/browse/HADOOP-13925 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Rajesh Balamohan Priority: Minor When the table is already present and populated in the DynamoDB store, users may still specify {{fs.s3a.s3guard.ddb.table.create=false}}. In such cases, {{DynamoDBMetadataStore.get}} would end up throwing an NPE, as the {{table}} object may not have been initialized.
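A typical remedy for this class of NPE is a fail-fast guard: check the table handle before use and raise a descriptive error instead of letting an NPE surface deep inside get(). The sketch below uses hypothetical stand-in names, not the actual Hadoop code.

```java
// Sketch: fail fast with a descriptive message when the table handle
// was never initialized (e.g. because table.create=false was set).
public class TableGuard {

    private final Object table; // stand-in for the DynamoDB Table handle

    public TableGuard(Object table) {
        this.table = table;
    }

    private Object checkTableInitialized() {
        if (table == null) {
            throw new IllegalStateException(
                "DynamoDB table is not initialized; check "
                + "fs.s3a.s3guard.ddb.table.create and the table state");
        }
        return table;
    }

    public Object get() {
        // Every public entry point validates the handle first.
        return checkTableInitialized();
    }
}
```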
[jira] [Created] (HADOOP-13757) Remove verifyBuckets overhead in S3AFileSystem::initialize()
Rajesh Balamohan created HADOOP-13757: - Summary: Remove verifyBuckets overhead in S3AFileSystem::initialize() Key: HADOOP-13757 URL: https://issues.apache.org/jira/browse/HADOOP-13757 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Rajesh Balamohan Priority: Minor {{S3AFileSystem.initialize()}} invokes verifyBuckets(), but when the bucket does not exist and the request gets a 403 error, {{s3.doesBucketExists(bucketName)}} still ends up returning {{true}}. In that respect, verifyBuckets() is an unnecessary call during initialization.
[jira] [Commented] (HADOOP-13727) S3A: Reduce high number of connections to EC2 Instance Metadata Service caused by InstanceProfileCredentialsProvider.
[ https://issues.apache.org/jira/browse/HADOOP-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594685#comment-15594685 ] Rajesh Balamohan commented on HADOOP-13727: --- I have tried out this patch in an AWS test environment and it fixes the issue. Are you referring to running the entire test suite in AWS EC2? > S3A: Reduce high number of connections to EC2 Instance Metadata Service > caused by InstanceProfileCredentialsProvider. > - > > Key: HADOOP-13727 > URL: https://issues.apache.org/jira/browse/HADOOP-13727 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Rajesh Balamohan >Assignee: Chris Nauroth >Priority: Minor > Attachments: HADOOP-13727-branch-2.001.patch, > HADOOP-13727-branch-2.002.patch, HADOOP-13727-branch-2.003.patch, > HADOOP-13727-branch-2.004.patch > > > When running in an EC2 VM, S3A can make use of > {{InstanceProfileCredentialsProvider}} from the AWS SDK to obtain credentials > from the EC2 Instance Metadata Service. We have observed that for a highly > multi-threaded application, this may generate a high number of calls to the > Instance Metadata Service. The service may throttle the client by replying > with an HTTP 429 response or forcibly closing connections. We can greatly > reduce the number of calls to the service by enforcing that all threads use a > single shared instance of {{InstanceProfileCredentialsProvider}}.
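The shared-instance approach described above can be sketched with the initialization-on-demand holder idiom: all threads reuse one lazily created provider instead of each creating their own and polling the metadata service. {{CredentialsProvider}} below is a hypothetical stand-in for the AWS SDK class, not the real API.

```java
// Sketch: one shared, lazily created credentials provider for all threads,
// limiting how often the EC2 Instance Metadata Service gets polled.
public class SharedProvider {

    static class CredentialsProvider {
        // Stand-in for the real provider; creating instances is the
        // expensive operation we want to do exactly once.
    }

    // Initialization-on-demand holder: the JVM guarantees INSTANCE is
    // created at most once, on first access, with safe publication.
    private static final class Holder {
        static final CredentialsProvider INSTANCE = new CredentialsProvider();
    }

    public static CredentialsProvider getInstance() {
        return Holder.INSTANCE;
    }
}
```

The holder idiom avoids both per-call locking and eager construction, which matters when the provider may never be needed (e.g. when static credentials are configured).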
[jira] [Commented] (HADOOP-13560) S3ABlockOutputStream to support huge (many GB) file writes
[ https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535127#comment-15535127 ] Rajesh Balamohan commented on HADOOP-13560: --- S3ABlockOutputStream::initiateMultiPartUpload() has the following {noformat} LOG.debug("Initiating Multipart upload for block {}", currentBlock); {noformat} In S3ADataBlocks.java, the patch has the following for ByteArrayBlock {noformat} @Override public String toString() { return "ByteArrayBlock{" + "state=" + getState() + ", buffer=" + buffer + ", limit=" + limit + ", dataSize=" + dataSize + '}'; } {noformat} When DEBUG logging was enabled to check the AWS traffic, it ended up printing the entire contents of the buffer. When trying to debug a large data transfer (4 GB in my case), it ended up printing huge chunks which may not be needed. Would it be possible to print only the buffer sizes? > S3ABlockOutputStream to support huge (many GB) file writes > -- > > Key: HADOOP-13560 > URL: https://issues.apache.org/jira/browse/HADOOP-13560 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13560-branch-2-001.patch, > HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch, > HADOOP-13560-branch-2-004.patch > > > An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights > that metadata isn't copied on large copies. > 1. Add a test to do that large copy/rname and verify that the copy really > works > 2. Verify that metadata makes it over. > Verifying large file rename is important on its own, as it is needed for very > large commit operations for committers using rename
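The toString() change the comment asks for could look like the sketch below: report the buffer's length, limit, and data size rather than the buffer contents, so DEBUG logging of a multi-GB upload does not dump the payload. Field names mirror the ByteArrayBlock snippet above, but this is illustrative, not the committed Hadoop code.

```java
// Sketch: toString() that reports sizes instead of buffer contents,
// keeping DEBUG logs readable for multi-GB transfers.
public class ByteArrayBlockExample {

    private final byte[] buffer;
    private final int limit;
    private final int dataSize;

    public ByteArrayBlockExample(byte[] buffer, int limit, int dataSize) {
        this.buffer = buffer;
        this.limit = limit;
        this.dataSize = dataSize;
    }

    @Override
    public String toString() {
        // Log only the buffer length, never its contents.
        return "ByteArrayBlock{"
                + "bufferSize=" + (buffer == null ? 0 : buffer.length)
                + ", limit=" + limit
                + ", dataSize=" + dataSize
                + '}';
    }
}
```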
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Attachment: HADOOP-13169-branch-2-010.patch Thank you [~ste...@apache.org], [~cnauroth]. Attached the revised patch to address the review comments. > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, > HADOOP-13169-branch-2-008.patch, HADOOP-13169-branch-2-009.patch, > HADOOP-13169-branch-2-010.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Attachment: HADOOP-13169-branch-2-009.patch Thank you [~cnauroth]. Changes in the latest patch: 1. Changed LinkedList to ArrayList in SimpleCopyListing 2. For the test, I thought of using guava's {{Ordering.arbitrary()}}, which relies on {{System.identityHashCode}}, but that is also prone to collisions (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6809470). Instead, I am using {{setSeedForRandomListing(long seed)}} with "@VisibleForTesting" for testing purposes. > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, > HADOOP-13169-branch-2-008.patch, HADOOP-13169-branch-2-009.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.
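The seeded-randomization idea above can be sketched with {{Collections.shuffle}} over a {{java.util.Random}}: production runs shuffle with an arbitrary seed, while tests pass a fixed seed and get a deterministic order. This is a minimal illustration of the approach, not the SimpleCopyListing code itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: deterministic, seed-driven shuffling of a copy listing so
// tests can assert on the order while production gets random placement.
public class ListingShuffle {

    public static List<String> shuffled(List<String> paths, long seed) {
        List<String> copy = new ArrayList<>(paths); // leave the input untouched
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```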
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Status: Patch Available (was: Open) > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, > HADOOP-13169-branch-2-008.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Attachment: HADOOP-13169-branch-2-008.patch > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, > HADOOP-13169-branch-2-008.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Attachment: (was: HADOOP-13169-branch-2-008.patch) > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, > HADOOP-13169-branch-2-008.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Status: Open (was: Patch Available) > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, > HADOOP-13169-branch-2-008.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Attachment: HADOOP-13169-branch-2-008.patch Thanks [~ste...@apache.org]. Added isDebugEnabled() to be consistent with rest of the code in the latest patch. > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, > HADOOP-13169-branch-2-008.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Attachment: HADOOP-13169-branch-2-007.patch > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Status: Patch Available (was: Open) > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.
[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing
[ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HADOOP-13169: -- Status: Open (was: Patch Available) > Randomize file list in SimpleCopyListing > > > Key: HADOOP-13169 > URL: https://issues.apache.org/jira/browse/HADOOP-13169 > Project: Hadoop Common > Issue Type: Improvement > Components: tools/distcp >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HADOOP-13169-branch-2-001.patch, > HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, > HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, > HADOOP-13169-branch-2-006.patch > > > When copying files to S3, based on file listing some mappers can get into S3 > partition hotspots. This would be more visible, when data is copied from hive > warehouse with lots of partitions (e.g date partitions). In such cases, some > of the tasks would tend to be a lot more slower than others. It would be good > to randomize the file paths which are written out in SimpleCopyListing to > avoid this issue.