[jira] [Created] (HADOOP-18447) Vectored IO: Threadpool should be closed on interrupts or during close calls

2022-09-07 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-18447:
-

 Summary: Vectored IO: Threadpool should be closed on interrupts or 
during close calls
 Key: HADOOP-18447
 URL: https://issues.apache.org/jira/browse/HADOOP-18447
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: common, fs, fs/adl, fs/s3
Reporter: Rajesh Balamohan
 Attachments: Screenshot 2022-09-08 at 9.22.07 AM.png

The vectored IO threadpool should be shut down on any interrupt or during 
S3AFileSystem/S3AInputStream close() calls.

For example, a query got cancelled in the middle of its run, yet in the 
background (e.g. LLAP) the vectored IO threads continued to run.

 

!Screenshot 2022-09-08 at 9.22.07 AM.png|width=537,height=164!
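
A minimal sketch of the intended lifecycle, assuming the stream owns (or 
tracks) the pool; the class name and pool size are illustrative, not the 
actual S3A implementation:

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Illustrative only: owns the vectored IO pool, tears it down on close(). */
public class VectoredReadPool implements AutoCloseable {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  public <T> Future<T> submitRangeRead(Callable<T> rangeRead) {
    return pool.submit(rangeRead);
  }

  @Override
  public void close() {
    // shutdownNow() interrupts in-flight range reads instead of letting
    // them keep running in the background after the stream is closed.
    pool.shutdownNow();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // propagate the interrupt
    }
  }
}
{code}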






[jira] [Created] (HADOOP-18347) Restrict vectoredIO threadpool to reduce memory pressure

2022-07-18 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-18347:
-

 Summary: Restrict vectoredIO threadpool to reduce memory pressure
 Key: HADOOP-18347
 URL: https://issues.apache.org/jira/browse/HADOOP-18347
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: common, fs, fs/adl, fs/s3
Reporter: Rajesh Balamohan


https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L964-L967

Currently, all the ranges are fetched with an unbounded threadpool. This does 
not cause memory pressure with standard benchmarks like TPCDS; however, when a 
large number of ranges are present in large files, it can spike the memory 
usage of the task. Limiting the threadpool size could reduce the memory usage.
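
A sketch of the kind of bound being suggested (pool and queue sizes here are 
made-up numbers, not proposed defaults): a fixed pool plus a bounded queue 
with caller-runs back-pressure caps how many range buffers are in flight.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Illustrative only: a bounded pool for vectored range reads. */
public final class BoundedRangePool {
  public static ThreadPoolExecutor create(int threads, int queuedReads) {
    return new ThreadPoolExecutor(
        threads, threads,                       // fixed-size pool
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<>(queuedReads),  // bounded backlog of reads
        // When the queue is full, the submitting thread runs the read
        // itself, throttling producers instead of buffering more ranges.
        new ThreadPoolExecutor.CallerRunsPolicy());
  }
}
{code}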






[jira] [Commented] (HADOOP-18106) Handle memory fragmentation in S3 Vectored IO implementation.

2022-06-06 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550781#comment-17550781
 ] 

Rajesh Balamohan commented on HADOOP-18106:
---

This is applicable mainly to direct byte buffers. Heap buffers are released 
automatically at GC time.
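
A self-contained illustration of the direct-buffer case (not the Hadoop code): 
every slice pins the single merged allocation, so the gap bytes cannot be 
reclaimed until all slices are gone.

{code:java}
import java.nio.ByteBuffer;

public class MergedRangeSliceDemo {
  public static void main(String[] args) {
    // One merged direct allocation covering 0-1500 for the requested
    // ranges 0-500, 700-1000 and 1200-1500 from the issue description.
    ByteBuffer merged = ByteBuffer.allocateDirect(1500);

    ByteBuffer r1 = slice(merged, 0, 500);
    ByteBuffer r2 = slice(merged, 700, 300);
    ByteBuffer r3 = slice(merged, 1200, 300);

    // The gaps (500-700, 1000-1200) were never handed out, yet they
    // occupy native memory for as long as any slice keeps the single
    // 1500-byte allocation reachable.
    System.out.println(r1.capacity() + r2.capacity() + r3.capacity()
        + " of 1500 bytes are actually usable by the client");
  }

  private static ByteBuffer slice(ByteBuffer buf, int offset, int length) {
    ByteBuffer dup = buf.duplicate();
    dup.position(offset);
    dup.limit(offset + length);
    return dup.slice();
  }
}
{code}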

> Handle memory fragmentation in S3 Vectored IO implementation.
> -
>
> Key: HADOOP-18106
> URL: https://issues.apache.org/jira/browse/HADOOP-18106
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>
> Since we have implemented merging of ranges in the S3AInputStream 
> implementation of the vectored IO API, it can lead to memory fragmentation. 
> Let me explain by example.
>  
> Suppose the client requests 3 ranges:
> 0-500, 700-1000 and 1200-1500.
> Because of merging, all the above ranges get merged into one: we allocate a 
> big byte buffer covering 0-1500 but return sliced byte buffers for the 
> desired ranges.
> Once the client is done reading all the ranges, it can only free the memory 
> of the requested ranges; the memory of the gaps (here 500-700 and 1000-1200) 
> is never released.






[jira] [Created] (HADOOP-18115) EvaluatingStatisticsMap::entrySet may not need parallelstream

2022-02-03 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-18115:
-

 Summary: EvaluatingStatisticsMap::entrySet may not need 
parallelstream
 Key: HADOOP-18115
 URL: https://issues.apache.org/jira/browse/HADOOP-18115
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Rajesh Balamohan
 Attachments: Screenshot 2022-02-04 at 11.10.39 AM.png

When a large number of S3AInputStreams are opened, this ends up showing in the 
profile, since parallelStream internally uses fork/join. If it is not 
mandatory, it can be refactored to get rid of parallelStream. Here is the 
relevant profile output for reference.

  !Screenshot 2022-02-04 at 11.10.39 AM.png|width=632,height=429!
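
A sketch of the suggested refactor (names are illustrative, not the actual 
EvaluatingStatisticsMap code): a sequential stream evaluates the entries 
without touching the shared ForkJoinPool that parallelStream() drags in.

{code:java}
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative only: evaluate map entries without parallelStream(). */
public final class SequentialEntrySet {
  public static <K> Set<Map.Entry<K, Long>> entrySet(
      Map<K, Function<K, Long>> evaluators) {
    return evaluators.entrySet().stream()   // was: .parallelStream()
        .<Map.Entry<K, Long>>map(e -> new SimpleImmutableEntry<>(
            e.getKey(), e.getValue().apply(e.getKey())))
        .collect(Collectors.toSet());
  }
}
{code}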

 






[jira] [Commented] (HADOOP-17531) DistCp: Reduce memory usage on copying huge directories

2021-02-17 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286328#comment-17286328
 ] 

Rajesh Balamohan commented on HADOOP-17531:
---

[~ayushtkn]: Was the test tried out with HDFS or a local fs? Doing that 
listing against S3 can give very different results. HADOOP-11827 tried to fix 
speed at the cost of memory usage.

> DistCp: Reduce memory usage on copying huge directories
> ---
>
> Key: HADOOP-17531
> URL: https://issues.apache.org/jira/browse/HADOOP-17531
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Priority: Critical
> Attachments: MoveToStackIterator.patch, gc-NewD-512M-3.8ML.log
>
>
> Presently distCp uses a producer-consumer kind of setup while building the 
> listing; the input queue and output queue are both unbounded, so the 
> listStatus output grows quite huge.
> Relevant code:
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L635
> This does a breadth-first-style traversal (it uses a queue instead of the 
> earlier stack), so if you have files at a lower depth, it will likely open up 
> the entire tree and then start processing.
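
A sketch of the idea behind the attached MoveToStackIterator.patch, under the 
assumption that depth-first traversal is the goal: a LIFO stack descends one 
path at a time, so the pending-work frontier stays near the tree depth rather 
than the full breadth of a level.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Illustrative only: depth-first listing with a bounded frontier. */
public final class DepthFirstListing {
  /** Stand-in for a directory whose children come from listStatus(). */
  public interface Dir {
    List<Dir> children();
  }

  public static void walk(Dir root) {
    Deque<Dir> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Dir dir = stack.pop();
      // A real listing would write the copy-listing entry for dir here.
      // LIFO order descends into the most recent child first, so the
      // stack holds roughly depth x fan-out entries, not a whole level.
      dir.children().forEach(stack::push);
    }
  }
}
{code}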






[jira] [Commented] (HADOOP-17347) ABFS: Read optimizations

2021-01-03 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257885#comment-17257885
 ] 

Rajesh Balamohan commented on HADOOP-17347:
---


>> If the read is for the last 8 bytes, read the full file.

Can you please share details on this? Does this mean it is going to load 4 MB 
(or the buffer size) worth of data during footer reads? If so, it would be 
expensive for short jobs that rely on footer reads.

> ABFS: Read optimizations
> 
>
> Key: HADOOP-17347
> URL: https://issues.apache.org/jira/browse/HADOOP-17347
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Bilahari T H
>Assignee: Bilahari T H
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Optimize read performance for the following scenarios
>  # Read small files completely
>  Files smaller than the read buffer size can be considered small files. For 
> such files, it is better to read the full file into the AbfsInputStream 
> buffer.
>  # Read the last block if the read is for the footer
>  If the read is for the last 8 bytes, read the full file.
>  This will optimize reads for Parquet files. [Parquet file 
> format|https://www.ellicium.com/parquet-file-format-structure/]
> Both optimizations will be guarded by the following configs:
>  # fs.azure.read.smallfilescompletely
>  # fs.azure.read.optimizefooterread
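
For reference, a minimal sketch of toggling the two switches named above from 
Java; the keys come from the issue text, while the enclosing class and the 
idea of enabling both are only illustrative.

{code:java}
import org.apache.hadoop.conf.Configuration;

/** Illustrative only: enabling the two proposed ABFS read optimizations. */
public final class AbfsReadTuning {
  public static Configuration tuned() {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.azure.read.smallfilescompletely", true);
    conf.setBoolean("fs.azure.read.optimizefooterread", true);
    return conf;
  }
}
{code}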






[jira] [Updated] (HADOOP-17156) Clear readahead requests on stream close

2020-07-27 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-17156:
--
Priority: Minor  (was: Major)

> Clear readahead requests on stream close
> 
>
> Key: HADOOP-17156
> URL: https://issues.apache.org/jira/browse/HADOOP-17156
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> It would be good to close/clear pending read ahead requests on stream close().






[jira] [Created] (HADOOP-17156) Clear readahead requests on stream close

2020-07-27 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-17156:
-

 Summary: Clear readahead requests on stream close
 Key: HADOOP-17156
 URL: https://issues.apache.org/jira/browse/HADOOP-17156
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Rajesh Balamohan


It would be good to close/clear pending read ahead requests on stream close().
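
A minimal sketch of what that could look like (illustrative names, not the 
actual AbfsInputStream internals):

{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Future;

/** Illustrative only: cancel queued readahead work when the stream closes. */
public class ReadAheadTracker implements AutoCloseable {
  private final Queue<Future<?>> pending = new ConcurrentLinkedQueue<>();

  public void track(Future<?> readAhead) {
    pending.add(readAhead);
  }

  @Override
  public void close() {
    Future<?> f;
    while ((f = pending.poll()) != null) {
      f.cancel(false); // drop requests that have not started yet
    }
  }
}
{code}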






[jira] [Comment Edited] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-05-08 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102474#comment-17102474
 ] 

Rajesh Balamohan edited comment on HADOOP-17020 at 5/8/20, 11:09 AM:
-

Sure, thanks [~ste...@apache.org]. Shared the changes related to the blocksize 
sync issue.

PR: [https://github.com/apache/hadoop/pull/2002]


was (Author: rajesh.balamohan):
Sure. Thanks  [~ste...@apache.org] .

PR: https://github.com/apache/hadoop/pull/2002

> RawFileSystem could localize default block size to avoid sync bottleneck in 
> config
> --
>
> Key: HADOOP-17020
> URL: https://issues.apache.org/jira/browse/HADOOP-17020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 
> PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png
>
>
> RawLocalFileSystem could localize default block size to avoid sync bottleneck 
> with Configuration object. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666
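
A sketch of the proposed localization (illustrative, not the actual patch): 
read the value from the Configuration once in the constructor and serve a 
plain field afterwards, so hot-path calls no longer contend on Configuration's 
synchronized property access.

{code:java}
import org.apache.hadoop.conf.Configuration;

/** Illustrative only: cache the default block size instead of re-reading. */
public class LocalizedBlockSize {
  // Key and default mirror what RawLocalFileSystem reads; treat both as
  // assumptions for this sketch.
  private final long defaultBlockSize;

  public LocalizedBlockSize(Configuration conf) {
    this.defaultBlockSize =
        conf.getLong("fs.local.block.size", 32 * 1024 * 1024);
  }

  public long getDefaultBlockSize() {
    return defaultBlockSize; // plain field read; no Configuration lookup
  }
}
{code}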






[jira] [Commented] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-05-08 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102474#comment-17102474
 ] 

Rajesh Balamohan commented on HADOOP-17020:
---

Sure. Thanks  [~ste...@apache.org] .

PR: https://github.com/apache/hadoop/pull/2002

> RawFileSystem could localize default block size to avoid sync bottleneck in 
> config
> --
>
> Key: HADOOP-17020
> URL: https://issues.apache.org/jira/browse/HADOOP-17020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 
> PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png
>
>
> RawLocalFileSystem could localize default block size to avoid sync bottleneck 
> with Configuration object. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666






[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-05-01 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-17020:
--
Status: Open  (was: Patch Available)

> RawFileSystem could localize default block size to avoid sync bottleneck in 
> config
> --
>
> Key: HADOOP-17020
> URL: https://issues.apache.org/jira/browse/HADOOP-17020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 
> PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png
>
>
> RawLocalFileSystem could localize default block size to avoid sync bottleneck 
> with Configuration object. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666






[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-05-01 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-17020:
--
Status: Patch Available  (was: Open)

> RawFileSystem could localize default block size to avoid sync bottleneck in 
> config
> --
>
> Key: HADOOP-17020
> URL: https://issues.apache.org/jira/browse/HADOOP-17020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 
> PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png
>
>
> RawLocalFileSystem could localize default block size to avoid sync bottleneck 
> with Configuration object. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666






[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-04-30 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-17020:
--
Attachment: HADOOP-17020.1.patch

> RawFileSystem could localize default block size to avoid sync bottleneck in 
> config
> --
>
> Key: HADOOP-17020
> URL: https://issues.apache.org/jira/browse/HADOOP-17020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-17020.1.patch, Screenshot 2020-04-29 at 5.24.53 
> PM.png, Screenshot 2020-05-01 at 7.12.06 AM.png
>
>
> RawLocalFileSystem could localize default block size to avoid sync bottleneck 
> with Configuration object. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666






[jira] [Commented] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-04-30 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097127#comment-17097127
 ] 

Rajesh Balamohan commented on HADOOP-17020:
---

Also found a similar issue in mkdirs.  !Screenshot 2020-05-01 at 7.12.06 
AM.png! 

> RawFileSystem could localize default block size to avoid sync bottleneck in 
> config
> --
>
> Key: HADOOP-17020
> URL: https://issues.apache.org/jira/browse/HADOOP-17020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png, Screenshot 
> 2020-05-01 at 7.12.06 AM.png
>
>
> RawLocalFileSystem could localize default block size to avoid sync bottleneck 
> with Configuration object. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666






[jira] [Comment Edited] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-04-30 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097127#comment-17097127
 ] 

Rajesh Balamohan edited comment on HADOOP-17020 at 5/1/20, 2:17 AM:


Also found a similar issue in mkdirs.  !Screenshot 2020-05-01 at 7.12.06 
AM.png|width=481,height=178!


was (Author: rajesh.balamohan):
Also found similar kind of issue in mkdirs.  !Screenshot 2020-05-01 at 7.12.06 
AM.png! 

> RawFileSystem could localize default block size to avoid sync bottleneck in 
> config
> --
>
> Key: HADOOP-17020
> URL: https://issues.apache.org/jira/browse/HADOOP-17020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png, Screenshot 
> 2020-05-01 at 7.12.06 AM.png
>
>
> RawLocalFileSystem could localize default block size to avoid sync bottleneck 
> with Configuration object. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666






[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-04-30 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-17020:
--
Attachment: Screenshot 2020-05-01 at 7.12.06 AM.png

> RawFileSystem could localize default block size to avoid sync bottleneck in 
> config
> --
>
> Key: HADOOP-17020
> URL: https://issues.apache.org/jira/browse/HADOOP-17020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png, Screenshot 
> 2020-05-01 at 7.12.06 AM.png
>
>
> RawLocalFileSystem could localize default block size to avoid sync bottleneck 
> with Configuration object. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666






[jira] [Updated] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-04-29 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-17020:
--
Description: 
RawLocalFileSystem could localize default block size to avoid sync bottleneck 
with Configuration object. 

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666



  was:
RawLocalFileSystem could localize default block size to avoid sync bottleneck 
with Configuration object. 

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666

!Screenshot 2020-04-29 at 5.24.53 PM.png


> RawFileSystem could localize default block size to avoid sync bottleneck in 
> config
> --
>
> Key: HADOOP-17020
> URL: https://issues.apache.org/jira/browse/HADOOP-17020
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png
>
>
> RawLocalFileSystem could localize default block size to avoid sync bottleneck 
> with Configuration object. 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666






[jira] [Created] (HADOOP-17020) RawFileSystem could localize default block size to avoid sync bottleneck in config

2020-04-29 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-17020:
-

 Summary: RawFileSystem could localize default block size to avoid 
sync bottleneck in config
 Key: HADOOP-17020
 URL: https://issues.apache.org/jira/browse/HADOOP-17020
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Rajesh Balamohan
 Attachments: Screenshot 2020-04-29 at 5.24.53 PM.png

RawLocalFileSystem could localize the default block size to avoid a sync 
bottleneck on the Configuration object. 

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java#L666

!Screenshot 2020-04-29 at 5.24.53 PM.png!






[jira] [Created] (HADOOP-16751) DurationInfo text parsing/formatting should be moved out of hotpath

2019-12-08 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-16751:
-

 Summary: DurationInfo text parsing/formatting should be moved out 
of hotpath
 Key: HADOOP-16751
 URL: https://issues.apache.org/jira/browse/HADOOP-16751
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Rajesh Balamohan
 Attachments: Screenshot 2019-12-09 at 10.32.33 AM.png, 
image-2019-12-09-10-45-17-351.png

{color:#172b4d}It would be good to evaluate the text lazily, on demand.{color}

{color:#172b4d}[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DurationInfo.java#L68]{color}

{color:#172b4d}All the pink areas in the following diagram are from this 
codepath.{color}

 

{color:#172b4d}!Screenshot 2019-12-09 at 10.32.33 
AM.png|width=1008,height=920!{color}

 

{color:#172b4d}!image-2019-12-09-10-45-17-351.png|width=571,height=373!{color}
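
A sketch of the lazy-evaluation idea (illustrative, not the actual 
DurationInfo change): hold a Supplier and only format when the text is 
actually consumed.

{code:java}
import java.util.function.Supplier;

/** Illustrative only: defer text formatting until it is really needed. */
public class LazyDurationInfo implements AutoCloseable {
  private final Supplier<String> text; // no String.format() on the hot path
  private final long startNanos = System.nanoTime();

  public LazyDurationInfo(Supplier<String> text) {
    this.text = text;
  }

  @Override
  public void close() {
    long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
    // The supplier is evaluated exactly once, here, when the message is
    // actually emitted (a real version would also check the log level).
    System.out.println(text.get() + ": duration " + elapsedMs + " ms");
  }
}
{code}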

 






[jira] [Commented] (HADOOP-16711) With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs init()

2019-11-18 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976966#comment-16976966
 ] 

Rajesh Balamohan commented on HADOOP-16711:
---

ORC-570 is specific to lazy init on the ORC side, and was tried out after this 
fix. For S3, it would be nice to combine #1 and #2. As of now, genuine callers 
are also impacted by the verifyBuckets call.

This still does not cover {{FileSystem::get()}} spinning up lots of FS inits 
when multiple threads access it at the same time (e.g. LLAP).

> With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs 
> init()
> -
>
> Key: HADOOP-16711
> URL: https://issues.apache.org/jira/browse/HADOOP-16711
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
>  Labels: performance
> Attachments: HADOOP-16711.prelim.1.patch
>
>
> When authoritative mode is enabled with s3guard, it would be good to skip 
> verifyBuckets call during S3A filesystem init(). This would save call to S3 
> during init method.






[jira] [Updated] (HADOOP-16711) With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs init()

2019-11-14 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-16711:
--
Attachment: HADOOP-16711.prelim.1.patch

> With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs 
> init()
> -
>
> Key: HADOOP-16711
> URL: https://issues.apache.org/jira/browse/HADOOP-16711
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
>  Labels: performance
> Attachments: HADOOP-16711.prelim.1.patch
>
>
> When authoritative mode is enabled with s3guard, it would be good to skip 
> verifyBuckets call during S3A filesystem init(). This would save call to S3 
> during init method.






[jira] [Created] (HADOOP-16711) With S3Guard + authmode, consider skipping "verifyBuckets" check in S3A fs init()

2019-11-14 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-16711:
-

 Summary: With S3Guard + authmode, consider skipping 
"verifyBuckets" check in S3A fs init()
 Key: HADOOP-16711
 URL: https://issues.apache.org/jira/browse/HADOOP-16711
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Rajesh Balamohan


When authoritative mode is enabled with S3Guard, it would be good to skip the 
verifyBuckets call during S3A filesystem init(). This would save a call to S3 
during the init method.
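
A sketch of the proposed init() flow ({{fs.s3a.metadatastore.authoritative}} 
is the existing S3Guard switch; the surrounding class is illustrative):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

/** Illustrative only: skip the bucket probe when S3Guard is authoritative. */
public abstract class InitSketch {
  void initialize(Configuration conf) throws IOException {
    boolean authoritative =
        conf.getBoolean("fs.s3a.metadatastore.authoritative", false);
    if (!authoritative) {
      verifyBucketExists(); // one S3 round trip, saved when skipped
    }
  }

  abstract void verifyBucketExists() throws IOException;
}
{code}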






[jira] [Created] (HADOOP-16709) Consider having the ability to turn off TTL in S3Guard + Authoritative mode

2019-11-13 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-16709:
-

 Summary: Consider having the ability to turn off TTL in S3Guard + 
Authoritative mode
 Key: HADOOP-16709
 URL: https://issues.apache.org/jira/browse/HADOOP-16709
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Rajesh Balamohan


Authoritative mode has a TTL, set to 15 minutes by default. However, there are 
cases where we know for sure that the data won't be changed or updated.

In such cases, the AppMaster ends up spending a good amount of time in 
getSplits due to TTL expiry. It would be great to have an option to disable 
the TTL (or to specify -1 when the TTL shouldn't be checked).
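
If the -1 sentinel were adopted, usage might look like the sketch below 
({{fs.s3a.metadatastore.metadata.ttl}} is the existing TTL key as far as I 
know; the -1 semantics are the proposal here, not current behaviour):

{code:java}
import org.apache.hadoop.conf.Configuration;

/** Illustrative only: proposed "never expire" setting for the S3Guard TTL. */
public final class S3GuardTtl {
  public static Configuration neverExpire() {
    Configuration conf = new Configuration();
    // -1 = proposed sentinel meaning "do not check TTL at all".
    conf.setLong("fs.s3a.metadatastore.metadata.ttl", -1L);
    return conf;
  }
}
{code}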






[jira] [Resolved] (HADOOP-16648) HDFS Native Client does not build correctly

2019-10-11 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HADOOP-16648.
---
Resolution: Duplicate

Marking this as a duplicate of HDFS-14900.

> HDFS Native Client does not build correctly
> ---
>
> Key: HADOOP-16648
> URL: https://issues.apache.org/jira/browse/HADOOP-16648
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Affects Versions: 3.3.0
>Reporter: Rajesh Balamohan
>Priority: Blocker
>
> Builds are failing in PR with following exception in native client.  
> {noformat}
> [WARNING] make[2]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
> /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
>   2 3 4 5 6 7 8 9 10 11
> [WARNING] [ 28%] Built target common_obj
> [WARNING] make[2]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
> /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
>   31
> [WARNING] [ 28%] Built target gmock_main_obj
> [WARNING] make[1]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] Makefile:127: recipe for target 'all' failed
> [WARNING] make[2]: *** No rule to make target 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto/PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND',
>  needed by 'main/native/libhdfspp/lib/proto/ClientNamenodeProtocol.hrpc.inl'. 
>  Stop.
> [WARNING] make[1]: *** 
> [main/native/libhdfspp/lib/proto/CMakeFiles/proto_obj.dir/all] Error 2
> [WARNING] make[1]: *** Waiting for unfinished jobs
> [WARNING] make: *** [all] Error 2
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Hadoop Main . SUCCESS [  0.301 
> s]
> [INFO] Apache Hadoop Build Tools .. SUCCESS [  1.348 
> s]
> [INFO] Apache Hadoop Project POM .. SUCCESS [  0.501 
> s]
> [INFO] Apache Hadoop Annotations .. SUCCESS [  1.391 
> s]
> [INFO] Apache Hadoop Project Dist POM . SUCCESS [  0.115 
> s]
> [INFO] Apache Hadoop Assemblies ... SUCCESS [  0.168 
> s]
> [INFO] Apache Hadoop Maven Plugins  SUCCESS [  4.490 
> s]
> [INFO] Apache Hadoop MiniKDC .. SUCCESS [  2.773 
> s]
> [INFO] Apache Hadoop Auth . SUCCESS [  7.922 
> s]
> [INFO] Apache Hadoop Auth Examples  SUCCESS [  1.381 
> s]
> [INFO] Apache Hadoop Common ... SUCCESS [ 34.562 
> s]
> [INFO] Apache Hadoop NFS .. SUCCESS [  5.583 
> s]
> [INFO] Apache Hadoop KMS .. SUCCESS [  5.931 
> s]
> [INFO] Apache Hadoop Registry . SUCCESS [  5.816 
> s]
> [INFO] Apache Hadoop Common Project ... SUCCESS [  0.056 
> s]
> [INFO] Apache Hadoop HDFS Client .. SUCCESS [ 27.104 
> s]
> [INFO] Apache Hadoop HDFS . SUCCESS [ 42.065 
> s]
> [INFO] Apache Hadoop HDFS Native Client ... FAILURE [ 19.349 
> s]
> {noformat}
> Creating this ticket, as couple of pull requests had the same issue.
> e.g 
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/2/artifact/out/patch-compile-root.txt
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1614/1/artifact/out/patch-compile-root.txt






[jira] [Commented] (HADOOP-16648) HDFS Native Client does not build correctly

2019-10-11 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949175#comment-16949175
 ] 

Rajesh Balamohan commented on HADOOP-16648:
---

Closing this ticket as HDFS-14900 fixes the issue. Thanks [~ayushtkn], 
[~ste...@apache.org]

> HDFS Native Client does not build correctly
> ---
>
> Key: HADOOP-16648
> URL: https://issues.apache.org/jira/browse/HADOOP-16648
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Affects Versions: 3.3.0
>Reporter: Rajesh Balamohan
>Priority: Blocker
>
> Builds are failing in PR with following exception in native client.  
> {noformat}
> [WARNING] make[2]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
> /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
>   2 3 4 5 6 7 8 9 10 11
> [WARNING] [ 28%] Built target common_obj
> [WARNING] make[2]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
> /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
>   31
> [WARNING] [ 28%] Built target gmock_main_obj
> [WARNING] make[1]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] Makefile:127: recipe for target 'all' failed
> [WARNING] make[2]: *** No rule to make target 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto/PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND',
>  needed by 'main/native/libhdfspp/lib/proto/ClientNamenodeProtocol.hrpc.inl'. 
>  Stop.
> [WARNING] make[1]: *** 
> [main/native/libhdfspp/lib/proto/CMakeFiles/proto_obj.dir/all] Error 2
> [WARNING] make[1]: *** Waiting for unfinished jobs
> [WARNING] make: *** [all] Error 2
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Hadoop Main . SUCCESS [  0.301 
> s]
> [INFO] Apache Hadoop Build Tools .. SUCCESS [  1.348 
> s]
> [INFO] Apache Hadoop Project POM .. SUCCESS [  0.501 
> s]
> [INFO] Apache Hadoop Annotations .. SUCCESS [  1.391 
> s]
> [INFO] Apache Hadoop Project Dist POM . SUCCESS [  0.115 
> s]
> [INFO] Apache Hadoop Assemblies ... SUCCESS [  0.168 
> s]
> [INFO] Apache Hadoop Maven Plugins  SUCCESS [  4.490 
> s]
> [INFO] Apache Hadoop MiniKDC .. SUCCESS [  2.773 
> s]
> [INFO] Apache Hadoop Auth . SUCCESS [  7.922 
> s]
> [INFO] Apache Hadoop Auth Examples  SUCCESS [  1.381 
> s]
> [INFO] Apache Hadoop Common ... SUCCESS [ 34.562 
> s]
> [INFO] Apache Hadoop NFS .. SUCCESS [  5.583 
> s]
> [INFO] Apache Hadoop KMS .. SUCCESS [  5.931 
> s]
> [INFO] Apache Hadoop Registry . SUCCESS [  5.816 
> s]
> [INFO] Apache Hadoop Common Project ... SUCCESS [  0.056 
> s]
> [INFO] Apache Hadoop HDFS Client .. SUCCESS [ 27.104 
> s]
> [INFO] Apache Hadoop HDFS . SUCCESS [ 42.065 
> s]
> [INFO] Apache Hadoop HDFS Native Client ... FAILURE [ 19.349 
> s]
> {noformat}
> Creating this ticket, as couple of pull requests had the same issue.
> e.g 
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/2/artifact/out/patch-compile-root.txt
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1614/1/artifact/out/patch-compile-root.txt






[jira] [Commented] (HADOOP-16648) HDFS Native Client does not build correctly

2019-10-10 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948984#comment-16948984
 ] 

Rajesh Balamohan commented on HADOOP-16648:
---

Sure, thanks [~ayushtkn], [~ste...@apache.org]. I will check if the HDFS patch 
solves the issue.

> HDFS Native Client does not build correctly
> ---
>
> Key: HADOOP-16648
> URL: https://issues.apache.org/jira/browse/HADOOP-16648
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Affects Versions: 3.3.0
>Reporter: Rajesh Balamohan
>Priority: Blocker
>
> Builds are failing in PR with following exception in native client.  
> {noformat}
> [WARNING] make[2]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
> /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
>   2 3 4 5 6 7 8 9 10 11
> [WARNING] [ 28%] Built target common_obj
> [WARNING] make[2]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
> /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
>   31
> [WARNING] [ 28%] Built target gmock_main_obj
> [WARNING] make[1]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] Makefile:127: recipe for target 'all' failed
> [WARNING] make[2]: *** No rule to make target 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto/PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND',
>  needed by 'main/native/libhdfspp/lib/proto/ClientNamenodeProtocol.hrpc.inl'. 
>  Stop.
> [WARNING] make[1]: *** 
> [main/native/libhdfspp/lib/proto/CMakeFiles/proto_obj.dir/all] Error 2
> [WARNING] make[1]: *** Waiting for unfinished jobs
> [WARNING] make: *** [all] Error 2
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Hadoop Main . SUCCESS [  0.301 
> s]
> [INFO] Apache Hadoop Build Tools .. SUCCESS [  1.348 
> s]
> [INFO] Apache Hadoop Project POM .. SUCCESS [  0.501 
> s]
> [INFO] Apache Hadoop Annotations .. SUCCESS [  1.391 
> s]
> [INFO] Apache Hadoop Project Dist POM . SUCCESS [  0.115 
> s]
> [INFO] Apache Hadoop Assemblies ... SUCCESS [  0.168 
> s]
> [INFO] Apache Hadoop Maven Plugins  SUCCESS [  4.490 
> s]
> [INFO] Apache Hadoop MiniKDC .. SUCCESS [  2.773 
> s]
> [INFO] Apache Hadoop Auth . SUCCESS [  7.922 
> s]
> [INFO] Apache Hadoop Auth Examples  SUCCESS [  1.381 
> s]
> [INFO] Apache Hadoop Common ... SUCCESS [ 34.562 
> s]
> [INFO] Apache Hadoop NFS .. SUCCESS [  5.583 
> s]
> [INFO] Apache Hadoop KMS .. SUCCESS [  5.931 
> s]
> [INFO] Apache Hadoop Registry . SUCCESS [  5.816 
> s]
> [INFO] Apache Hadoop Common Project ... SUCCESS [  0.056 
> s]
> [INFO] Apache Hadoop HDFS Client .. SUCCESS [ 27.104 
> s]
> [INFO] Apache Hadoop HDFS . SUCCESS [ 42.065 
> s]
> [INFO] Apache Hadoop HDFS Native Client ... FAILURE [ 19.349 
> s]
> {noformat}
> Creating this ticket, as couple of pull requests had the same issue.
> e.g 
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/2/artifact/out/patch-compile-root.txt
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1614/1/artifact/out/patch-compile-root.txt






[jira] [Updated] (HADOOP-16648) HDFS Native Client does not build correctly

2019-10-10 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-16648:
--
Affects Version/s: 3.3.0

> HDFS Native Client does not build correctly
> ---
>
> Key: HADOOP-16648
> URL: https://issues.apache.org/jira/browse/HADOOP-16648
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: native
>Affects Versions: 3.3.0
>Reporter: Rajesh Balamohan
>Priority: Blocker
>
> Builds are failing in PR with following exception in native client.  
> {noformat}
> [WARNING] make[2]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
> /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
>   2 3 4 5 6 7 8 9 10 11
> [WARNING] [ 28%] Built target common_obj
> [WARNING] make[2]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
> /home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
>   31
> [WARNING] [ 28%] Built target gmock_main_obj
> [WARNING] make[1]: Leaving directory 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
> [WARNING] Makefile:127: recipe for target 'all' failed
> [WARNING] make[2]: *** No rule to make target 
> '/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto/PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND',
>  needed by 'main/native/libhdfspp/lib/proto/ClientNamenodeProtocol.hrpc.inl'. 
>  Stop.
> [WARNING] make[1]: *** 
> [main/native/libhdfspp/lib/proto/CMakeFiles/proto_obj.dir/all] Error 2
> [WARNING] make[1]: *** Waiting for unfinished jobs
> [WARNING] make: *** [all] Error 2
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Hadoop Main . SUCCESS [  0.301 
> s]
> [INFO] Apache Hadoop Build Tools .. SUCCESS [  1.348 
> s]
> [INFO] Apache Hadoop Project POM .. SUCCESS [  0.501 
> s]
> [INFO] Apache Hadoop Annotations .. SUCCESS [  1.391 
> s]
> [INFO] Apache Hadoop Project Dist POM . SUCCESS [  0.115 
> s]
> [INFO] Apache Hadoop Assemblies ... SUCCESS [  0.168 
> s]
> [INFO] Apache Hadoop Maven Plugins  SUCCESS [  4.490 
> s]
> [INFO] Apache Hadoop MiniKDC .. SUCCESS [  2.773 
> s]
> [INFO] Apache Hadoop Auth . SUCCESS [  7.922 
> s]
> [INFO] Apache Hadoop Auth Examples  SUCCESS [  1.381 
> s]
> [INFO] Apache Hadoop Common ... SUCCESS [ 34.562 
> s]
> [INFO] Apache Hadoop NFS .. SUCCESS [  5.583 
> s]
> [INFO] Apache Hadoop KMS .. SUCCESS [  5.931 
> s]
> [INFO] Apache Hadoop Registry . SUCCESS [  5.816 
> s]
> [INFO] Apache Hadoop Common Project ... SUCCESS [  0.056 
> s]
> [INFO] Apache Hadoop HDFS Client .. SUCCESS [ 27.104 
> s]
> [INFO] Apache Hadoop HDFS . SUCCESS [ 42.065 
> s]
> [INFO] Apache Hadoop HDFS Native Client ... FAILURE [ 19.349 
> s]
> {noformat}
> Creating this ticket, as couple of pull requests had the same issue.
> e.g 
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/2/artifact/out/patch-compile-root.txt
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1614/1/artifact/out/patch-compile-root.txt






[jira] [Created] (HADOOP-16648) HDFS Native Client does not build correctly

2019-10-10 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-16648:
-

 Summary: HDFS Native Client does not build correctly
 Key: HADOOP-16648
 URL: https://issues.apache.org/jira/browse/HADOOP-16648
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Reporter: Rajesh Balamohan


PR builds are failing with the following error in the native client.

{noformat}
[WARNING] make[2]: Leaving directory 
'/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
[WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
  2 3 4 5 6 7 8 9 10 11
[WARNING] [ 28%] Built target common_obj
[WARNING] make[2]: Leaving directory 
'/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
[WARNING] /opt/cmake/bin/cmake -E cmake_progress_report 
/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target/CMakeFiles
  31
[WARNING] [ 28%] Built target gmock_main_obj
[WARNING] make[1]: Leaving directory 
'/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/target'
[WARNING] Makefile:127: recipe for target 'all' failed
[WARNING] make[2]: *** No rule to make target 
'/home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1591/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/proto/PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND',
 needed by 'main/native/libhdfspp/lib/proto/ClientNamenodeProtocol.hrpc.inl'.  
Stop.
[WARNING] make[1]: *** 
[main/native/libhdfspp/lib/proto/CMakeFiles/proto_obj.dir/all] Error 2
[WARNING] make[1]: *** Waiting for unfinished jobs
[WARNING] make: *** [all] Error 2
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main . SUCCESS [  0.301 s]
[INFO] Apache Hadoop Build Tools .. SUCCESS [  1.348 s]
[INFO] Apache Hadoop Project POM .. SUCCESS [  0.501 s]
[INFO] Apache Hadoop Annotations .. SUCCESS [  1.391 s]
[INFO] Apache Hadoop Project Dist POM . SUCCESS [  0.115 s]
[INFO] Apache Hadoop Assemblies ... SUCCESS [  0.168 s]
[INFO] Apache Hadoop Maven Plugins  SUCCESS [  4.490 s]
[INFO] Apache Hadoop MiniKDC .. SUCCESS [  2.773 s]
[INFO] Apache Hadoop Auth . SUCCESS [  7.922 s]
[INFO] Apache Hadoop Auth Examples  SUCCESS [  1.381 s]
[INFO] Apache Hadoop Common ... SUCCESS [ 34.562 s]
[INFO] Apache Hadoop NFS .. SUCCESS [  5.583 s]
[INFO] Apache Hadoop KMS .. SUCCESS [  5.931 s]
[INFO] Apache Hadoop Registry . SUCCESS [  5.816 s]
[INFO] Apache Hadoop Common Project ... SUCCESS [  0.056 s]
[INFO] Apache Hadoop HDFS Client .. SUCCESS [ 27.104 s]
[INFO] Apache Hadoop HDFS . SUCCESS [ 42.065 s]
[INFO] Apache Hadoop HDFS Native Client ... FAILURE [ 19.349 s]
{noformat}

Creating this ticket, as a couple of pull requests had the same issue.

e.g 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1591/2/artifact/out/patch-compile-root.txt
https://builds.apache.org/job/hadoop-multibranch/job/PR-1614/1/artifact/out/patch-compile-root.txt






[jira] [Commented] (HADOOP-16604) Provide copy functionality for cloud native applications

2019-10-04 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944262#comment-16944262
 ] 

Rajesh Balamohan commented on HADOOP-16604:
---

Thanks for sharing the details [~ste...@apache.org]. The initial step is to 
enable copyFile(URI, URI) for S3AFileSystem; I have created a subtask for 
this. It covers copying a single file to the destination, and higher-level 
apps can make use of the API to parallelize copies. In later iterations, we 
can provide a parallel copy option within the FS itself.
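
A sketch of the shape being discussed (the interface name is made up; only the 
{{copyFile(URI, URI)}} signature comes from the thread):

{code:java}
import java.io.IOException;
import java.net.URI;

/**
 * Illustrative only: a connector such as S3AFileSystem would implement
 * this with a native server-side copy, so the bytes never travel through
 * the client. Higher-level apps can call it from multiple threads to
 * parallelize a bulk copy.
 */
public interface NativeCopySupport {
  void copyFile(URI srcFile, URI destFile) throws IOException;
}
{code}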

> Provide copy functionality for cloud native applications
> 
>
> Key: HADOOP-16604
> URL: https://issues.apache.org/jira/browse/HADOOP-16604
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.2.1
>Reporter: Rajesh Balamohan
>Priority: Major
>
> Lot of cloud native systems provide out of the box and optimized copy 
> functionality within their system. They avoid bringing data over to the 
> client and write back to the destination.
> It would be good to have a cloud native interface, which can be implemented 
> by the cloud connectors to provide (e.g {{copy(URI srcFile, URI destFile)}})
> This would be helpful for applications which make use of these connectors and 
> enhance copy performance within cloud.






[jira] [Updated] (HADOOP-16629) support copyFile in s3afilesystem

2019-10-04 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-16629:
--
Affects Version/s: 3.2.1

> support copyFile in s3afilesystem
> -
>
> Key: HADOOP-16629
> URL: https://issues.apache.org/jira/browse/HADOOP-16629
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.1
>Reporter: Rajesh Balamohan
>Priority: Minor
>







[jira] [Created] (HADOOP-16629) support copyFile in s3afilesystem

2019-10-04 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-16629:
-

 Summary: support copyFile in s3afilesystem
 Key: HADOOP-16629
 URL: https://issues.apache.org/jira/browse/HADOOP-16629
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Rajesh Balamohan









[jira] [Updated] (HADOOP-16629) support copyFile in s3afilesystem

2019-10-04 Thread Rajesh Balamohan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-16629:
--
Component/s: fs/s3

> support copyFile in s3afilesystem
> -
>
> Key: HADOOP-16629
> URL: https://issues.apache.org/jira/browse/HADOOP-16629
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
>







[jira] [Created] (HADOOP-16604) Provide copy functionality for cloud native applications

2019-09-25 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HADOOP-16604:
-

 Summary: Provide copy functionality for cloud native applications
 Key: HADOOP-16604
 URL: https://issues.apache.org/jira/browse/HADOOP-16604
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Rajesh Balamohan


Lots of cloud native systems provide out-of-the-box, optimized copy 
functionality within their system. They avoid bringing the data over to the 
client and writing it back to the destination.

It would be good to have a cloud native interface which the cloud connectors 
can implement (e.g. {{copy(URI srcFile, URI destFile)}}).

This would be helpful for applications which make use of these connectors, and 
would enhance copy performance within the cloud.






[jira] [Updated] (HADOOP-15042) Azure::PageBlobInputStream::skip can return -ve value when numberOfPagesRemaining is 0

2017-11-15 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-15042:
--
Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)

> Azure::PageBlobInputStream::skip can return -ve value when 
> numberOfPagesRemaining is 0
> --
>
> Key: HADOOP-15042
> URL: https://issues.apache.org/jira/browse/HADOOP-15042
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-15042.001.patch
>
>
> {{PageBlobInputStream::skip-->skipImpl}} returns negative values when 
> {{numberOfPagesRemaining=0}}. This can cause wrong position to be set in 
> NativeAzureFileSystem::seek() and can lead to errors.






[jira] [Comment Edited] (HADOOP-15042) Azure::PageBlobInputStream::skip can return -ve value when numberOfPagesRemaining is 0

2017-11-15 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254718#comment-16254718
 ] 

Rajesh Balamohan edited comment on HADOOP-15042 at 11/16/17 4:04 AM:
-

{noformat}
from hadoop-azure dir.

mvn test -Dtest=Test\*,ITest\*

Results :

Tests run: 843, Failures: 0, Errors: 0, Skipped: 117
{noformat}

WASB Region for the test: East Asia


was (Author: rajesh.balamohan):

{noformat}
mvn test -Dtest=Test\*,ITest\*

Results :

Tests run: 843, Failures: 0, Errors: 0, Skipped: 117
{noformat}

WASB Region for the test: East Asia

> Azure::PageBlobInputStream::skip can return -ve value when 
> numberOfPagesRemaining is 0
> --
>
> Key: HADOOP-15042
> URL: https://issues.apache.org/jira/browse/HADOOP-15042
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-15042.001.patch
>
>
> {{PageBlobInputStream::skip-->skipImpl}} returns negative values when 
> {{numberOfPagesRemaining=0}}. This can cause wrong position to be set in 
> NativeAzureFileSystem::seek() and can lead to errors.






[jira] [Updated] (HADOOP-15042) Azure::PageBlobInputStream::skip can return -ve value when numberOfPagesRemaining is 0

2017-11-15 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-15042:
--
Attachment: HADOOP-15042.001.patch


{noformat}
mvn test -Dtest=Test\*,ITest\*

Results :

Tests run: 843, Failures: 0, Errors: 0, Skipped: 117
{noformat}

WASB Region for the test: East Asia

> Azure::PageBlobInputStream::skip can return -ve value when 
> numberOfPagesRemaining is 0
> --
>
> Key: HADOOP-15042
> URL: https://issues.apache.org/jira/browse/HADOOP-15042
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-15042.001.patch
>
>
> {{PageBlobInputStream::skip-->skipImpl}} returns negative values when 
> {{numberOfPagesRemaining=0}}. This can cause wrong position to be set in 
> NativeAzureFileSystem::seek() and can lead to errors.






[jira] [Created] (HADOOP-15042) Azure::PageBlobInputStream::skip can return -ve value when numberOfPagesRemaining is 0

2017-11-15 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-15042:
-

 Summary: Azure::PageBlobInputStream::skip can return -ve value 
when numberOfPagesRemaining is 0
 Key: HADOOP-15042
 URL: https://issues.apache.org/jira/browse/HADOOP-15042
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Reporter: Rajesh Balamohan
Priority: Minor


{{PageBlobInputStream::skip-->skipImpl}} returns negative values when 
{{numberOfPagesRemaining=0}}. This can cause a wrong position to be set in 
NativeAzureFileSystem::seek() and can lead to errors.
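
For illustration only, a minimal sketch of the guard such a fix needs 
({{numberOfPagesRemaining}} is the only name taken from the issue; the rest is 
hypothetical):

{code}
// Hypothetical sketch, not the actual PageBlobInputStream code: clamp the
// skip result so callers never see a negative delta when no pages remain.
private long skipImpl(long n) throws IOException {
  if (numberOfPagesRemaining == 0) {
    return 0;  // nothing left to skip; do not report a negative skip
  }
  long skipped = skipWithinRemainingPages(n);  // assumed underlying skip logic
  return Math.max(0L, skipped);                // never propagate a negative value
}
{code}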



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14988) WASB: Expose WASB status metrics as counters in Hadoop

2017-10-26 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-14988:
-

 Summary: WASB: Expose WASB status metrics as counters in Hadoop
 Key: HADOOP-14988
 URL: https://issues.apache.org/jira/browse/HADOOP-14988
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Reporter: Rajesh Balamohan
Priority: Minor


It would be good to expose WASB status metrics (e.g. 503) as Hadoop counters. 

Here is an example from a Spark job, where it ends up spending a large amount of 
time in retries. Adding Hadoop counters would help in analyzing and tuning 
long-running tasks.

{noformat}
2017-10-23 23:07:20,876 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
2017-10-23 23:07:20,877 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
threadId=99, Status=503, Elapsed(ms)=1, ETAG=null, contentLength=198, 
requestMethod=GET
2017-10-23 23:07:21,877 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
2017-10-23 23:07:21,879 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
threadId=99, Status=503, Elapsed(ms)=2, ETAG=null, contentLength=198, 
requestMethod=GET
2017-10-23 23:07:24,070 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
2017-10-23 23:07:24,073 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
threadId=99, Status=503, Elapsed(ms)=3, ETAG=null, contentLength=198, 
requestMethod=GET
2017-10-23 23:07:27,917 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
2017-10-23 23:07:27,920 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
threadId=99, Status=503, Elapsed(ms)=2, ETAG=null, contentLength=198, 
requestMethod=GET
2017-10-23 23:07:36,879 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
2017-10-23 23:07:36,881 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
threadId=99, Status=503, Elapsed(ms)=1, ETAG=null, contentLength=198, 
requestMethod=GET
2017-10-23 23:07:54,786 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
2017-10-23 23:07:54,789 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
threadId=99, Status=503, Elapsed(ms)=3, ETAG=null, contentLength=198, 
requestMethod=GET
2017-10-23 23:08:24,790 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
2017-10-23 23:08:24,794 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
threadId=99, Status=503, Elapsed(ms)=4, ETAG=null, contentLength=198, 
requestMethod=GET
2017-10-23 23:08:54,794 DEBUG [Executor task launch worker for task 2463] 
azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
{noformat}
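
As a rough illustration of the idea (the class and method names here are 
hypothetical, not the actual WASB interceptor API):

{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: count throttled (503) responses so they can be
// surfaced as a Hadoop counter instead of only appearing in DEBUG logs.
class ThrottleMetrics {
  private final AtomicLong throttledRequests = new AtomicLong();

  void onResponseReceived(int httpStatus) {
    if (httpStatus == 503) {
      throttledRequests.incrementAndGet();
    }
  }

  long getThrottledRequests() {
    return throttledRequests.get();
  }
}
{code}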



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14965) s3a input stream "normal" fadvise mode to be adaptive

2017-10-22 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214478#comment-16214478
 ] 

Rajesh Balamohan commented on HADOOP-14965:
---

It would be good if the stream tuned itself based on the seek access patterns.
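
A minimal sketch of such self-tuning, assuming hypothetical field names rather 
than the actual S3AInputStream code:

{code}
// Hypothetical sketch: start in sequential mode and switch to random IO
// permanently after the first backward seek, as proposed below.
private boolean randomIO = false;  // "normal" mode starts out sequential
private long pos = 0;

public synchronized void seek(long targetPos) {
  if (targetPos < pos && !randomIO) {
    // Columnar formats (ORC/Parquet) read the footer first, so the first
    // backward seek is a strong signal to flip to random IO.
    randomIO = true;
  }
  pos = targetPos;  // lazy seek: the actual IO happens on the next read
}
{code}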

> s3a input stream "normal" fadvise mode to be adaptive
> -
>
> Key: HADOOP-14965
> URL: https://issues.apache.org/jira/browse/HADOOP-14965
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>
> HADOOP-14535 added seek optimisation to wasb, but rather than require the 
> caller to declare sequential vs random, it works it out for itself.
> # defaults to sequential, lazy seek
> # if the caller ever seeks backwards, switches to random IO.
> This means that with the use pattern of columnar stores (go to the end of the 
> file, read the summary, then go to the columns and work forwards) it will 
> switch to random IO after that first seek back (cost: one aborted HTTP 
> connection).
> Where this should benefit the most is in downstream apps where you are 
> working with different data sources in the same object store, running off the 
> same app config, but have different read patterns. I'm seeing exactly this in 
> some of my Spark tests, where it's near impossible to set things up so that 
> .gz files are read sequentially while ORC data is read with random IO.
> I propose: the "normal" fadvise => adaptive, sequential => sequential always, 
> random => random from the outset.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14680) Azure: IndexOutOfBoundsException in BlockBlobInputStream

2017-07-25 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100923#comment-16100923
 ] 

Rajesh Balamohan commented on HADOOP-14680:
---

Thanks for the patch [~tmarquardt]. Patch lgtm (non-binding).  I tried out the 
patch on multi-node cluster and it works fine.

> Azure: IndexOutOfBoundsException in BlockBlobInputStream
> 
>
> Key: HADOOP-14680
> URL: https://issues.apache.org/jira/browse/HADOOP-14680
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Assignee: Thomas Marquardt
>Priority: Minor
> Attachments: HADOOP-14680-001.patch
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/BlockBlobInputStream.java#L361
> Under certain conditions, BlockBlobInputStream can throw 
> IndexOutOfBoundsException. The following is an example:
> {{length:297898, offset:4194304, buf.len:4492202, writePos:4194304}}
> In this case, {{MemoryOutputStream::capacity()}} would end up returning a 
> negative value and can cause {{IndexOutOfBoundsException}}.
> It should be {{return buffer.length - offset;}} to determine the current capacity.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14680) Azure: IndexOutOfBoundsException in BlockBlobInputStream

2017-07-23 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-14680:
-

 Summary: Azure: IndexOutOfBoundsException in BlockBlobInputStream
 Key: HADOOP-14680
 URL: https://issues.apache.org/jira/browse/HADOOP-14680
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Reporter: Rajesh Balamohan
Priority: Minor


https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/BlockBlobInputStream.java#L361

Under certain conditions, BlockBlobInputStream can throw 
IndexOutOfBoundsException. The following is an example:

{{length:297898, offset:4194304, buf.len:4492202, writePos:4194304}}
In this case, {{MemoryOutputStream::capacity()}} would end up returning a 
negative value and can cause {{IndexOutOfBoundsException}}.

It should be {{return buffer.length - offset;}} to determine the current capacity.
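
For context, a sketch of the corrected method with the numbers from the example 
(the field names are assumed from the snippet above):

{code}
// Hypothetical sketch of MemoryOutputStream::capacity(): remaining capacity
// is measured from the stream's base offset, so it cannot go negative here.
int capacity() {
  // With buffer.length = 4492202 and offset = 4194304 this yields 297898,
  // matching the requested length in the example above.
  return buffer.length - offset;
}
{code}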




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-11572) s3a delete() operation fails during a concurrent delete of child entries

2017-07-18 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HADOOP-11572:
-

Assignee: Steve Loughran  (was: Rajesh Balamohan)

> s3a delete() operation fails during a concurrent delete of child entries
> 
>
> Key: HADOOP-11572
> URL: https://issues.apache.org/jira/browse/HADOOP-11572
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: HADOOP-11572-001.patch, HADOOP-11572-branch-2-002.patch, 
> HADOOP-11572-branch-2-003.patch
>
>
> Reviewing the code, s3a has the problem raised in HADOOP-6688: deletion of a 
> child entry during a recursive directory delete is propagated as an 
> exception, rather than treated as a detail which idempotent operations should 
> just ignore.
> The exception should be caught and, if it is a file-not-found problem, logged 
> rather than propagated.
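
(Illustrative only, not the attached patch: a minimal sketch of the tolerant 
delete described above, with an assumed low-level helper.)

{code}
// Hypothetical sketch: a child entry deleted by a concurrent caller is
// logged and ignored, because a recursive delete() is idempotent.
private void deleteChild(Path child) throws IOException {
  try {
    deleteObject(child);  // assumed low-level delete helper
  } catch (FileNotFoundException e) {
    LOG.debug("Child {} was deleted concurrently; ignoring", child, e);
  }
}
{code}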



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-11572) s3a delete() operation fails during a concurrent delete of child entries

2017-07-18 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HADOOP-11572:
-

Assignee: Rajesh Balamohan  (was: Steve Loughran)

> s3a delete() operation fails during a concurrent delete of child entries
> 
>
> Key: HADOOP-11572
> URL: https://issues.apache.org/jira/browse/HADOOP-11572
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Rajesh Balamohan
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: HADOOP-11572-001.patch, HADOOP-11572-branch-2-002.patch, 
> HADOOP-11572-branch-2-003.patch
>
>
> Reviewing the code, s3a has the problem raised in HADOOP-6688: deletion of a 
> child entry during a recursive directory delete is propagated as an 
> exception, rather than treated as a detail which idempotent operations should 
> just ignore.
> The exception should be caught and, if it is a file-not-found problem, logged 
> rather than propagated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14612) Reduce memory copy in BlobOutputStreamInternal::dispatchWrite

2017-06-28 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-14612:
-

 Summary: Reduce memory copy in 
BlobOutputStreamInternal::dispatchWrite
 Key: HADOOP-14612
 URL: https://issues.apache.org/jira/browse/HADOOP-14612
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Rajesh Balamohan
Priority: Minor


Currently in {{BlobOutputStreamInternal::dispatchWrite}}, the buffer is copied 
internally for every write. During large uploads this copy can be around 4 MB. 
This can be avoided by an internal class which extends ByteArrayOutputStream 
with an additional method "ByteArrayInputStream getInputStream()".
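
A minimal sketch of such a class (the class name is made up; {{buf}} and 
{{count}} are protected fields inherited from ByteArrayOutputStream):

{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: expose the internal buffer directly as an
// InputStream, avoiding the toByteArray() copy (~4 MB per block upload).
class NoCopyByteArrayOutputStream extends ByteArrayOutputStream {
  ByteArrayInputStream getInputStream() {
    // No defensive copy: the InputStream reads the live internal buffer.
    return new ByteArrayInputStream(buf, 0, count);
  }
}
{code}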



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14596) AWS SDK 1.11+ aborts() on close() if > 0 bytes in stream; logs error

2017-06-28 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16066377#comment-16066377
 ] 

Rajesh Balamohan commented on HADOOP-14596:
---

Thanks for sharing the patch [~ste...@apache.org]. {{skip}} may sometimes not 
skip to the intended position; should we check the return value of 
{{skip()}} and repeat until {{remaining}} is reached?
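
Something along these lines, as a sketch only (not the attached patch; {{in}} 
and {{remaining}} are assumed from context):

{code}
// Hypothetical sketch: InputStream.skip() may skip fewer bytes than asked,
// so repeat until the remaining bytes are drained or the stream gives up.
while (remaining > 0) {
  long skipped = in.skip(remaining);
  if (skipped <= 0) {
    break;  // EOF or the stream refuses to skip further
  }
  remaining -= skipped;
}
{code}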

> AWS SDK 1.11+ aborts() on close() if > 0 bytes in stream; logs error
> 
>
> Key: HADOOP-14596
> URL: https://issues.apache.org/jira/browse/HADOOP-14596
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-14596-001.patch, testlog.txt
>
>
> The latest SDK now tells us off when we do a seek() by aborting the TCP stream
> {code}
> - Not all bytes were read from the S3ObjectInputStream, aborting HTTP 
> connection. This is likely an error and may result in sub-optimal behavior. 
> Request only the bytes you need via a ranged GET or drain the input stream 
> after use.
> 2017-06-27 15:47:35,789 [ScalaTest-main-running-S3ACSVReadSuite] WARN  
> internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - 
> Not all bytes were read from the S3ObjectInputStream, aborting HTTP 
> connection. This is likely an error and may result in sub-optimal behavior. 
> Request only the bytes you need via a ranged GET or drain the input stream 
> after use.
> 2017-06-27 15:47:37,409 [ScalaTest-main-running-S3ACSVReadSuite] WARN  
> internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - 
> Not all bytes were read from the S3ObjectInputStream, aborting HTTP 
> connection. This is likely an error and may result in sub-optimal behavior. 
> Request only the bytes you need via a ranged GET or drain the input stream 
> after use.
> 2017-06-27 15:47:39,003 [ScalaTest-main-running-S3ACSVReadSuite] WARN  
> internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - 
> Not all bytes were read from the S3ObjectInputStream, aborting HTTP 
> connection. This is likely an error and may result in sub-optimal behavior. 
> Request only the bytes you need via a ranged GET or drain the input stream 
> after use.
> 2017-06-27 15:47:40,627 [ScalaTest-main-running-S3ACSVReadSuite] WARN  
> internal.S3AbortableInputStream (S3AbortableInputStream.java:close(163)) - 
> Not all bytes were read from the S3ObjectInputStream, aborting HTTP 
> connection. This is likely an error and may result in sub-optimal behavior. 
> Request only the bytes you need via a ranged GET or drain the input stream 
> after use.
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14500) Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails

2017-06-07 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041527#comment-16041527
 ] 

Rajesh Balamohan commented on HADOOP-14500:
---

It was simulation related, I believe. When the earlier patch results were 
submitted, “fs.contract.test.fs.wasb” was not added IIRC, so some of the tests 
were getting ignored as mentioned. For the results posted here, I added all 
3 parameters: “fs.azure.test.account.name”, 
“fs.azure.account.key.{ACCOUNTNAME}.blob.core.windows.net” and 
“fs.contract.test.fs.wasb”.

Before HADOOP-14478, seek() was closing the input stream explicitly and 
re-opening it. With HADOOP-14478, it is not expected to do so. So seek() need 
not throw FNFE; however, it would throw this error on a subsequent read.

> Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails
> -
>
> Key: HADOOP-14500
> URL: https://issues.apache.org/jira/browse/HADOOP-14500
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, test
>Reporter: Mingliang Liu
>Assignee: Rajesh Balamohan
> Attachments: HADOOP-14500-001.patch
>
>
> The following test fails:
> {code}
> TestFileSystemOperationExceptionHandling.testSingleThreadBlockBlobSeekScenario
>  Expected exception: java.io.FileNotFoundException
> TestFileSystemOperationsExceptionHandlingMultiThreaded.testMultiThreadBlockBlobSeekScenario
>  Expected exception: java.io.FileNotFoundException
> {code}
> I did early analysis and found [HADOOP-14478] maybe the reason. I think we 
> can fix the test itself here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14500) Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails

2017-06-06 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14500:
--
Status: Patch Available  (was: Open)

> Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails
> -
>
> Key: HADOOP-14500
> URL: https://issues.apache.org/jira/browse/HADOOP-14500
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, test
>Reporter: Mingliang Liu
>Assignee: Rajesh Balamohan
> Attachments: HADOOP-14500-001.patch
>
>
> The following test fails:
> {code}
> TestFileSystemOperationExceptionHandling.testSingleThreadBlockBlobSeekScenario
>  Expected exception: java.io.FileNotFoundException
> TestFileSystemOperationsExceptionHandlingMultiThreaded.testMultiThreadBlockBlobSeekScenario
>  Expected exception: java.io.FileNotFoundException
> {code}
> I did early analysis and found [HADOOP-14478] maybe the reason. I think we 
> can fix the test itself here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14500) Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails

2017-06-06 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14500:
--
Attachment: HADOOP-14500-001.patch

Tests were executed against WASB/Japan region. 

{noformat}
Running org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.838 sec - in 
org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Running org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.786 sec - in 
org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions

Results :

Failed tests:
  
TestNativeAzureFileSystemMocked>NativeAzureFileSystemBaseTest.testFolderLastModifiedTime:649
 null

Tests run: 703, Failures: 1, Errors: 0, Skipped: 119
{noformat}

Test case failure is not related to the patch.

{noformat}
testFolderLastModifiedTime(org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked)
  Time elapsed: 15.023 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at 
org.apache.hadoop.fs.azure.NativeAzureFileSystemBaseTest.testFolderLastModifiedTime(NativeAzureFileSystemBaseTest.java:649)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

{noformat}

> Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails
> -
>
> Key: HADOOP-14500
> URL: https://issues.apache.org/jira/browse/HADOOP-14500
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, test
>Reporter: Mingliang Liu
>Assignee: Rajesh Balamohan
> Attachments: HADOOP-14500-001.patch
>
>
> The following test fails:
> {code}
> TestFileSystemOperationExceptionHandling.testSingleThreadBlockBlobSeekScenario
>  Expected exception: java.io.FileNotFoundException
> TestFileSystemOperationsExceptionHandlingMultiThreaded.testMultiThreadBlockBlobSeekScenario
>  Expected exception: java.io.FileNotFoundException
> {code}
> I did early analysis and found [HADOOP-14478] maybe the reason. I think we 
> can fix the test itself here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-14500) Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails

2017-06-06 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HADOOP-14500:
-

Assignee: Rajesh Balamohan

> Azure: TestFileSystemOperationExceptionHandling{,MultiThreaded} fails
> -
>
> Key: HADOOP-14500
> URL: https://issues.apache.org/jira/browse/HADOOP-14500
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, test
>Reporter: Mingliang Liu
>Assignee: Rajesh Balamohan
>
> The following test fails:
> {code}
> TestFileSystemOperationExceptionHandling.testSingleThreadBlockBlobSeekScenario
>  Expected exception: java.io.FileNotFoundException
> TestFileSystemOperationsExceptionHandlingMultiThreaded.testMultiThreadBlockBlobSeekScenario
>  Expected exception: java.io.FileNotFoundException
> {code}
> I did early analysis and found [HADOOP-14478] maybe the reason. I think we 
> can fix the test itself here.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads

2017-06-05 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037914#comment-16037914
 ] 

Rajesh Balamohan commented on HADOOP-14478:
---

[~liuml07] - Perf improvement would be observed when {{BlobInputStream}} is 
fixed. Thanks for creating HADOOP-14490.

> Optimize NativeAzureFsInputStream for positional reads
> --
>
> Key: HADOOP-14478
> URL: https://issues.apache.org/jira/browse/HADOOP-14478
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch, 
> HADOOP-14478.003.patch
>
>
> Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
> the data length requested. This would be beneficial for sequential reads. 
> However, for positional reads (seek to a specific location, read x number of 
> bytes, seek back to the original location) this may not be beneficial and might 
> even download a lot more data that is never used.
> It would be good to override {{readFully(long position, byte[] buffer, int 
> offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
> {{mark(readLimit)}} as a hint to Azure's BlobInputStream.
> BlobInputStream reference: 
> https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448
> BlobInputStream can consider this as a hint later to determine the amount of 
> data to be read ahead. Changes to BlobInputStream would not be addressed in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14473) Optimize NativeAzureFileSystem::seek for forward seeks

2017-06-05 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14473:
--
Resolution: Resolved
Status: Resolved  (was: Patch Available)

Closing this ticket, since HADOOP-14478 takes care of this.

> Optimize NativeAzureFileSystem::seek for forward seeks
> --
>
> Key: HADOOP-14473
> URL: https://issues.apache.org/jira/browse/HADOOP-14473
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HADOOP-14473-001.patch
>
>
> {{NativeAzureFileSystem::seek()}} closes and re-opens the input stream 
> irrespective of forward/backward seek. It would be beneficial to re-open the 
> stream only on backward seeks.
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads

2017-06-05 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037890#comment-16037890
 ] 

Rajesh Balamohan commented on HADOOP-14478:
---

Thanks [~liuml07], [~ste...@apache.org].

> Optimize NativeAzureFsInputStream for positional reads
> --
>
> Key: HADOOP-14478
> URL: https://issues.apache.org/jira/browse/HADOOP-14478
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Fix For: 2.9.0, 3.0.0-alpha4
>
> Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch, 
> HADOOP-14478.003.patch
>
>
> Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
> the data length requested. This would be beneficial for sequential reads. 
> However, for positional reads (seek to a specific location, read x number of 
> bytes, seek back to the original location) this may not be beneficial and might 
> even download a lot more data that is never used.
> It would be good to override {{readFully(long position, byte[] buffer, int 
> offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
> {{mark(readLimit)}} as a hint to Azure's BlobInputStream.
> BlobInputStream reference: 
> https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448
> BlobInputStream can consider this as a hint later to determine the amount of 
> data to be read ahead. Changes to BlobInputStream would not be addressed in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads

2017-06-02 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16035724#comment-16035724
 ] 

Rajesh Balamohan commented on HADOOP-14478:
---

Thanks [~liuml07]

> Optimize NativeAzureFsInputStream for positional reads
> --
>
> Key: HADOOP-14478
> URL: https://issues.apache.org/jira/browse/HADOOP-14478
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch, 
> HADOOP-14478.003.patch
>
>
> Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
> the data length requested. This would be beneficial for sequential reads. 
> However, for positional reads (seek to a specific location, read x number of 
> bytes, seek back to the original location) this may not be beneficial and might 
> even download a lot more data that is never used.
> It would be good to override {{readFully(long position, byte[] buffer, int 
> offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
> {{mark(readLimit)}} as a hint to Azure's BlobInputStream.
> BlobInputStream reference: 
> https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448
> BlobInputStream can consider this as a hint later to determine the amount of 
> data to be read ahead. Changes to BlobInputStream would not be addressed in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads

2017-06-02 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14478:
--
Attachment: HADOOP-14478.003.patch

Attaching .3 patch to address the checkstyle issue (removed an unused import statement).

> Optimize NativeAzureFsInputStream for positional reads
> --
>
> Key: HADOOP-14478
> URL: https://issues.apache.org/jira/browse/HADOOP-14478
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch, 
> HADOOP-14478.003.patch
>
>
> Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
> the data length requested. This would be beneficial for sequential reads. 
> However, for positional reads (seek to a specific location, read x number of 
> bytes, seek back to the original location) this may not be beneficial and might 
> even download a lot more data that is never used.
> It would be good to override {{readFully(long position, byte[] buffer, int 
> offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
> {{mark(readLimit)}} as a hint to Azure's BlobInputStream.
> BlobInputStream reference: 
> https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448
> BlobInputStream can consider this as a hint later to determine the amount of 
> data to be read ahead. Changes to BlobInputStream would not be addressed in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14473) Optimize NativeAzureFileSystem::seek for forward seeks

2017-06-02 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034683#comment-16034683
 ] 

Rajesh Balamohan commented on HADOOP-14473:
---

Since it was easier to combine this patch with HADOOP-14478, I have merged it 
and posted the revised patch there.

In the revised patch, I have fixed an issue in seek() and shared the test 
results there as well. Tests were run against the "Japan West" region endpoint.

{{BlobInputStream::skip()}} is more of a no-op call. The issue was related to 
closing the stream and opening it again via {{store.retrieve()}}, as that would 
end up creating a new {{BlobInputStream}}, which internally needs an additional 
HTTP call to download the blob attributes. This has been avoided in the patch.

I completely agree that it would be good to get instrumentation similar to 
s3a's, as it is very useful. Please let me know if this could be done in 
incremental tickets.
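
A rough sketch of the seek shape described above ({{store.retrieve()}} with a 
position argument and the field names are assumptions; a full implementation 
would also loop until the skip completes):

{code}
// Hypothetical sketch: forward seeks move within the open stream via the
// cheap BlobInputStream.skip(); only backward seeks re-open the blob, which
// costs an extra HTTP call to download the blob attributes.
public synchronized void seek(long targetPos) throws IOException {
  if (targetPos >= pos) {
    pos += in.skip(targetPos - pos);      // in-place forward move
  } else {
    in.close();
    in = store.retrieve(key, targetPos);  // assumed re-open helper
    pos = targetPos;
  }
}
{code}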

> Optimize NativeAzureFileSystem::seek for forward seeks
> --
>
> Key: HADOOP-14473
> URL: https://issues.apache.org/jira/browse/HADOOP-14473
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HADOOP-14473-001.patch
>
>
> {{NativeAzureFileSystem::seek()}} closes and re-opens the input stream 
> irrespective of forward/backward seek. It would be beneficial to re-open the 
> stream only on backward seeks.
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads

2017-06-02 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan reassigned HADOOP-14478:
-

Assignee: Rajesh Balamohan

> Optimize NativeAzureFsInputStream for positional reads
> --
>
> Key: HADOOP-14478
> URL: https://issues.apache.org/jira/browse/HADOOP-14478
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch
>
>
> Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
> the data length requested. This would be beneficial for sequential reads. 
> However, for positional reads (seek to a specific location, read x number of 
> bytes, seek back to the original location) this may not be beneficial and might 
> even download a lot more data that is never used.
> It would be good to override {{readFully(long position, byte[] buffer, int 
> offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
> {{mark(readLimit)}} as a hint to Azure's BlobInputStream.
> BlobInputStream reference: 
> https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448
> BlobInputStream can consider this as a hint later to determine the amount of 
> data to be read ahead. Changes to BlobInputStream would not be addressed in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads

2017-06-02 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14478:
--
Attachment: HADOOP-14478.002.patch

Attaching .2 version with fixes in seek(). Also attaching test results from 
the hadoop-azure module.

My Azure machine and endpoints are hosted in the "Japan West" region.

{noformat}

hdiuser@hn0:~/hadoop/hadoop-tools/hadoop-azure⟫ mvn test

...
..
Tests run: 16, Failures: 0, Errors: 0, Skipped: 16, Time elapsed: 0.421 sec - 
in org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionHandling
Running org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.361 sec - in 
org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Running org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0.939 sec - in 
org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions

Results :

Tests run: 703, Failures: 0, Errors: 0, Skipped: 436

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 01:50 min
[INFO] Finished at: 2017-06-02T13:08:42+00:00
[INFO] Final Memory: 29M/1574M
[INFO] 
{noformat}

> Optimize NativeAzureFsInputStream for positional reads
> --
>
> Key: HADOOP-14478
> URL: https://issues.apache.org/jira/browse/HADOOP-14478
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
> Attachments: HADOOP-14478.001.patch, HADOOP-14478.002.patch
>
>
> Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
> the data length requested. This would be beneficial for sequential reads. 
> However, for positional reads (seek to a specific location, read x number of 
> bytes, seek back to the original location) this may not be beneficial and might 
> even download a lot more data that is never used.
> It would be good to override {{readFully(long position, byte[] buffer, int 
> offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
> {{mark(readLimit)}} as a hint to Azure's BlobInputStream.
> BlobInputStream reference: 
> https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448
> BlobInputStream can consider this as a hint later to determine the amount of 
> data to be read ahead. Changes to BlobInputStream would not be addressed in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads

2017-06-02 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14478:
--
Description: 
Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
the data length requested. This would be beneficial for sequential reads. 
However, for positional reads (seek to a specific location, read x number of 
bytes, seek back to the original location) this may not be beneficial and might 
even download a lot more data that is never used.

It would be good to override {{readFully(long position, byte[] buffer, int 
offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
{{mark(readLimit)}} as a hint to Azure's BlobInputStream.

BlobInputStream reference: 
https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448

BlobInputStream can consider this as a hint later to determine the amount of 
data to be read ahead. Changes to BlobInputStream would not be addressed in 
this JIRA.




  was:
Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
the data length requested. This would be beneficial for sequential reads. 
However, for positional reads (seek to a specific location, read x number of 
bytes, seek back to the original location) this may not be beneficial and might 
even download a lot more data that is never used.

It would be good to override {{readFully(long position, byte[] buffer, int 
offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
{{mark(readLimit)}} as a hint to Azure's BlobInputStream.

BlobInputStream reference: 
https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448

BlobInputStream can consider this as a hint later to determine the amount of 
data to be read ahead. Changes to BlobInputStream would not be a part of this 
JIRA.





> Optimize NativeAzureFsInputStream for positional reads
> --
>
> Key: HADOOP-14478
> URL: https://issues.apache.org/jira/browse/HADOOP-14478
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
> Attachments: HADOOP-14478.001.patch
>
>
> Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
> the data length requested. This would be beneficial for sequential reads. 
> However, for positional reads (seek to a specific location, read x number of 
> bytes, seek back to the original location) this may not be beneficial and might 
> even download a lot more data that is never used.
> It would be good to override {{readFully(long position, byte[] buffer, int 
> offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
> {{mark(readLimit)}} as a hint to Azure's BlobInputStream.
> BlobInputStream reference: 
> https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448
> BlobInputStream can consider this as a hint later to determine the amount of 
> data to be read ahead. Changes to BlobInputStream would not be addressed in 
> this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads

2017-06-02 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14478:
--
Attachment: HADOOP-14478.001.patch

Attaching .1 patch for review. This includes changes related to HADOOP-14473 as 
well.

> Optimize NativeAzureFsInputStream for positional reads
> --
>
> Key: HADOOP-14478
> URL: https://issues.apache.org/jira/browse/HADOOP-14478
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
> Attachments: HADOOP-14478.001.patch
>
>
> Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
> the data length requested. This would be beneficial for sequential reads. 
> However, for positional reads (seek to a specific location, read x number of 
> bytes, seek back to the original location) this may not be beneficial and might 
> even download a lot more data that is never used.
> It would be good to override {{readFully(long position, byte[] buffer, int 
> offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
> {{mark(readLimit)}} as a hint to Azure's BlobInputStream.
> BlobInputStream reference: 
> https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448
> BlobInputStream can consider this as a hint later to determine the amount of 
> data to be read ahead. Changes to BlobInputStream would not be a part of this 
> JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14478) Optimize NativeAzureFsInputStream for positional reads

2017-06-02 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-14478:
-

 Summary: Optimize NativeAzureFsInputStream for positional reads
 Key: HADOOP-14478
 URL: https://issues.apache.org/jira/browse/HADOOP-14478
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Reporter: Rajesh Balamohan


Azure's {{BlobInputStream}} internally buffers 4 MB of data irrespective of 
the data length requested. This would be beneficial for sequential reads. 
However, for positional reads (seek to a specific location, read x number of 
bytes, seek back to the original location) this may not be beneficial and might 
even download a lot more data that is never used.

It would be good to override {{readFully(long position, byte[] buffer, int 
offset, int length)}} for {{NativeAzureFsInputStream}} and make use of 
{{mark(readLimit)}} as a hint to Azure's BlobInputStream.

BlobInputStream reference: 
https://github.com/Azure/azure-storage-java/blob/master/microsoft-azure-storage/src/com/microsoft/azure/storage/blob/BlobInputStream.java#L448

BlobInputStream can consider this as a hint later to determine the amount of 
data to be read ahead. Changes to BlobInputStream would not be a part of this 
JIRA.
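
For illustration, a sketch of the positional read described above (assumes the 
surrounding stream provides seek()/getPos() and a wrapped stream {{in}}; only 
the signature and the {{mark(readLimit)}} hint are taken from the proposal):

{code}
// Hypothetical sketch: positional read that passes mark(length) to the
// underlying stream as a read-ahead hint, then restores the old position.
public void readFully(long position, byte[] buffer, int offset, int length)
    throws IOException {
  long oldPos = getPos();
  try {
    seek(position);
    in.mark(length);  // hint: only 'length' bytes are needed
    int read = 0;
    while (read < length) {
      int n = in.read(buffer, offset + read, length - read);
      if (n < 0) {
        throw new java.io.EOFException("Premature EOF reading at " + position);
      }
      read += n;
    }
  } finally {
    seek(oldPos);  // restore the caller's position
  }
}
{code}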






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14473) Optimize NativeAzureFileSystem::seek for forward seeks

2017-05-31 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14473:
--
Attachment: HADOOP-14473-001.patch

> Optimize NativeAzureFileSystem::seek for forward seeks
> --
>
> Key: HADOOP-14473
> URL: https://issues.apache.org/jira/browse/HADOOP-14473
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Rajesh Balamohan
> Attachments: HADOOP-14473-001.patch
>
>
> {{NativeAzureFileSystem::seek()}} closes and re-opens the input stream 
> irrespective of forward/backward seek. It would be beneficial to re-open the 
> stream only on backward seeks.
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14473) Optimize NativeAzureFileSystem::seek for forward seeks

2017-05-31 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-14473:
-

 Summary: Optimize NativeAzureFileSystem::seek for forward seeks
 Key: HADOOP-14473
 URL: https://issues.apache.org/jira/browse/HADOOP-14473
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Reporter: Rajesh Balamohan
Priority: Minor


{{NativeAzureFileSystem::seek()}} closes and re-opens the input stream 
irrespective of forward/backward seek. It would be beneficial to re-open the 
stream only on backward seeks.

https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/NativeAzureFileSystem.java#L889



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13926) S3Guard: S3AFileSystem::listLocatedStatus() to employ MetadataStore

2017-04-03 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954481#comment-15954481
 ] 

Rajesh Balamohan commented on HADOOP-13926:
---

Thanks for the patch [~liuml07]. Patch LGTM.  Very minor comment: 
{{Listing::createFileStatusListingIterator}} may need to have 
{{providedStatus}} in its javadoc.

> S3Guard: S3AFileSystem::listLocatedStatus() to employ MetadataStore
> ---
>
> Key: HADOOP-13926
> URL: https://issues.apache.org/jira/browse/HADOOP-13926
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Assignee: Mingliang Liu
> Attachments: HADOOP-13926-HADOOP-13345.001.patch, 
> HADOOP-13926-HADOOP-13345.002.patch, HADOOP-13926-HADOOP-13345.003.patch, 
> HADOOP-13926-HADOOP-13345.004.patch, 
> HADOOP-13926.wip.proto.branch-13345.1.patch
>
>
> Need to check if {{listLocatedStatus}} can make use of metastore's 
> listChildren feature.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore

2017-03-09 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14154:
--
Status: Open  (was: Patch Available)

> Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
> --
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch
>
>
> Currently {{DynamoDBMetaStore::listChildren}} does not populate 
> {{isAuthoritative}} flag when creating {{DirListingMetadata}}. 
> This causes additional S3 lookups even when users have enabled 
> {{fs.s3a.metadatastore.authoritative}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore

2017-03-09 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904013#comment-15904013
 ] 

Rajesh Balamohan commented on HADOOP-14154:
---

Thanks for the clarification [~fabbri]. Would that isAuthoritative flag have to 
be set up by higher-level applications like Pig/Hive/MR?

> Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
> --
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch
>
>
> Currently {{DynamoDBMetaStore::listChildren}} does not populate 
> {{isAuthoritative}} flag when creating {{DirListingMetadata}}. 
> This causes additional S3 lookups even when users have enabled 
> {{fs.s3a.metadatastore.authoritative}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14165) Add S3Guard.dirListingUnion in S3AFileSystem#listFiles, listLocatedStatus

2017-03-08 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-14165:
-

 Summary: Add S3Guard.dirListingUnion in S3AFileSystem#listFiles, 
listLocatedStatus
 Key: HADOOP-14165
 URL: https://issues.apache.org/jira/browse/HADOOP-14165
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Rajesh Balamohan
Priority: Minor


{{S3Guard::dirListingUnion}} merges information from the backing store and DDB 
to create a consistent view. This needs to be added in 
{{S3AFileSystem::listFiles}} and {{S3AFileSystem::listLocatedStatus}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore

2017-03-08 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14154:
--
Attachment: HADOOP-14154-HADOOP-13345.002.patch

Fixing checkstyle issues.

> Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
> --
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch, 
> HADOOP-14154-HADOOP-13345.002.patch
>
>
> Currently {{DynamoDBMetaStore::listChildren}} does not populate 
> {{isAuthoritative}} flag when creating {{DirListingMetadata}}. 
> This causes additional S3 lookups even when users have enabled 
> {{fs.s3a.metadatastore.authoritative}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore

2017-03-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14154:
--
Status: Patch Available  (was: Open)

> Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
> --
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch
>
>
> Currently {{DynamoDBMetaStore::listChildren}} does not populate 
> {{isAuthoritative}} flag when creating {{DirListingMetadata}}. 
> This causes additional S3 lookups even when users have enabled 
> {{fs.s3a.metadatastore.authoritative}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore

2017-03-07 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14154:
--
Attachment: HADOOP-14154-HADOOP-13345.001.patch

> Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore
> --
>
> Key: HADOOP-14154
> URL: https://issues.apache.org/jira/browse/HADOOP-14154
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-14154-HADOOP-13345.001.patch
>
>
> Currently {{DynamoDBMetaStore::listChildren}} does not populate the 
> {{isAuthoritative}} flag when creating {{DirListingMetadata}}. 
> This causes additional S3 lookups even when users have enabled 
> {{fs.s3a.metadatastore.authoritative}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14154) Set isAuthoritative flag when creating DirListingMetadata in DynamoDBMetaStore

2017-03-07 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-14154:
-

 Summary: Set isAuthoritative flag when creating DirListingMetadata 
in DynamoDBMetaStore
 Key: HADOOP-14154
 URL: https://issues.apache.org/jira/browse/HADOOP-14154
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Rajesh Balamohan
Priority: Minor



Currently {{DynamoDBMetaStore::listChildren}} does not populate the 
{{isAuthoritative}} flag when creating {{DirListingMetadata}}. 

This causes additional S3 lookups even when users have enabled 
{{fs.s3a.metadatastore.authoritative}}.
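A minimal sketch of the fix, assuming {{DirListingMetadata}}'s constructor 
accepts the flag (the condition for authoritativeness below is illustrative):

{noformat}
// If the DynamoDB query returned the directory's full listing, mark it
// authoritative so S3A can skip the corresponding S3 LIST call.
boolean isAuthoritative = queryReturnedAllEntries;   // illustrative condition
return new DirListingMetadata(path, metas, isAuthoritative);
{noformat}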




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13914) s3guard: improve S3AFileStatus#isEmptyDirectory handling

2017-03-07 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898949#comment-15898949
 ] 

Rajesh Balamohan commented on HADOOP-13914:
---

{noformat}
S3AFileStatus innerGetFileStatus(final Path f, boolean needEmptyDirectoryFlag) 
throws IOException {
..
// Check MetadataStore, if any.
..
PathMetadata pm = metadataStore.get(path, true);
..
{noformat}

Should {{needEmptyDirectoryFlag}} be passed on to {{MetadataStore}}? This 
would avoid an additional {{QuerySpec}}. This is similar to [~liuml07]'s comment #1.
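In other words, something like the following inside {{innerGetFileStatus}}, 
assuming {{MetadataStore.get}} keeps a two-argument form:

{noformat}
// Thread the caller's flag through instead of hard-coding true, so the
// store only issues the extra emptiness query when it is actually needed.
PathMetadata pm = metadataStore.get(path, needEmptyDirectoryFlag);
{noformat}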

> s3guard: improve S3AFileStatus#isEmptyDirectory handling
> 
>
> Key: HADOOP-13914
> URL: https://issues.apache.org/jira/browse/HADOOP-13914
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: HADOOP-13345
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
> Attachments: HADOOP-13914-HADOOP-13345.000.patch, 
> HADOOP-13914-HADOOP-13345.002.patch, HADOOP-13914-HADOOP-13345.003.patch, 
> HADOOP-13914-HADOOP-13345.004.patch, HADOOP-13914-HADOOP-13345.005.patch, 
> s3guard-empty-dirs.md, test-only-HADOOP-13914.patch
>
>
> As discussed in HADOOP-13449, proper support for the isEmptyDirectory() flag 
> stored in S3AFileStatus is missing from DynamoDBMetadataStore.
> The approach taken by LocalMetadataStore is not suitable for the DynamoDB 
> implementation, and also sacrifices good code separation to minimize 
> S3AFileSystem changes pre-merge to trunk.
> I will attach a design doc that attempts to clearly explain the problem and 
> preferred solution.  I suggest we do this work after merging the HADOOP-13345 
> branch to trunk, but am open to suggestions.
> I can also attach a patch of a integration test that exercises the missing 
> case and demonstrates a failure with DynamoDBMetadataStore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)

2017-02-20 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875140#comment-15875140
 ] 

Rajesh Balamohan commented on HADOOP-14081:
---

Thanks [~ste...@apache.org]. I ran with "mvn test -Dtest=ITestS\* -Dscale". I 
should have used the scale test parameter for the huge file-size upload tests.

> S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
> --
>
> Key: HADOOP-14081
> URL: https://issues.apache.org/jira/browse/HADOOP-14081
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HADOOP-14081.001.patch
>
>
> In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} 
> is called. It might be possible to directly access the byte[] array from 
> ByteArrayOutputStream. 
> Might have to extend ByteArrayOutputStream and create a method like 
> getInputStream() which can return ByteArrayInputStream.  This would avoid an 
> expensive array copy during large uploads.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)

2017-02-17 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871711#comment-15871711
 ] 

Rajesh Balamohan commented on HADOOP-14081:
---

Thanks [~ste...@apache.org].  Here are the test results (region: S3 bucket in 
US East; tests were run from my laptop). Errors are due to socket timeouts 
(180 seconds). Checked ITestS3AContractGetFileStatus.teardown, which was again 
due to a socket timeout.

{noformat}
Results :

Tests in error:
  
ITestS3ContractOpen>AbstractFSContractTestBase.setup:193->AbstractFSContractTestBase.mkdirs:338
 » SocketTimeout
  
ITestS3AContractGetFileStatus.teardown:40->AbstractFSContractTestBase.teardown:204->AbstractFSContractTestBase.deleteTestDirInTeardown:213
 »
  
ITestS3AContractRootDir>AbstractContractRootDirectoryTest.testRmEmptyRootDirNonRecursive:116
 » PathIO
  
ITestS3NContractOpen>AbstractFSContractTestBase.setup:193->AbstractFSContractTestBase.mkdirs:338
 » SocketTimeout

Tests run: 454, Failures: 0, Errors: 4, Skipped: 56

..
..
[INFO] Total time: 02:11 h 
{noformat}

> S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
> --
>
> Key: HADOOP-14081
> URL: https://issues.apache.org/jira/browse/HADOOP-14081
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-14081.001.patch
>
>
> In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} 
> is called. It might be possible to directly access the byte[] array from 
> ByteArrayOutputStream. 
> Might have to extend ByteArrayOutputStream and create a method like 
> getInputStream() which can return ByteArrayInputStream.  This would avoid an 
> expensive array copy during large uploads.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)

2017-02-15 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14081:
--
Status: Patch Available  (was: Open)

> S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
> --
>
> Key: HADOOP-14081
> URL: https://issues.apache.org/jira/browse/HADOOP-14081
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-14081.001.patch
>
>
> In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} 
> is called. It might be possible to directly access the byte[] array from 
> ByteArrayOutputStream. 
> Might have to extend ByteArrayOutputStream and create a method like 
> getInputStream() which can return ByteArrayInputStream.  This would avoid an 
> expensive array copy during large uploads.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)

2017-02-15 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14081:
--
Attachment: HADOOP-14081.001.patch

> S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
> --
>
> Key: HADOOP-14081
> URL: https://issues.apache.org/jira/browse/HADOOP-14081
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-14081.001.patch
>
>
> In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} 
> is called. It might be possible to directly access the byte[] array from 
> ByteArrayOutputStream. 
> Might have to extend ByteArrayOutputStream and create a method like 
> getInputStream() which can return ByteArrayInputStream.  This would avoid an 
> expensive array copy during large uploads.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)

2017-02-14 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-14081:
-

 Summary: S3A: Consider avoiding array copy in S3ABlockOutputStream 
(ByteArrayBlock)
 Key: HADOOP-14081
 URL: https://issues.apache.org/jira/browse/HADOOP-14081
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Reporter: Rajesh Balamohan
Priority: Minor


In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} 
is called. It might be possible to directly access the byte[] array from 
ByteArrayOutputStream. 

Might have to extend ByteArrayOutputStream and create a method like 
getInputStream() which can return ByteArrayInputStream.  This would avoid an 
expensive array copy during large uploads.
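A minimal sketch of that idea; {{buf}} and {{count}} are protected fields of 
{{java.io.ByteArrayOutputStream}}, so a subclass can wrap the live buffer 
instead of the copy that {{toByteArray()}} makes (class and method names here 
are illustrative):

{noformat}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class DirectByteArrayOutputStream extends ByteArrayOutputStream {
  /** Expose the internal buffer as a stream without copying it. */
  ByteArrayInputStream getInputStream() {
    return new ByteArrayInputStream(buf, 0, count);
  }
}
{noformat}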



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14081) S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)

2017-02-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-14081:
--
Issue Type: Sub-task  (was: Improvement)
Parent: HADOOP-13204

> S3A: Consider avoiding array copy in S3ABlockOutputStream (ByteArrayBlock)
> --
>
> Key: HADOOP-14081
> URL: https://issues.apache.org/jira/browse/HADOOP-14081
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> In {{S3ADataBlocks::ByteArrayBlock}}, data is copied whenever {{startUpload}} 
> is called. It might be possible to directly access the byte[] array from 
> ByteArrayOutputStream. 
> Might have to extend ByteArrayOutputStream and create a method like 
> getInputStream() which can return ByteArrayInputStream.  This would avoid an 
> expensive array copy during large uploads.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13926) S3Guard: Improve listLocatedStatus

2017-01-10 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816583#comment-15816583
 ] 

Rajesh Balamohan commented on HADOOP-13926:
---

Agreed. That would need a change in S3Guard to support a few million entries in 
{{DirListingMetadata listChildren(Path path)}}, or changes to the API to 
support a larger set of entries in the listing.
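For reference, a rough sketch of the listChildren-based approach (helper names 
assumed; paging for very large listings remains the open question above):

{noformat}
DirListingMetadata dirMeta = metadataStore.listChildren(path);
if (dirMeta != null && dirMeta.isAuthoritative()) {
  // Serve the listing straight from the metadata store and skip S3 entirely.
  return toLocatedFileStatusIterator(dirMeta.getListing());  // assumed helper
}
// Fall back to the S3-backed listing when the store has no complete answer.
return listLocatedStatusFromS3(path);                        // assumed helper
{noformat}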

> S3Guard: Improve listLocatedStatus
> --
>
> Key: HADOOP-13926
> URL: https://issues.apache.org/jira/browse/HADOOP-13926
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13926.wip.proto.branch-13345.1.patch
>
>
> Need to check if {{listLocatedStatus}} can make use of metastore's 
> listChildren feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13926) S3Guard: Improve listLocatedStatus

2016-12-22 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13926:
--
Attachment: HADOOP-13926.wip.proto.branch-13345.1.patch

> S3Guard: Improve listLocatedStatus
> --
>
> Key: HADOOP-13926
> URL: https://issues.apache.org/jira/browse/HADOOP-13926
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13926.wip.proto.branch-13345.1.patch
>
>
> Need to check if {{listLocatedStatus}} can make use of metastore's 
> listChildren feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13936) S3Guard: DynamoDB can go out of sync with S3AFileSystem::delete operation

2016-12-22 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-13936:
-

 Summary: S3Guard: DynamoDB can go out of sync with 
S3AFileSystem::delete operation
 Key: HADOOP-13936
 URL: https://issues.apache.org/jira/browse/HADOOP-13936
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Rajesh Balamohan
Priority: Minor


As a part of the {{S3AFileSystem.delete}} operation, {{innerDelete}} is invoked, 
which deletes keys from S3 in batches (default is 1000). But DynamoDB is 
updated only at the end of this operation. This can cause issues when deleting 
a large number of keys. 

E.g., it is possible to get an exception after deleting 1000 keys, and in such 
cases DynamoDB would not be updated. This can cause DynamoDB to go out of sync. 
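A rough sketch of one mitigation, updating the metadata store per S3 batch 
rather than once at the end ({{keyToPath}} is an assumed helper; this 
illustrates the idea, not the eventual fix):

{noformat}
for (List<DeleteObjectsRequest.KeyVersion> batch : batches) {
  // Delete one batch of keys from S3 (up to 1000 per request).
  s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(batch));
  // Immediately reflect that batch in the metadata store, so a failure
  // mid-operation leaves at most one batch of skew instead of all of it.
  for (DeleteObjectsRequest.KeyVersion kv : batch) {
    metadataStore.delete(keyToPath(kv.getKey()));
  }
}
{noformat}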



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13934) S3Guard: DynamoDBMetaStore::move could be throwing exception due to BatchWriteItem limits

2016-12-21 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13934:
--
Summary: S3Guard: DynamoDBMetaStore::move could be throwing exception due 
to BatchWriteItem limits  (was: S3Guard: DynamoDBMetaStore::move throws 
exception with limited info)

> S3Guard: DynamoDBMetaStore::move could be throwing exception due to 
> BatchWriteItem limits
> -
>
> Key: HADOOP-13934
> URL: https://issues.apache.org/jira/browse/HADOOP-13934
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.9.0
>
>
> When using {{DynamoDBMetadataStore}} with an insert-heavy Hive app, it 
> started throwing exceptions in {{DynamoDBMetadataStore::move}}. But with just 
> the following exception, it is relatively hard to debug the real issue on 
> the DynamoDB side. 
> {noformat}
> Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: 1 
> validation error detected: Value 
> '{ddb-table-name-334=[com.amazonaws.dynamodb.v20120810.WriteRequest@ca1da583, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca1fc7cd, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca4244e6, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca2f58a9, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca3525f8,
> ...
> ...
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1529)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1167)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:1722)
> at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1698)
> at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:668)
> at 
> com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111)
> at 
> com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItem(BatchWriteItemImpl.java:52)
> at 
> com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItem(DynamoDB.java:178)
> at 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:351)
> ... 28 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13934) S3Guard: DynamoDBMetaStore::move throws exception with limited info

2016-12-21 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15769320#comment-15769320
 ] 

Rajesh Balamohan commented on HADOOP-13934:
---

Suspecting the API limit in BatchWriteItem 
(http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html).
 If the number of items is > 25, it could end up throwing this exception. It 
might be good to invoke this in micro-batches?
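A minimal sketch of the micro-batching idea (Guava's {{Lists.partition}} for 
chunking; a real version would also need to retry the {{UnprocessedItems}} 
returned by each response):

{noformat}
private static final int DDB_BATCH_WRITE_LIMIT = 25;

void batchWrite(AmazonDynamoDB ddb, String tableName,
    List<WriteRequest> writes) {
  // BatchWriteItem rejects requests with more than 25 items, so chunk first.
  for (List<WriteRequest> chunk
      : Lists.partition(writes, DDB_BATCH_WRITE_LIMIT)) {
    ddb.batchWriteItem(new BatchWriteItemRequest()
        .withRequestItems(Collections.singletonMap(tableName, chunk)));
  }
}
{noformat}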

> S3Guard: DynamoDBMetaStore::move throws exception with limited info
> ---
>
> Key: HADOOP-13934
> URL: https://issues.apache.org/jira/browse/HADOOP-13934
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Priority: Minor
> Fix For: 2.9.0
>
>
> When using {{DynamoDBMetadataStore}} with an insert-heavy Hive app, it 
> started throwing exceptions in {{DynamoDBMetadataStore::move}}. But with just 
> the following exception, it is relatively hard to debug the real issue on 
> the DynamoDB side. 
> {noformat}
> Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: 1 
> validation error detected: Value 
> '{ddb-table-name-334=[com.amazonaws.dynamodb.v20120810.WriteRequest@ca1da583, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca1fc7cd, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca4244e6, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca2f58a9, 
> com.amazonaws.dynamodb.v20120810.WriteRequest@ca3525f8,
> ...
> ...
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1529)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1167)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:1722)
> at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1698)
> at 
> com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:668)
> at 
> com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111)
> at 
> com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItem(BatchWriteItemImpl.java:52)
> at 
> com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItem(DynamoDB.java:178)
> at 
> org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:351)
> ... 28 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13934) S3Guard: DynamoDBMetaStore::move throws exception with limited info

2016-12-21 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-13934:
-

 Summary: S3Guard: DynamoDBMetaStore::move throws exception with 
limited info
 Key: HADOOP-13934
 URL: https://issues.apache.org/jira/browse/HADOOP-13934
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Rajesh Balamohan
Priority: Minor
 Fix For: 2.9.0


When using {{DynamoDBMetadataStore}} with an insert-heavy Hive app, it started 
throwing exceptions in {{DynamoDBMetadataStore::move}}. But with just the 
following exception, it is relatively hard to debug the real issue on the 
DynamoDB side. 

{noformat}

Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: 1 
validation error detected: Value 
'{ddb-table-name-334=[com.amazonaws.dynamodb.v20120810.WriteRequest@ca1da583, 
com.amazonaws.dynamodb.v20120810.WriteRequest@ca1fc7cd, 
com.amazonaws.dynamodb.v20120810.WriteRequest@ca4244e6, 
com.amazonaws.dynamodb.v20120810.WriteRequest@ca2f58a9, 
com.amazonaws.dynamodb.v20120810.WriteRequest@ca3525f8,
...
...
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1529)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1167)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
at 
com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:1722)
at 
com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:1698)
at 
com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:668)
at 
com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.doBatchWriteItem(BatchWriteItemImpl.java:111)
at 
com.amazonaws.services.dynamodbv2.document.internal.BatchWriteItemImpl.batchWriteItem(BatchWriteItemImpl.java:52)
at 
com.amazonaws.services.dynamodbv2.document.DynamoDB.batchWriteItem(DynamoDB.java:178)
at 
org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.move(DynamoDBMetadataStore.java:351)
... 28 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13931) S3AGuard: Use BatchWriteItem in DynamoDBMetadataStore::put(DirListingMetadata)

2016-12-21 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-13931:
-

 Summary: S3AGuard: Use BatchWriteItem in 
DynamoDBMetadataStore::put(DirListingMetadata)
 Key: HADOOP-13931
 URL: https://issues.apache.org/jira/browse/HADOOP-13931
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Rajesh Balamohan
Priority: Minor


Using {{batchWriteItem}} might be more performant in 
{{DynamoDBMetadataStore::put(DirListingMetadata meta)}}
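A hedged sketch with the DynamoDB document API ({{pathMetadataToItem}} is an 
assumed conversion helper; the 25-request limit discussed in HADOOP-13934 
would still apply):

{noformat}
// Write all of the directory's children in one BatchWriteItem round trip
// instead of issuing one PutItem call per child.
TableWriteItems writeItems = new TableWriteItems(tableName);
for (PathMetadata child : meta.getListing()) {
  writeItems.addItemToPut(pathMetadataToItem(child));
}
dynamoDB.batchWriteItem(writeItems);
{noformat}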



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-13925) S3Guard: NPE when table is already populated in dynamodb and user specifies "fs.s3a.s3guard.ddb.table.create=false"

2016-12-21 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan resolved HADOOP-13925.
---
Resolution: Duplicate

> S3Guard: NPE when table is already populated in dynamodb and user specifies 
> "fs.s3a.s3guard.ddb.table.create=false"
> ---
>
> Key: HADOOP-13925
> URL: https://issues.apache.org/jira/browse/HADOOP-13925
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Assignee: Mingliang Liu
>Priority: Minor
>
> When the table is already present and populated in the DynamoDB store, users 
> may specify {{fs.s3a.s3guard.ddb.table.create=false}}.  In such cases, 
> {{DynamoDBMetadataStore.get}} would end up throwing an NPE as the {{table}} 
> object may not be initialized. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13926) S3Guard: Improve listLocatedStatus

2016-12-21 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-13926:
-

 Summary: S3Guard: Improve listLocatedStatus
 Key: HADOOP-13926
 URL: https://issues.apache.org/jira/browse/HADOOP-13926
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Rajesh Balamohan
Priority: Minor


Need to check if {{listLocatedStatus}} can make use of metastore's listChildren 
feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13925) S3Guard: NPE when table is already populated in dynamodb and user specifies "fs.s3a.s3guard.ddb.table.create=false"

2016-12-21 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-13925:
-

 Summary: S3Guard: NPE when table is already populated in dynamodb 
and user specifies "fs.s3a.s3guard.ddb.table.create=false"
 Key: HADOOP-13925
 URL: https://issues.apache.org/jira/browse/HADOOP-13925
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Rajesh Balamohan
Priority: Minor


When the table is already present and populated in the DynamoDB store, users may 
specify {{fs.s3a.s3guard.ddb.table.create=false}}.  In such cases, 
{{DynamoDBMetadataStore.get}} would end up throwing an NPE as the {{table}} 
object may not be initialized. 
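A small sketch of a fail-fast guard for this case (placement and message text 
are illustrative):

{noformat}
// Raise a clear configuration error instead of an NPE when the table handle
// was never initialized because table creation was disabled.
private Table getTable() throws IOException {
  if (table == null) {
    throw new IOException("DynamoDB table '" + tableName + "' is not"
        + " initialized; pre-create the table or set"
        + " fs.s3a.s3guard.ddb.table.create=true");
  }
  return table;
}
{noformat}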



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13757) Remove verifyBuckets overhead in S3AFileSystem::initialize()

2016-10-24 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HADOOP-13757:
-

 Summary: Remove verifyBuckets overhead in 
S3AFileSystem::initialize()
 Key: HADOOP-13757
 URL: https://issues.apache.org/jira/browse/HADOOP-13757
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Rajesh Balamohan
Priority: Minor


{{S3AFileSystem.initialize()}} invokes verifyBuckets(), but when the check gets 
a 403 error, {{s3.doesBucketExist(bucketName)}} ends up returning {{true}} even 
if the bucket does not exist.  In that respect, verifyBuckets() is an 
unnecessary call during initialization. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13727) S3A: Reduce high number of connections to EC2 Instance Metadata Service caused by InstanceProfileCredentialsProvider.

2016-10-21 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594685#comment-15594685
 ] 

Rajesh Balamohan commented on HADOOP-13727:
---

I have tried out this patch in an AWS test environment and it fixes the issue.  
Are you referring to running the entire test suite in AWS EC2?

> S3A: Reduce high number of connections to EC2 Instance Metadata Service 
> caused by InstanceProfileCredentialsProvider.
> -
>
> Key: HADOOP-13727
> URL: https://issues.apache.org/jira/browse/HADOOP-13727
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Rajesh Balamohan
>Assignee: Chris Nauroth
>Priority: Minor
> Attachments: HADOOP-13727-branch-2.001.patch, 
> HADOOP-13727-branch-2.002.patch, HADOOP-13727-branch-2.003.patch, 
> HADOOP-13727-branch-2.004.patch
>
>
> When running in an EC2 VM, S3A can make use of 
> {{InstanceProfileCredentialsProvider}} from the AWS SDK to obtain credentials 
> from the EC2 Instance Metadata Service.  We have observed that for a highly 
> multi-threaded application, this may generate a high number of calls to the 
> Instance Metadata Service.  The service may throttle the client by replying 
> with an HTTP 429 response or forcibly closing connections.  We can greatly 
> reduce the number of calls to the service by enforcing that all threads use a 
> single shared instance of {{InstanceProfileCredentialsProvider}}.
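A minimal sketch of the shared-instance idea (the shape of the final class in 
the patch may differ):

{noformat}
// Every S3A client delegates to one provider instance, so refreshes against
// the EC2 Instance Metadata Service happen once rather than per thread.
public final class SharedInstanceProfileCredentialsProvider
    implements AWSCredentialsProvider {

  private static final InstanceProfileCredentialsProvider DELEGATE =
      new InstanceProfileCredentialsProvider();

  @Override
  public AWSCredentials getCredentials() {
    return DELEGATE.getCredentials();
  }

  @Override
  public void refresh() {
    DELEGATE.refresh();
  }
}
{noformat}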



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13560) S3ABlockOutputStream to support huge (many GB) file writes

2016-09-29 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535127#comment-15535127
 ] 

Rajesh Balamohan commented on HADOOP-13560:
---

S3ABlockOutputStream::initiateMultiPartUpload() has the following:
{noformat}
LOG.debug("Initiating Multipart upload for block {}", currentBlock);
{noformat}

In S3ADataBlocks.java, the patch has the following for ByteArrayBlock:
{noformat}
@Override
public String toString() {
  return "ByteArrayBlock{" +
  "state=" + getState() +
  ", buffer=" + buffer +
  ", limit=" + limit +
  ", dataSize=" + dataSize +
  '}';
}
{noformat}

When DEBUG logging was enabled to check the AWS traffic, it ended up printing 
the entire contents of the buffer. When trying to debug a large data transfer 
(4 GB in my case), it printed huge chunks which are not needed. Would it 
be possible to print only the buffer sizes?
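For example, a {{toString()}} along these lines would log only the block's 
metadata:

{noformat}
@Override
public String toString() {
  return "ByteArrayBlock{" +
      "state=" + getState() +
      ", dataSize=" + dataSize +   // sizes only; never the buffer contents
      ", limit=" + limit +
      '}';
}
{noformat}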

> S3ABlockOutputStream to support huge (many GB) file writes
> --
>
> Key: HADOOP-13560
> URL: https://issues.apache.org/jira/browse/HADOOP-13560
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-13560-branch-2-001.patch, 
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch, 
> HADOOP-13560-branch-2-004.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights 
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really 
> works
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very 
> large commit operations for committers using rename



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-16 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Attachment: HADOOP-13169-branch-2-010.patch

Thank you [~ste...@apache.org], [~cnauroth]. Attached the revised patch to 
address the review comments.

> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, 
> HADOOP-13169-branch-2-008.patch, HADOOP-13169-branch-2-009.patch, 
> HADOOP-13169-branch-2-010.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-15 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Attachment: HADOOP-13169-branch-2-009.patch

Thank you [~cnauroth]. 

Changes in the latest patch:
1. Changed LinkedList to ArrayList in SimpleCopyListing. 
2. For the test, I thought of using guava's {{Ordering.arbitrary()}}, which 
relies on {{System.identityHashCode}}, but that is also prone to collisions 
(http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6809470).  Instead, using 
{{setSeedForRandomListing(long seed)}} with "@VisibleForTesting" for testing 
purposes.
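A minimal sketch of the seedable randomization ({{setSeedForRandomListing}} is 
the hook named above; the shuffle call site is illustrative):

{noformat}
private Random rnd = new Random();

@VisibleForTesting
void setSeedForRandomListing(long seed) {
  // Deterministic ordering for tests; production keeps the default Random.
  this.rnd = new Random(seed);
}

private void randomizeFileListing(List<FileStatus> fileStatuses) {
  Collections.shuffle(fileStatuses, rnd);
}
{noformat}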


> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, 
> HADOOP-13169-branch-2-008.patch, HADOOP-13169-branch-2-009.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Status: Patch Available  (was: Open)

> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, 
> HADOOP-13169-branch-2-008.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Attachment: HADOOP-13169-branch-2-008.patch

> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, 
> HADOOP-13169-branch-2-008.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Attachment: (was: HADOOP-13169-branch-2-008.patch)

> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, 
> HADOOP-13169-branch-2-008.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Status: Open  (was: Patch Available)

> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, 
> HADOOP-13169-branch-2-008.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Attachment: HADOOP-13169-branch-2-008.patch

Thanks [~ste...@apache.org].  Added isDebugEnabled() to be consistent with the 
rest of the code in the latest patch.

> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch, 
> HADOOP-13169-branch-2-008.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Attachment: HADOOP-13169-branch-2-007.patch

> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Status: Patch Available  (was: Open)

> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch, HADOOP-13169-branch-2-007.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

2016-09-14 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HADOOP-13169:
--
Status: Open  (was: Patch Available)

> Randomize file list in SimpleCopyListing
> 
>
> Key: HADOOP-13169
> URL: https://issues.apache.org/jira/browse/HADOOP-13169
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools/distcp
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HADOOP-13169-branch-2-001.patch, 
> HADOOP-13169-branch-2-002.patch, HADOOP-13169-branch-2-003.patch, 
> HADOOP-13169-branch-2-004.patch, HADOOP-13169-branch-2-005.patch, 
> HADOOP-13169-branch-2-006.patch
>
>
> When copying files to S3, based on the file listing some mappers can get into 
> S3 partition hotspots. This would be more visible when data is copied from a 
> hive warehouse with lots of partitions (e.g. date partitions). In such cases, 
> some of the tasks would tend to be a lot slower than others. It would be good 
> to randomize the file paths which are written out in SimpleCopyListing to 
> avoid this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org


