[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2022-01-05 Thread Steve Loughran (Jira)


[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:

Parent: HADOOP-18067  (was: HADOOP-17566)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch
>
>
> It looks suspiciously like S3A isn't providing the partitioning data in 
> {{listLocatedStatus}} and {{getFileBlockLocations()}} that is needed to break 
> up a file by the block size. This will stop tools using the MRv1 APIs from 
> partitioning properly if the input format isn't doing its own split logic.
> FileInputFormat in MRv2 is a bit more configurable about input split 
> calculation and will split up large files, but otherwise the partitioning is 
> being done more by the default values of the executing engine than by any 
> config data from the filesystem about what its "block size" is.
> NativeAzureFS does a better job; maybe that could be factored out to 
> hadoop-common and reused?
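One simple way to emulate getFileBlockLocations() for a store with no real data locality is to cut the requested range into block-sized pieces that all report the same placeholder host. The sketch below assumes that approach. EmulatedBlock and BlockLocationEmulation are hypothetical names, and EmulatedBlock stands in for org.apache.hadoop.fs.BlockLocation so the example compiles without Hadoop on the classpath. It is not the NativeAzureFS code or any of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation.
final class EmulatedBlock {
    final String host;
    final long offset;
    final long length;

    EmulatedBlock(String host, long offset, long length) {
        this.host = host;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public String toString() {
        return host + "@" + offset + "+" + length;
    }
}

// Hypothetical helper: emulates block locations for a store with no real locality.
final class BlockLocationEmulation {
    private BlockLocationEmulation() {
    }

    /**
     * Splits the range [start, start + len) of a file of fileLen bytes into
     * chunks of at most blockSize bytes, each reporting the same placeholder
     * host. The range is clipped to the end of the file.
     */
    static List<EmulatedBlock> locate(long fileLen, long blockSize,
                                      long start, long len, String host) {
        if (start < 0 || len < 0) {
            throw new IllegalArgumentException("start and len must be non-negative");
        }
        if (blockSize <= 0) {
            throw new IllegalArgumentException("block size must be positive: " + blockSize);
        }
        List<EmulatedBlock> blocks = new ArrayList<>();
        if (start >= fileLen) {
            return blocks; // nothing to report at or past end of file
        }
        // Clip to EOF without risking overflow in start + len.
        long end = (len >= fileLen - start) ? fileLen : start + len;
        long offset = start;
        while (offset < end) {
            long chunk = Math.min(blockSize, end - offset);
            blocks.add(new EmulatedBlock(host, offset, chunk));
            offset += chunk;
        }
        return blocks;
    }
}
```

With a 300-byte file and a 128-byte block size, locate(300, 128, 0, 300, "localhost") returns three entries at offsets 0, 128 and 256, the last one 44 bytes long; asking for len = 1000 returns the same three, because the range is clipped at end of file.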



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2020-02-14 Thread Steve Loughran (Jira)



Steve Loughran updated HADOOP-14943:

Priority: Minor  (was: Major)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2020-02-14 Thread Steve Loughran (Jira)



Steve Loughran updated HADOOP-14943:

Parent Issue: HADOOP-16829  (was: HADOOP-15620)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2018-08-02 Thread Steve Loughran (JIRA)



Steve Loughran updated HADOOP-14943:

Parent Issue: HADOOP-15620  (was: HADOOP-15220)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2018-02-16 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Parent Issue: HADOOP-15220  (was: HADOOP-14831)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2018-02-16 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Target Version/s:   (was: 3.1.0)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2018-02-14 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Status: Patch Available  (was: Open)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2018-02-14 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Attachment: HADOOP-14943-004.patch

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2018-02-14 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Status: Open  (was: Patch Available)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-23 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Attachment: HADOOP-14943-003.patch

The utility class is now FileBlockLocationSupport; it may be worth making it an 
instantiable class so that each instance can emulate locality.

Testing: S3 London
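The comment above floats making the helper an instantiable class so that each instance can emulate locality. A minimal sketch of that idea, assuming locality is emulated by assigning a configured list of host names to blocks round-robin; LocalityEmulator is a hypothetical name, and this is not the FileBlockLocationSupport class from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical instantiable helper: each instance carries its own block size
// and the host names it claims the data lives on, so different filesystem
// instances could emulate different locality.
final class LocalityEmulator {
    private final long blockSize;
    private final String[] hosts;

    LocalityEmulator(long blockSize, String... hosts) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("block size must be positive: " + blockSize);
        }
        if (hosts == null || hosts.length == 0) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.blockSize = blockSize;
        this.hosts = hosts.clone();
    }

    /** An emulated block: a byte range plus the host it is reported on. */
    static final class Block {
        final long offset;
        final long length;
        final String host;

        Block(long offset, long length, String host) {
            this.offset = offset;
            this.length = length;
            this.host = host;
        }
    }

    /** Lists the blocks of a file of fileLen bytes, aligned to multiples of blockSize. */
    List<Block> locate(long fileLen) {
        List<Block> blocks = new ArrayList<>();
        long offset = 0;
        long index = 0;
        while (offset < fileLen) {
            long length = Math.min(blockSize, fileLen - offset);
            // Round-robin by block index: the same block always maps to the
            // same host, which keeps scheduling hints stable across calls.
            String host = hosts[(int) (index % hosts.length)];
            blocks.add(new Block(offset, length, host));
            offset += length;
            index++;
        }
        return blocks;
    }
}
```

For example, new LocalityEmulator(100, "host-a", "host-b").locate(250) yields blocks at offsets 0, 100 and 200 on host-a, host-b and host-a respectively, the last one 50 bytes long.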

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-23 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Status: Patch Available  (was: Open)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch, HADOOP-14943-003.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-23 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Status: Open  (was: Patch Available)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-21 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Parent Issue: HADOOP-14831  (was: HADOOP-13204)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-17 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Attachment: HADOOP-14943-002.patch

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-17 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Status: Open  (was: Patch Available)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-17 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Priority: Major  (was: Critical)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-17 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Status: Patch Available  (was: Open)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, 
> HADOOP-14943-002.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-16 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Target Version/s: 3.1.0
  Status: Patch Available  (was: Open)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-16 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Attachment: HADOOP-14943-002.patch

HADOOP-14943 patch 002
* Add a new StoreUtils class in hadoop-common, intended to be the home for 
utilities that help object stores
* Move the code in NativeAzureFileSystem that works out block locations into it
* Add unit tests in hadoop-common
* Fix a range problem in the copied code (HADOOP-15044)
* WASB: move to the new implementation
* S3A: implement getFileBlockLocations(); extend TestS3AInputStreamPerformance, 
as it is expected to have a test file larger than the block size of the FS.

With the shared code there's less to test and maintain, and it is easier for 
other implementations to adopt.

Testing: S3A Ireland (S3Guard/auth => 5:44 test run) and WASB Ireland
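The notes do not say exactly what the range problem in HADOOP-15044 was, so the sketch below only illustrates, as an assumption, the property a fixed implementation should have: the number of locations depends on the requested range clipped at end of file, not on the whole file length and not on bytes past EOF. RangeClampSketch is a hypothetical name, not code from the patch.

```java
// Hypothetical helper: how many emulated block locations should a request for
// [start, start + len) of a fileLen-byte file produce?
final class RangeClampSketch {
    private RangeClampSketch() {
    }

    static long expectedLocations(long fileLen, long blockSize, long start, long len) {
        if (start < 0 || len < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid start, len or block size");
        }
        if (start >= fileLen || len == 0) {
            return 0;
        }
        // Count only the requested bytes, clipped at end of file: never the
        // whole file length, never bytes past EOF.
        long effectiveLen = Math.min(len, fileLen - start);
        // Ceiling division without overflow: a partial trailing block still
        // needs its own location.
        return effectiveLen / blockSize + (effectiveLen % blockSize == 0 ? 0 : 1);
    }
}
```

Under these assumptions, a 1 GiB file with 32 MiB blocks yields 32 locations for the whole file, 1 for a one-byte request at offset 0, and 1 for a request that starts at 1000 MiB and overruns EOF. The first case matches what the extended S3A test is after: a file larger than the block size should come back as more than one location.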

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch



[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A

2017-11-16 Thread Steve Loughran (JIRA)


Steve Loughran updated HADOOP-14943:

Summary: Add common getFileBlockLocations() emulation for object stores, 
including S3A  (was: S3A to implement getFileBlockLocations() for mapred 
partitioning)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -
>
> Key: HADOOP-14943
> URL: https://issues.apache.org/jira/browse/HADOOP-14943
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14943-001.patch