[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Parent: HADOOP-18067  (was: HADOOP-17566)

> Add common getFileBlockLocations() emulation for object stores, including S3A
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-14943
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14943
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch
>
>
> It looks suspiciously like S3A isn't providing the partitioning data in
> {{listLocatedStatus}} and {{getFileBlockLocations()}} needed to break up a
> file by the blocksize. This will stop tools using the MRv1 APIs from doing
> the partitioning properly if the input format isn't doing its own split logic.
> FileInputFormat in MRv2 is a bit more configurable about input split
> calculation and will split up large files, but otherwise the partitioning is
> being done more by the default values of the executing engine than by
> any config data from the filesystem about what its "block size" is.
> NativeAzureFS does a better job; maybe that could be factored out to
> hadoop-common and reused?

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
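To illustrate why the reported block size matters for partitioning, here is a minimal sketch of the split-size arithmetic. It is not Hadoop source: the class and method names are hypothetical, and the two formulas are written as I understand the MRv1 and MRv2 FileInputFormat calculations to work, with the "slop" handling for the final split left out.

```java
/** Sketch only: shows how a filesystem's block size feeds split-size arithmetic. */
final class SplitSizeSketch {

    /** MRv2-style: clamp the block size between the min and max split sizes. */
    static long computeSplitSizeV2(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** MRv1-style: goalSize is the total input size divided by the requested split count. */
    static long computeSplitSizeV1(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Number of splits needed to cover a file (assumption: no slop factor applied). */
    static long countSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        return fileLength / splitSize + (fileLength % splitSize == 0 ? 0 : 1);
    }
}
```

Under these assumptions a 1 GiB object is covered by 32 splits when the store reports a 32 MiB block size and by 8 splits when it reports 128 MiB, so the value the filesystem returns directly shapes the parallelism.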
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Priority: Minor  (was: Major)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Parent Issue: HADOOP-16829  (was: HADOOP-15620)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Parent Issue: HADOOP-15620  (was: HADOOP-15220)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Parent Issue: HADOOP-15220  (was: HADOOP-14831)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Target Version/s:   (was: 3.1.0)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Status: Patch Available  (was: Open)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Attachment: HADOOP-14943-004.patch
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Status: Open  (was: Patch Available)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Attachment: HADOOP-14943-003.patch

The utility class is now FileBlockLocationSupport; it may be worth making it an instantiable class so that each instance can emulate locality.

Testing: S3 London
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Status: Patch Available  (was: Open)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Status: Open  (was: Patch Available)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Parent Issue: HADOOP-14831  (was: HADOOP-13204)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Attachment: HADOOP-14943-002.patch
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Status: Open  (was: Patch Available)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Priority: Major  (was: Critical)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Status: Patch Available  (was: Open)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Target Version/s: 3.1.0
              Status: Patch Available  (was: Open)
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Attachment: HADOOP-14943-002.patch

HADOOP-14943 patch 002

* Adds a new StoreUtils class in hadoop-common, intended to be the home for utilities that help object stores
* Moves the code in NativeAzureFileSystem that works out block locations into it
* Adds unit tests in hadoop-common
* Fixes the range problem in the copied code (HADOOP-15044)
* WASB: moves to the new implementation
* S3A: implements getFileBlockLocations(); extends TestS3AInputStreamPerformance, as it is expected to have a test file larger than the block size of the FS

With the shared code there is less to test and maintain, and it is easier for other implementations to adopt.

Testing: S3A Ireland (s3guard/auth => 5:44 test run) and WASB Ireland
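The block-location emulation described in the patch 002 notes can be sketched roughly as follows. This is a hedged illustration, not the code from the patch: `BlockLocationSketch`, `Block`, and `emulate` are hypothetical names, plain Java types stand in for Hadoop's FileStatus and BlockLocation, and clamping the requested range to the file length is my own assumption about a sensible range check rather than a statement of what HADOOP-15044 changed.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: slices a byte range of an object into fake "blocks". */
final class BlockLocationSketch {

    /** One emulated block: a byte range plus a placeholder host name. */
    static final class Block {
        final long offset;
        final long length;
        final String host;

        Block(long offset, long length, String host) {
            this.offset = offset;
            this.length = length;
            this.host = host;
        }
    }

    /**
     * Emulate block locations for the range [start, start + len) of a file.
     * Assumption: blocks begin at 'start' rather than at block-aligned offsets.
     */
    static List<Block> emulate(long fileLength, long blockSize,
                               long start, long len, String host) {
        if (start < 0 || len < 0) {
            throw new IllegalArgumentException("start and len must be non-negative");
        }
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        if (start >= fileLength) {
            return blocks; // range begins at or past the end of the file
        }
        // Clamp the end of the range to the file length without overflowing.
        long end = (len > fileLength - start) ? fileLength : start + len;
        long offset = start;
        while (offset < end) {
            long length = Math.min(blockSize, end - offset);
            blocks.add(new Block(offset, length, host));
            offset += length;
        }
        return blocks;
    }
}
```

Because an object store has no real data locality, every emulated block carries the same placeholder host (for example "localhost"); the value to callers is the offsets and lengths, which give split calculation something to partition on.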
[jira] [Updated] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
[ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-14943:
    Summary: Add common getFileBlockLocations() emulation for object stores, including S3A  (was: S3A to implement getFileBlockLocations() for mapred partitioning)