[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Status: Patch Available (was: Open) > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch, > HADOOP-13282-003.patch, HADOOP-13282-004.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Attachment: HADOOP-13282-004.patch Patch 004 This applies to s3a trunk, uses once() to translate the underlying getObjectMetadata call (which is retryraw). Test: s3 london with default encryption > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch, > HADOOP-13282-003.patch, HADOOP-13282-004.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Attachment: HADOOP-13282-003.patch patch 003; 002 with two checkstyle indentations reported against patch 001 fixed > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch, > HADOOP-13282-003.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Status: Open (was: Patch Available) > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Status: Patch Available (was: Open) > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Attachment: HADOOP-13282-002.patch HADOOP-13282: etag support for s3a. * Move the EtagChecksum class into a new fs.store package in hadoop common for use by other stores * add tests there on its core equality/round trip operations * Add a set of ITests for the S3A use. One of these tests is skipped if the FS is known to be encrypted, in case the bucket returns different etags here. To aid: added a getter for the S3AFS encryption algorithm. With these tags, you can assume that if an object's etag changes, it is different. You cannot safely use it to conclude that other objects, especially across stores, are equivalent. (note this patch reorders all the headers in ITestS3AMiscOperations. They'd got out of order, and as it's a low-patch, low-conflict file, I've taken the chance to fix it) Tested S3 London with encryption turned on; s3 ireland without > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Status: Open (was: Patch Available) > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Parent Issue: HADOOP-14831 (was: HADOOP-13204) > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Assignee: Steve Loughran Status: Patch Available (was: Open) > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Attachment: HADOOP-13282-001.patch Patch 001: code yes, tests no. probably time to do a contract test which verifies that * things fail on: getFileChecksum("/") * as well as other directories * two calls to the same file return the same checksum (whose bytes match) * the same file, if overwritten with different data, has a different checksum > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Priority: Minor > Attachments: HADOOP-13282-001.patch > > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Target Version/s: 3.1.0 > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Priority: Minor > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Description: If the etags of blobs were exported via {{getFileChecksum()}}, it'd be possible to probe for a blob being in sync with a local file. Distcp could use this to decide whether to skip a file or not. Now, there's a problem there: distcp needs source and dest filesystems to implement the same algorithm. It'd only work out the box if you were copying between S3 instances. There are also quirks with encryption and multipart: [s3 docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. At the very least, it's something which could be used when indexing the FS, to check for changes later. was: If the etags of blobs were exported via {{getFileChecksum() }}, it'd be possible to probe for a blob being in sync with a local file. Distcp could use this to decide whether to skip a file or not. Now, there's a problem there: distcp needs source and dest filesystems to implement the same algorithm. It'd only work out the box if you were copying between S3 instances. There are also quirks with encryption and multipart: [s3 docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. At the very least, it's something which could be used when indexing the FS, to check for changes later. > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Priority: Minor > > If the etags of blobs were exported via {{getFileChecksum()}}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
[ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-13282: Affects Version/s: 2.9.0 > S3 blob etags to be made visible in status/getFileChecksum() calls > -- > > Key: HADOOP-13282 > URL: https://issues.apache.org/jira/browse/HADOOP-13282 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.9.0 >Reporter: Steve Loughran >Priority: Minor > > If the etags of blobs were exported via {{getFileChecksum() }}, it'd be > possible to probe for a blob being in sync with a local file. Distcp could > use this to decide whether to skip a file or not. > Now, there's a problem there: distcp needs source and dest filesystems to > implement the same algorithm. It'd only work out the box if you were copying > between S3 instances. There are also quirks with encryption and multipart: > [s3 > docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]. > At the very least, it's something which could be used when indexing the FS, > to check for changes later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org