[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-08 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-15625:

Attachment: HADOOP-15625-015.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch, HADOOP-15625-014.patch, HADOOP-15625-015.patch, 
> HADOOP-15625-015.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-07 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-15625:

Status: Patch Available  (was: Open)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch, HADOOP-15625-014.patch, HADOOP-15625-015.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-07 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-15625:

Status: Open  (was: Patch Available)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch, HADOOP-15625-014.patch, HADOOP-15625-015.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-06 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-015.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch, HADOOP-15625-014.patch, HADOOP-15625-015.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Patch Available  (was: Open)

Note that the build failed last time
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean) on 
project hadoop-yarn-applications-catalog-webapp: Failed to clean project: 
Failed to delete 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/target/generated-sources/vendor/ecstatic/test/public
 -> [Help 1]
{code}

that's a directory created as part of YARN-7129. Commented there, let's see 
what's going to happen with this build. Is everyone else getting things to 
compile properly?

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch, HADOOP-15625-014.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Open  (was: Patch Available)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch, HADOOP-15625-014.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Patch Available  (was: Open)

patch 014: patch 013 with the core-default.xml change checked in

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch, HADOOP-15625-014.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Attachment: HADOOP-15625-014.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch, HADOOP-15625-014.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-06 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Open  (was: Patch Available)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-05 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Open  (was: Patch Available)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-05 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Attachment: HADOOP-15625-013.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-05 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Patch Available  (was: Open)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, 
> HADOOP-15625-013.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-03-05 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Attachment: HADOOP-15625-013-delta.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-27 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-012.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch, HADOOP-15625-012.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-27 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-011.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch, 
> HADOOP-15625-011.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-27 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-010.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-27 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-009.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch, HADOOP-15625-009.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-27 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-008.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch, 
> HADOOP-15625-008.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-26 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-007.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch, HADOOP-15625-007.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-25 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Open  (was: Patch Available)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-25 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Attachment: HADOOP--15625-006.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-25 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Patch Available  (was: Open)

Patch 006


* Moved all the implementation classes to a new package fs.s3a.impl. This is 
just because I've been thinking about cleaning up the fs.s3a package at some 
point, and placing files in there makes it clear that this is for 
implementation only.
* new LogExactlyOnce class so that the FS logs on only one input stream if 
versioning = true but versioning isn't supported 
* Marked {{RemoteFileChangedException}} as public/unstable
* added change/revision policy
* Reopen now raises a PathIOException on a null wrapped stream. I don't think 
we've ever got to that codepath, but...

Big: moved all change tracking logic into 
org.apache.hadoop.fs.s3a.impl.ChangeTracker; this makes unit testing the 
failure logic easier, especially version mismatch.
Also handles the special case "null came back but we have no revision ID" with 
a different message. I don't know to how to create that, but it's there anyway 
and can be tested for.


TODO

* document
* create some of the output logs for invalid conditions and include in the 
troubleshooting doc
* Maybe: have the InconsistentAmazonS3Client simulate version inconsistency. 
But with the isolated change tracking, I don't see the point in that
* Now, should we be ruthless and add a "require versionID option" which fails 
fast if version = off? That way: you'll know you've got versioning.


> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, 
> HADOOP-15625-002.patch, HADOOP-15625-003.patch, HADOOP-15625-004.patch, 
> HADOOP-15625-005.patch, HADOOP-15625-006.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-25 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-006.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch, HADOOP-15625-004.patch, HADOOP-15625-005.patch, 
> HADOOP-15625-006.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-22 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-005.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch, HADOOP-15625-004.patch, HADOOP-15625-005.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-22 Thread Ben Roling (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Roling updated HADOOP-15625:

Attachment: HADOOP-15625-004.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch, HADOOP-15625-004.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-20 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-15625:

Attachment: (was: HADOOP-15999.004.patch)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2019-02-20 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HADOOP-15625:

Attachment: HADOOP-15999.004.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch, HADOOP-15999.004.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2018-12-14 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Patch Available  (was: Open)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2018-12-09 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Status: Open  (was: Patch Available)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2018-12-09 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Attachment: HADOOP-15625-003.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch, 
> HADOOP-15625-003.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2018-09-13 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Parent Issue: HADOOP-15620  (was: HADOOP-15220)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2018-08-21 Thread Brahma Reddy Battula (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-15625:
--
Status: Patch Available  (was: Open)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2018-08-03 Thread Brahma Reddy Battula (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-15625:
--
Attachment: HADOOP-15625-002.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch, HADOOP-15625-002.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2018-08-02 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-15625:

Parent Issue: HADOOP-15220  (was: HADOOP-15620)

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15625) S3A input stream to use etags to detect changed source files

2018-07-31 Thread Brahma Reddy Battula (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-15625:
--
Attachment: HADOOP-15625-001.patch

> S3A input stream to use etags to detect changed source files
> 
>
> Key: HADOOP-15625
> URL: https://issues.apache.org/jira/browse/HADOOP-15625
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Brahma Reddy Battula
>Priority: Major
> Attachments: HADOOP-15625-001.patch
>
>
> S3A input stream doesn't handle changing source files any better than the 
> other cloud store connectors. Specifically: it doesn't noticed it has 
> changed, caches the length from startup, and whenever a seek triggers a new 
> GET, you may get one of: old data, new data, and even perhaps go from new 
> data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with 
> S3Guard, BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org