[jira] [Comment Edited] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-09-21 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624144#comment-16624144
 ] 

Gabor Bota edited comment on HADOOP-15621 at 9/21/18 8:51 PM:
--

Just a note: there is no call from {{S3AFileSystem}} to {{MetadataStore#put}} 
directly, only from {{S3Guard}} (via static methods).
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see 
how they interact with {{MetadataStore}}, and all [get, listChildren, put, move] 
interactions should be modified to handle TTL.

Also note: {{S3GuardTool}} also calls {{MetadataStore#put}} directly; we need to 
handle those usages as well.
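
For illustration, one way to handle this would be to centralize the TTL check 
in a thin wrapper around the store, so that every read path ({{S3AFileSystem}}, 
{{S3Guard}}, {{S3GuardTool}}) goes through the same filter. A minimal sketch 
with simplified placeholder types, not the real {{MetadataStore}} interface:

{code:java}
import java.util.List;
import java.util.stream.Collectors;

// Placeholder store interface; the real MetadataStore API is richer.
interface Store {
  Entry get(String path);
  List<Entry> listChildren(String path);
}

// Placeholder entry; lastWrittenMillis stands in for whatever timestamp
// field the TTL ends up being based on.
class Entry {
  final String path;
  final long lastWrittenMillis;

  Entry(String path, long lastWrittenMillis) {
    this.path = path;
    this.lastWrittenMillis = lastWrittenMillis;
  }
}

// Wraps a store and elides entries older than the configured TTL.
class TtlAwareStore implements Store {
  private final Store delegate;
  private final long ttlMillis;

  TtlAwareStore(Store delegate, long ttlMillis) {
    this.delegate = delegate;
    this.ttlMillis = ttlMillis;
  }

  private boolean fresh(Entry e) {
    return e != null
        && System.currentTimeMillis() - e.lastWrittenMillis <= ttlMillis;
  }

  @Override
  public Entry get(String path) {
    Entry e = delegate.get(path);
    return fresh(e) ? e : null;  // an expired entry is treated as a miss
  }

  @Override
  public List<Entry> listChildren(String path) {
    return delegate.listChildren(path).stream()
        .filter(this::fresh)     // elide expired children from the listing
        .collect(Collectors.toList());
  }
}
{code}

A wrapper like this would keep the TTL logic in one place instead of scattering 
checks across the individual call sites.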


was (Author: gabor.bota):
Just a note: there is no call from {{S3AFileSystem}} to {{MetadataStore#put}} 
directly, only from {{S3Guard}} (via static methods).
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see 
how they interact with {{MetadataStore}}, and all [get, listChildren, put, move] 
interactions should be modified to handle TTL.

> s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store
> --
>
> Key: HADOOP-15621
> URL: https://issues.apache.org/jira/browse/HADOOP-15621
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Aaron Fabbri
>Assignee: Gabor Bota
>Priority: Minor
> Attachments: HADOOP-15621.001.patch
>
>
> Similar to HADOOP-13649, I think we should add a TTL (time to live) feature 
> to the Dynamo metadata store (MS) for S3Guard.
> This is a similar concept to an "online algorithm" version of the CLI prune() 
> function, which is the "offline algorithm".
> Why: 
>  1. Self healing (soft state): since we do not implement transactions around 
> modification of the two systems (s3 and metadata store), certain failures can 
> lead to inconsistency between S3 and the metadata store (MS) state. Having a 
> time to live (TTL) on each entry in S3Guard means that any inconsistencies 
> will be time bound. Thus "wait and restart your job" becomes a valid, if 
> ugly, way to get around any issues with FS client failure leaving things in a 
> bad state.
>  2. We could make manual invocation of `hadoop s3guard prune ...` 
> unnecessary, depending on the implementation.
>  3. Makes it possible to fix the problem that dynamo MS prune() doesn't prune 
> directories due to the lack of true modification time.
> How:
>  I think we need a new column in the dynamo table "entry last written time". 
> This is updated each time the entry is written to dynamo.
>  After that we can either
>  1. Have the client simply ignore / elide any entries that are older than the 
> configured TTL.
>  2. Have the client delete entries older than the TTL.
> The issue with #2 is it will increase latency if done inline in the context 
> of an FS operation. We could mitigate this some by using an async helper 
> thread, or probabilistically doing it "some times" to amortize the expense of 
> deleting stale entries (allowing some batching as well).
> Caveats:
>  - Clock synchronization as usual is a concern. Many clusters already keep 
> clocks close enough via NTP. We should at least document the requirement 
> along with the configuration knob that enables the feature.
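
For illustration of the amortization idea in #2, a sketch of an async helper 
plus probabilistic triggering; all names here are hypothetical, not from any 
patch:

{code:java}
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Amortized option #2: run stale-entry deletion off the request path, and
// only on a small random fraction of FS operations, so no single call pays
// the full cost and deletions can be batched.
class AmortizedPruner {
  private final ExecutorService helper = Executors.newSingleThreadExecutor();
  private final Random random = new Random();
  private final double pruneProbability;  // e.g. 0.05 => prune on ~5% of ops

  AmortizedPruner(double pruneProbability) {
    this.pruneProbability = pruneProbability;
  }

  // deleteStaleEntries would batch-delete entries older than the TTL.
  void maybePruneAsync(Runnable deleteStaleEntries) {
    if (random.nextDouble() < pruneProbability) {
      helper.submit(deleteStaleEntries);  // off the FS call's critical path
    }
  }
}
{code}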






[jira] [Comment Edited] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-09-21 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624144#comment-16624144
 ] 

Gabor Bota edited comment on HADOOP-15621 at 9/21/18 8:51 PM:
--

There is no call from {{S3AFileSystem}} to {{MetadataStore#put}} directly, only 
from {{S3Guard}} (via static methods).
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see 
how they interact with {{MetadataStore}}, and all [get, listChildren, put, move] 
interactions should be modified to handle TTL.

Also note: {{S3GuardTool}} also calls {{MetadataStore#put}} directly; we need to 
handle those usages as well.


was (Author: gabor.bota):
Just a note: there is no call from {{S3AFileSystem}} to {{MetadataStore#put}} 
directly, only from {{S3Guard}} (via static methods).
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see 
how they interact with {{MetadataStore}}, and all [get, listChildren, put, move] 
interactions should be modified to handle TTL.

Also note: {{S3GuardTool}} also calls {{MetadataStore#put}} directly; we need to 
handle those usages as well.







[jira] [Comment Edited] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-09-21 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624144#comment-16624144
 ] 

Gabor Bota edited comment on HADOOP-15621 at 9/21/18 8:44 PM:
--

Just a note: there is no call from {{S3AFileSystem}} to {{MetadataStore#put}} 
directly, only from {{S3Guard}} (via static methods).
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see 
how they interact with {{MetadataStore}}, and all {get, listChildren, put, move} 
interactions should be modified to handle TTL.


was (Author: gabor.bota):
Just a note: there is no call from {{S3AFileSystem}} to {{MetadataStore#put}} 
directly, only from {{S3Guard}} (via static methods).
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see 
how they interact with {{MetadataStore}}, and all interactions should be 
modified to handle TTL.







[jira] [Comment Edited] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-09-21 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624144#comment-16624144
 ] 

Gabor Bota edited comment on HADOOP-15621 at 9/21/18 8:44 PM:
--

Just a note: there is no call from {{S3AFileSystem}} to {{MetadataStore#put}} 
directly, only from {{S3Guard}} (via static methods).
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see 
how they interact with {{MetadataStore}}, and all [get, listChildren, put, move] 
interactions should be modified to handle TTL.


was (Author: gabor.bota):
Just a note: there is no call from {{S3AFileSystem}} to {{MetadataStore#put}} 
directly, only from {{S3Guard}} (via static methods).
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see 
how they interact with {{MetadataStore}}, and all {get, listChildren, put, move} 
interactions should be modified to handle TTL.







[jira] [Comment Edited] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-09-21 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16624144#comment-16624144
 ] 

Gabor Bota edited comment on HADOOP-15621 at 9/21/18 8:40 PM:
--

Just a note: there is no call from {{S3AFileSystem}} to {{MetadataStore#put}} 
directly, only from {{S3Guard}} (via static methods).
Both {{S3Guard}} and {{S3AFileSystem}} should be analyzed more closely to see 
how they interact with {{MetadataStore}}, and all interactions should be 
modified to handle TTL.


was (Author: gabor.bota):
Just a note: there is no call from {{S3AFileSystem}} to {{MetadataStore#put}} 
directly, only from {{S3Guard}} (via static methods).
{{S3Guard}} should be analyzed more closely to see how it interacts with 
{{MetadataStore}}, and all interactions should be modified to handle TTL.







[jira] [Comment Edited] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-09-21 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607040#comment-16607040
 ] 

Gabor Bota edited comment on HADOOP-15621 at 9/21/18 8:31 PM:
--

Based on your comments [~fabbri], I think we could rename this jira. It's more 
like a TTL for authoritative directory listings and files for all metadata 
stores, rather than a TTL expiry just for dynamodb.

 


was (Author: gabor.bota):
Based on your comments [~fabbri], I think we could rename this jira. It's more 
like a TTL for authoritative directory listings for all metadata stores, rather 
than a TTL expiry just for dynamodb.

 







[jira] [Comment Edited] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-08-28 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593567#comment-16593567
 ] 

Gabor Bota edited comment on HADOOP-15621 at 8/28/18 9:08 AM:
--

Some comments on the description: 
bq. I think we need a new column in the dynamo table "entry last written time". 
This is updated each time the entry is written to dynamo.
The current implementation already uses the {{mod_time}} field for prune. It 
would be wise to reuse it, since this is an online version of prune; that way 
we don't need to add a new field to the item.

{quote}After that we can either
1. Have the client simply ignore / elide any entries that are older than the 
configured TTL.
2. Have the client delete entries older than the TTL.{quote}
There will be a switch for this: the user can choose whether the item should be 
deleted automatically when {{current_time - mod_time > ttl}}.

The implementation will behave as follows:
1. {{DynamoDBMetadataStore}} ignores any entries that are older than the 
configured TTL. Such an entry is not included in the directory listing, and the 
returned {{DirListingMetadata}} won't be authoritative.
There's an issue with this setting: the listing won't be authoritative if an 
entry is older than the TTL, but if the client does another full listing and 
saves it to the metadata store, the next listing request will still be 
{{isAuthoritative=false}} because of the leftover expired entry in the same 
directory. A solution could be to extend {{DDBPathMetadata}} so the dynamo 
entry records whether the file has already caused a directory listing to become 
{{isAuthoritative=false}}; then it won't make the listing non-authoritative 
again. Of course, if another entry with the same name is written, the flag 
should be removed (overwritten). This could be a flag named {{expired_entry}}.

2. {{DynamoDBMetadataStore}} deletes entries that are older than the TTL. With 
this setting we avoid the issue described in 1., and it costs no more: option 1 
has to modify the entry (adding the {{EXPIRED_ENTRY=true}} flag), while this 
option deletes it, so either way we make a single round trip to dynamo.
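
A rough sketch of the expiry check and the two modes above; all names are 
placeholders, not the actual {{DynamoDBMetadataStore}} code:

{code:java}
// Sketch of the TTL check and the elide-vs-delete switch described above.
class ExpiryHandler {
  // Stand-in for the dynamo table operations needed here.
  interface Table {
    void put(Item i);
    void delete(String path);
  }

  static class Item {
    String path;
    long modTime;          // the existing field reused from prune
    boolean expiredEntry;  // the proposed EXPIRED_ENTRY flag
  }

  private final Table table;
  private final long ttlMillis;
  private final boolean deleteOnExpiry;  // the proposed user-facing switch

  ExpiryHandler(Table table, long ttlMillis, boolean deleteOnExpiry) {
    this.table = table;
    this.ttlMillis = ttlMillis;
    this.deleteOnExpiry = deleteOnExpiry;
  }

  boolean isExpired(Item i, long nowMillis) {
    return nowMillis - i.modTime > ttlMillis;  // current_time - mod_time > ttl
  }

  // Either branch is a single round trip to dynamo, which is why the two
  // modes end up costing about the same.
  void handleExpired(Item i) {
    if (deleteOnExpiry) {
      table.delete(i.path);   // mode 2: remove the stale item outright
    } else {
      i.expiredEntry = true;  // mode 1: flag it so it stops forcing listings
      table.put(i);           //         to isAuthoritative=false
    }
  }
}
{code}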


was (Author: gabor.bota):
Some comments on the description: 
bq. I think we need a new column in the dynamo table "entry last written time". 
This is updated each time the entry is written to dynamo.
The current implementation already uses the {{mod_time}} field for prune. It 
would be wise to reuse it, since this is an online version of prune; that way 
we don't need to add a new field to the item.

{quote}After that we can either
1. Have the client simply ignore / elide any entries that are older than the 
configured TTL.
2. Have the client delete entries older than the TTL.{quote}
There will be a switch for this: the user can choose whether the item should be 
deleted automatically when {{current_time - mod_time > ttl}}.

The implementation will behave as follows:
1. {{DynamoDBMetadataStore}} ignores any entries that are older than the 
configured TTL. Such an entry is not included in the directory listing, and the 
returned {{DirListingMetadata}} won't be authoritative.
There's an issue with this setting: the listing won't be authoritative if an 
entry is older than the TTL, but if the client does another full listing and 
saves it to the metadata store, the next listing request will still be 
{{isAuthoritative=false}} because of the leftover expired entry in the same 
directory. A solution could be to extend {{DDBPathMetadata}} so the dynamo 
entry records whether the file has already caused a directory listing to become 
{{isAuthoritative=false}}; then it won't make the listing non-authoritative 
again. Of course, if another entry with the same name is written, the flag 
should be removed (overwritten). This could be a flag named {{expired_entry}}.

2. {{DynamoDBMetadataStore}} deletes entries that are older than the TTL. With 
this setting we won't have the same issue as in 1., but it will cost more, 
since we have to account for the round-trip time of the deletion to dynamo.


[jira] [Comment Edited] (HADOOP-15621) s3guard: implement time-based (TTL) expiry for DynamoDB Metadata Store

2018-08-27 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593567#comment-16593567
 ] 

Gabor Bota edited comment on HADOOP-15621 at 8/27/18 12:28 PM:
---

Some comments on the description: 
bq. I think we need a new column in the dynamo table "entry last written time". 
This is updated each time the entry is written to dynamo.
The current implementation already uses the {{mod_time}} field for prune. It 
would be wise to reuse it, since this is an online version of prune; that way 
we don't need to add a new field to the item.

{quote}After that we can either
1. Have the client simply ignore / elide any entries that are older than the 
configured TTL.
2. Have the client delete entries older than the TTL.{quote}
There will be a switch for this: the user can choose whether the item should be 
deleted automatically when {{current_time - mod_time > ttl}}.

The implementation will behave as follows:
1. {{DynamoDBMetadataStore}} ignores any entries that are older than the 
configured TTL. Such an entry is not included in the directory listing, and the 
returned {{DirListingMetadata}} won't be authoritative.
There's an issue with this setting: the listing won't be authoritative if an 
entry is older than the TTL, but if the client does another full listing and 
saves it to the metadata store, the next listing request will still be 
{{isAuthoritative=false}} because of the leftover expired entry in the same 
directory. A solution could be to extend {{DDBPathMetadata}} so the dynamo 
entry records whether the file has already caused a directory listing to become 
{{isAuthoritative=false}}; then it won't make the listing non-authoritative 
again. Of course, if another entry with the same name is written, the flag 
should be removed (overwritten). This could be a flag named {{expired_entry}}.

2. {{DynamoDBMetadataStore}} deletes entries that are older than the TTL. With 
this setting we won't have the same issue as in 1., but it will cost more, 
since we have to account for the round-trip time of the deletion to dynamo.


was (Author: gabor.bota):
Some comments on the description: 
bq. I think we need a new column in the dynamo table "entry last written time". 
This is updated each time the entry is written to dynamo.
The current implementation already uses the {{mod_time}} field for prune. It 
would be wise to reuse it, since this is an online version of prune; that way 
we don't need to add a new field to the item.

{quote}After that we can either
1. Have the client simply ignore / elide any entries that are older than the 
configured TTL.
2. Have the client delete entries older than the TTL.{quote}
There will be a switch for this: the user can choose whether the item should be 
deleted automatically when {{current_time - mod_time > ttl}}.

The implementation will behave as follows:
1. {{DynamoDBMetadataStore}} ignores any entries that are older than the 
configured TTL. Such an entry is not included in the directory listing, and the 
returned {{DirListingMetadata}} won't be authoritative.
There's an issue with this setting: the listing won't be authoritative if an 
entry is older than the TTL, but if the client does another full listing and 
saves it to the metadata store, the next listing request will still be 
{{isAuthoritative=false}} because of the leftover expired entry in the same 
directory. A solution could be to extend {{DDBPathMetadata}} so the dynamo 
entry records whether the file has already caused a directory listing to become 
{{isAuthoritative=false}}; then it won't make the listing non-authoritative 
again. Of course, if another entry with the same name is written, the flag 
should be removed (overwritten).
2. {{DynamoDBMetadataStore}} deletes entries that are older than the TTL. With 
this setting we won't have the same issue as in 1., but it will cost more, 
since we have to account for the round-trip time of the deletion to dynamo.
