[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-11-17 Thread Aljoscha Krettek (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257146#comment-16257146 ]

Aljoscha Krettek commented on FLINK-7266:
-

Fixed on master (to be 1.5) in
b00f1b326c1ab4221a555200a4d5798e1565b821

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2
>
>
> Currently, every attempted release of an S3 state object also checks if the 
> "parent directory" is empty and then tries to delete it.
> Not only is that unnecessary on S3, but it is prohibitively expensive and for 
> example causes S3 to throttle calls by the JobManager on checkpoint cleanup.
> The {{FileState}} must only attempt parent directory cleanup when operating 
> against real file systems, not when operating against object stores.
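A minimal sketch of the rule stated in the description, assuming a hypothetical `StoreKind` flag on the file system and an in-memory store that counts requests (neither is Flink's actual API). The state file is always deleted, but the emptiness check and the parent delete only happen on a real file system.

```java
import java.util.Set;
import java.util.TreeSet;

final class ParentCleanupSketch {
    // Hypothetical flag: stands in for however the file system reports its kind.
    enum StoreKind { FILE_SYSTEM, OBJECT_STORE }

    // In-memory stand-in for a file system or object store that counts requests.
    static final class CountingStore {
        final StoreKind kind;
        final Set<String> paths = new TreeSet<>();
        int requests = 0;

        CountingStore(StoreKind kind) { this.kind = kind; }

        boolean delete(String path) {
            requests++;
            return paths.remove(path);
        }

        // True if no remaining entry lives below 'dir'.
        boolean isEmptyDir(String dir) {
            requests++;
            String prefix = dir + "/";
            for (String p : paths) {
                if (p.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    // Discard one state file; only real file systems get the parent cleanup.
    static void discardState(CountingStore store, String file) {
        store.delete(file);
        if (store.kind == StoreKind.FILE_SYSTEM) {
            String parent = parentOf(file);
            if (store.isEmptyDir(parent)) {
                store.delete(parent);
            }
        }
    }
}
```

Discarding two files from one directory costs five requests on the file-system model (two deletes, two emptiness checks, one directory delete) but only two on the object-store model.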



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-11-17 Thread Aljoscha Krettek (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256920#comment-16256920 ]

Aljoscha Krettek commented on FLINK-7266:
-

We actually have to fix this for 1.4; otherwise we regress from 1.3. 
The operation that deletes parent directories is too expensive and will 
basically DDoS S3, making Flink unusable for bigger installations on S3.

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2, 1.5.0


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-10-12 Thread Aljoscha Krettek (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201823#comment-16201823 ]

Aljoscha Krettek commented on FLINK-7266:
-

For 1.4, this will be resolved by only deleting the parent directory on the 
master (in the checkpoint coordinator).
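A rough sketch of that approach, assuming a hypothetical checkpoint object that knows its own directory and the state files inside it (these are not Flink's actual classes). Each state file is deleted without any parent check, and the coordinator issues a single directory delete once the whole checkpoint has been discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class CoordinatorCleanupSketch {
    // Hypothetical view of a completed checkpoint: its directory plus the state files in it.
    static final class CheckpointSketch {
        final String directory;
        final List<String> stateFiles;

        CheckpointSketch(String directory, List<String> stateFiles) {
            this.directory = directory;
            this.stateFiles = new ArrayList<>(stateFiles);
        }

        // 'deleteRequest' stands in for one delete call against the file system.
        void discard(Consumer<String> deleteRequest) {
            for (String file : stateFiles) {
                deleteRequest.accept(file); // no per-file emptiness check
            }
            deleteRequest.accept(directory); // one directory drop, done on the master
        }
    }
}
```

For N state files this issues N + 1 delete requests and no listing requests at all.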

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Blocker
> Fix For: 1.4.0, 1.3.2


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-10-06 Thread Stephan Ewen (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16194915#comment-16194915 ]

Stephan Ewen commented on FLINK-7266:
-

True, this is a problem in 1.3.2: the tradeoff was either to have a very large 
amount of redundant requests for the directory emptiness check (which cause 
checkpointing to stall or be throttled) or to leave the "directories" behind.

In Flink 1.4 we want to fix this by letting the checkpoints understand the file 
structure and making it a single call to drop the directory, as Steve suggested.
The current abstraction is overly generic (it only sees arbitrary chunks of 
bytes) and does not understand that checkpoint files cluster together in 
directories.

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.4.0, 1.3.2


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-09-15 Thread Elias Levy (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168733#comment-16168733 ]

Elias Levy commented on FLINK-7266:
---

I am curious what the state of this is. It is still a problem on 1.3.2, making 
use of S3 with the file system state backend very imprudent in production. You 
end up with thousands of empty "directories" in S3 for the checkpoints:

{code}
$ sudo aws s3 ls --recursive s3://bucket/flink/checkpoints/58c7604fbc543b6df75b62601a9b4c9d/
2017-09-15 23:03:15  0 flink/checkpoints/58c7604fbc543b6df75b62601a9b4c9d/chk-1/
2017-09-15 23:04:15  0 flink/checkpoints/58c7604fbc543b6df75b62601a9b4c9d/chk-10/
2017-09-15 23:14:07  0 flink/checkpoints/58c7604fbc543b6df75b62601a9b4c9d/chk-100/
2017-09-15 23:14:14  0 flink/checkpoints/58c7604fbc543b6df75b62601a9b4c9d/chk-101/
2017-09-15 23:14:20  0 flink/checkpoints/58c7604fbc543b6df75b62601a9b4c9d/chk-102/
2017-09-15 23:15:12  0 flink/checkpoints/58c7604fbc543b6df75b62601a9b4c9d/chk-103/
2017-09-15 23:15:18  0 flink/checkpoints/58c7604fbc543b6df75b62601a9b4c9d/chk-104/
...
{code}
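Leftover markers like these can be picked out of a recursive listing: a key that ends in `/`, has size 0, and has no other key beneath it. A small sketch, assuming the listing has already been parsed into a key-to-size map (the parsing step and the `MarkerScan` name are illustrative, not part of any tool):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MarkerScan {
    // Returns keys that look like empty "directory" markers: they end in '/',
    // have size 0, and no other key in the listing starts with them.
    static List<String> emptyMarkers(Map<String, Long> sizesByKey) {
        TreeMap<String, Long> sorted = new TreeMap<>(sizesByKey);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : sorted.entrySet()) {
            String key = e.getKey();
            if (!key.endsWith("/") || e.getValue() != 0L) {
                continue;
            }
            // Keys sharing a prefix are contiguous in sorted order, so any key
            // below 'key' must be its immediate successor.
            String next = sorted.higherKey(key);
            if (next == null || !next.startsWith(key)) {
                result.add(key);
            }
        }
        return result;
    }
}
```

The trailing `/` matters: `chk-10/` does not start with `chk-1/`, so sibling checkpoints are not mistaken for children.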

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.4.0, 1.3.2


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-09-06 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155245#comment-16155245 ]

Steve Loughran commented on FLINK-7266:
---

If you are using S3A, then {{delete(path, recursive=false)}} will stop you from 
trying to delete a non-empty dir.
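The same "just attempt the delete" idea, shown on a local file system as an analogy only: `java.nio.file.Files.delete` refuses to remove a non-empty directory and throws `DirectoryNotEmptyException`, so the caller can skip listing the directory first. This sketch uses `java.nio`, not the Hadoop `FileSystem` or S3A API, and the helper name is made up.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;

final class NonRecursiveDelete {
    // Deletes 'file', then tries to remove its parent without listing it first.
    // Returns true if the parent was removed, false if it still has entries.
    static boolean deleteFileAndTryParent(Path file) throws IOException {
        Files.delete(file);
        Path parent = file.getParent();
        if (parent == null) {
            return false;
        }
        try {
            Files.delete(parent); // fails for a non-empty directory
            return true;
        } catch (DirectoryNotEmptyException e) {
            return false; // other state files still live here; leave it
        }
    }
}
```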

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.4.0, 1.3.2


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-09-06 Thread Aljoscha Krettek (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155151#comment-16155151 ]

Aljoscha Krettek commented on FLINK-7266:
-

I think that's only part of the problem, because Flink must check on its own 
whether the directory is empty before we can delete it.

The basic problem is that each state handle is being cleaned up individually. 
If we had global knowledge that all state handles actually reside in one base 
directory, then we could shoot off an asynchronous command that deletes that 
whole sub-directory. (Which might still be horribly slow on S3 and not solve 
the problem at all.)
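A sketch of the "one asynchronous delete of the whole sub-directory" idea on a local file system, using `java.nio` and `CompletableFuture`; the class and method names are hypothetical. On S3 there is no single "delete directory" request, so a recursive delete there still has to list and remove every object under the prefix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.stream.Stream;

final class AsyncDirectoryDrop {
    // Deletes 'dir' and everything below it on the given executor.
    static CompletableFuture<Void> deleteRecursivelyAsync(Path dir, Executor executor) {
        return CompletableFuture.runAsync(() -> {
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse order visits children before parents, so each
                // directory is already empty when it is deleted.
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.delete(p);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, executor);
    }
}
```

The caller only needs the checkpoint's base directory, which is exactly the global knowledge the per-handle cleanup lacks.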

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.4.0, 1.3.2


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-09-05 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153910#comment-16153910 ]

Steve Loughran commented on FLINK-7266:
---

FWIW, in S3A we issue a single delete request to remove all parent paths *and 
don't bother doing the existence check*. 

That is, for a file a/b/c.txt, after the file is written in close(), POST a 
delete list of

/a/
/a/b

It's ~O(1) regardless of depth, and as you don't need to wait for the response, 
it's even something you could do asynchronously.
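A sketch of building that delete list, assuming S3-style keys without a leading slash and markers that end in `/` (the comment above writes `/a/b` without a trailing slash, so the exact key form is an assumption here). The resulting keys could go into one multi-object delete request; S3 treats deleting a key that does not exist as a success, which is why no existence check is needed.

```java
import java.util.ArrayList;
import java.util.List;

final class ParentMarkers {
    // For an object key such as "a/b/c.txt", returns the marker keys of every
    // ancestor "directory": ["a/", "a/b/"]. No existence checks are made.
    static List<String> parentMarkerKeys(String key) {
        List<String> markers = new ArrayList<>();
        int slash = key.indexOf('/');
        while (slash >= 0) {
            markers.add(key.substring(0, slash + 1));
            slash = key.indexOf('/', slash + 1);
        }
        return markers;
    }
}
```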

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.4.0, 1.3.2


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-08-03 Thread Aljoscha Krettek (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112606#comment-16112606 ]

Aljoscha Krettek commented on FLINK-7266:
-

This is actually resolved on {{release-1.3}} for the s3 filesystem.

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.4.0, 1.3.2


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-08-02 Thread Stefan Richter (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110547#comment-16110547 ]

Stefan Richter commented on FLINK-7266:
---

I agree that we should not block the release on this. Just wanted to have this 
recorded with this issue, so that we can improve it for the future.

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.4.0, 1.3.2


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-08-02 Thread Stephan Ewen (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110535#comment-16110535 ]

Stephan Ewen commented on FLINK-7266:
-

We could try to improve that by not doing the {{mkdirs()}} call in the stream 
factory for each state element. That might help, but I would not consider 
this a release blocker.

I would try to solve this more holistically in 1.4.0, by extending the 
FileSystem abstraction and adding post-state-release hooks in the Checkpoint 
Coordinator (so that there is one call to drop the directory marker file, if we 
cannot find a way to avoid creating it in the first place).
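A sketch of the first half of that suggestion: create the checkpoint directory once rather than once per state element. `LazyDirStreamFactory` and `DirectoryCreator` are hypothetical names, not Flink's actual stream factory; the method is `synchronized` so that no caller gets a path before the directory exists.

```java
final class LazyDirStreamFactory {
    // Hypothetical hook that issues one mkdirs() call against the file system.
    interface DirectoryCreator {
        void mkdirs(String dir);
    }

    private final String checkpointDir;
    private final DirectoryCreator creator;
    private boolean created = false;

    LazyDirStreamFactory(String checkpointDir, DirectoryCreator creator) {
        this.checkpointDir = checkpointDir;
        this.creator = creator;
    }

    // Called once per state element; only the first call creates the directory.
    synchronized String pathForNewStateFile(String name) {
        if (!created) {
            creator.mkdirs(checkpointDir);
            created = true;
        }
        return checkpointDir + "/" + name;
    }
}
```

With three state elements this issues one `mkdirs` instead of three; fewer directory creations also means fewer markers to clean up afterwards.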

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.4.0, 1.3.2


[jira] [Commented] (FLINK-7266) Don't attempt to delete parent directory on S3

2017-08-02 Thread Stefan Richter (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110453#comment-16110453 ]

Stefan Richter commented on FLINK-7266:
---

One comment about the "not necessary on S3" part: while testing release 1.3.2, 
I saw some empty directory entries remaining in S3. I added a screenshot in the 
testing document 
[here|https://docs.google.com/document/d/1dN9AM9FUPizIu4hTKAXJSbbAORRdrce-BqQ8AUHlOqE/edit?ts=59807985#].
If this is not a problem, can the issue be closed, or is the merge into 1.4 
still pending?

> Don't attempt to delete parent directory on S3
> --
>
> Key: FLINK-7266
> URL: https://issues.apache.org/jira/browse/FLINK-7266
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.3.1
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
> Fix For: 1.4.0, 1.3.2