[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2022-01-12 Thread lupan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475041#comment-17475041
 ] 

lupan commented on FLINK-5129:
--

OK. Thanks. I have submitted a new issue. 
https://issues.apache.org/jira/browse/FLINK-25602

> make the BlobServer use a distributed file system
> -
>
> Key: FLINK-5129
> URL: https://issues.apache.org/jira/browse/FLINK-5129
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
> Fix For: 1.3.0
>
> Attachments: image-2022-01-11-11-27-59-280.png, 
> image-2022-01-13-09-26-15-252.png
>
>
> Currently, the BlobServer uses a local storage and, in addition when the HA 
> mode is set, a distributed file system, e.g. hdfs. This, however, is only 
> used by the JobManager and all TaskManager instances request blobs from the 
> JobManager. By using the distributed file system there as well, we would 
> lower the load on the JobManager and increase scalability.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2022-01-12 Thread Nico Kruber (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474622#comment-17474622
 ] 

Nico Kruber commented on FLINK-5129:


No, blob.storage.directory has to be a local path. I also suggest writing to 
the user/dev mailing list instead of adding comments to an old (and closed) 
ticket.



[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2022-01-10 Thread lupan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472435#comment-17472435
 ] 

lupan commented on FLINK-5129:
--

Does blob.storage.directory already support AWS S3?

When I use the following configuration:
{code:java}
blob.storage.directory: s3://iceberg-bucket/flink/blob {code}
I get the following error:
{code:java}
taskmanager    | 2022-01-11 02:41:11,460 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Terminating TaskManagerRunner with exit code 1.
taskmanager    | org.apache.flink.util.FlinkException: Failed to start the TaskManagerRunner.
taskmanager    |        at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:374) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$runTaskManagerProcessSecurely$3(TaskManagerRunner.java:413) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerProcessSecurely(TaskManagerRunner.java:413) [flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerProcessSecurely(TaskManagerRunner.java:396) [flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.main(TaskManagerRunner.java:354) [flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    | Caused by: java.io.IOException: Could not create storage directory for BLOB store in 's3:/iceberg-bucket/flink/blob'.
taskmanager    |        at org.apache.flink.runtime.blob.BlobUtils.initLocalStorageDirectory(BlobUtils.java:139) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.blob.AbstractBlobCache.<init>(AbstractBlobCache.java:89) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.blob.PermanentBlobCache.<init>(PermanentBlobCache.java:93) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.blob.BlobCacheService.<init>(BlobCacheService.java:55) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.<init>(TaskManagerRunner.java:169) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:367) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
taskmanager    |        ... 5 more{code}
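
The single slash in 's3:/iceberg-bucket/flink/blob' in the trace above is 
consistent with the configured value being handled as a local file path rather 
than as a URI. The following is a minimal sketch of that effect using only 
java.io.File; the class and method names are hypothetical and this is not 
Flink's code.

```java
import java.io.File;

public class LocalPathSketch {

    // Treats the configured value as a plain local path, the way java.io.File does.
    static String asLocalPath(String configured) {
        return new File(configured).getPath();
    }

    public static void main(String[] args) {
        String configured = "s3://iceberg-bucket/flink/blob";
        // java.io.File normalizes separators, so the "//" after the scheme is collapsed.
        System.out.println(asLocalPath(configured));
        // Without a leading separator, java.io.File sees a relative path.
        System.out.println(new File(configured).isAbsolute());
    }
}
```

On a Unix-like system the first line printed should match the path in the 
error message, which suggests the "s3" prefix ends up as an ordinary directory 
name relative to the working directory.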



[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2017-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873910#comment-15873910
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/3084




[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2017-02-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872122#comment-15872122
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3084
  
Good change, thanks!

Merging this...




[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2017-01-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812063#comment-15812063
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/3076
  
Fixed a typo in the unit test that led to the tests passing although something 
was still wrong; that is now fixed as well.




[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2017-01-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812074#comment-15812074
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/3084

[FLINK-5129] make the BlobServer use a distributed file system

Make the BlobCache use the BlobServer's distributed file system in HA mode. 
Previously, even in HA mode and even if the cache had access to the file 
system, it would download BLOBs from one central BlobServer. By using the 
underlying distributed file system, we can leverage its scalability and remove 
a single point of (performance) failure. If the distributed file system is not 
accessible from the blob caches, the old behaviour is used.
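
The fallback described in this paragraph can be sketched roughly as follows. 
This is only an illustration under stated assumptions: BlobSource, Fetched and 
fetch are hypothetical names and do not correspond to the classes changed in 
the pull request.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class BlobFetchSketch {

    /** Hypothetical blob source; an empty result means "not available from here". */
    interface BlobSource {
        Optional<byte[]> read(String blobKey);
    }

    /** Result of a fetch: the bytes plus which source served them. */
    static final class Fetched {
        final byte[] data;
        final String servedBy;

        Fetched(byte[] data, String servedBy) {
            this.data = data;
            this.servedBy = servedBy;
        }
    }

    /** Prefers the distributed file system and falls back to the central server. */
    static Fetched fetch(String blobKey, BlobSource distributedStore, BlobSource blobServer) {
        if (distributedStore != null) {
            Optional<byte[]> fromStore = distributedStore.read(blobKey);
            if (fromStore.isPresent()) {
                return new Fetched(fromStore.get(), "distributed file system");
            }
        }
        // Old behaviour: ask the single BlobServer.
        byte[] fromServer = blobServer.read(blobKey)
                .orElseThrow(() -> new IllegalStateException("blob not found: " + blobKey));
        return new Fetched(fromServer, "BlobServer");
    }

    public static void main(String[] args) {
        BlobSource unreachable = key -> Optional.empty();
        BlobSource server = key -> Optional.of(key.getBytes(StandardCharsets.UTF_8));
        System.out.println(fetch("job.jar", unreachable, server).servedBy); // prints "BlobServer"
    }
}
```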

@uce can you have a look?
(this is an updated and fixed version of 
https://github.com/apache/flink/pull/3076)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NicoK/flink FLINK-5129a

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3084.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3084


commit 464f2c834688507c67acb3ad584827132ebe444e
Author: Nico Kruber 
Date:   2016-11-22T11:49:03Z

[hotfix] remove unused package-private BlobUtils#copyFromRecoveryPath

This was actually the same implementation as
FileSystemBlobStore#get(java.lang.String, java.io.File) and either of the two
could have been removed but the implementation makes most sense at the
concrete file system abstraction layer, i.e. in FileSystemBlobStore.

commit 2ebffd4c2d499b61f164b4d54dc86c9d44b9c0ea
Author: Nico Kruber 
Date:   2016-11-23T15:11:35Z

[hotfix] do not create intermediate strings inside String.format in BlobUtils

commit 36ab6121e336f63138e442ea48a751ede7fb04c3
Author: Nico Kruber 
Date:   2016-11-24T16:11:19Z

[hotfix] properly shut down the BlobServer in BlobServerRangeTest

commit c8c12c67ae875ca5c96db78375bef880cf2a3c59
Author: Nico Kruber 
Date:   2017-01-05T17:06:01Z

[hotfix] use JUnit's TemporaryFolder in BlobRecoveryITCase, too

This makes cleaning up simpler.

commit a078cb0c26071fe70e3668d23d0c8bef8550892f
Author: Nico Kruber 
Date:   2017-01-05T17:27:00Z

[hotfix] add a missing "'" to the BlobStore class

commit a643f0b989c640a81b112ad14ae27a2a2b1ab257
Author: Nico Kruber 
Date:   2017-01-05T17:07:13Z

[FLINK-5129] BlobServer: include the cluster id in the HA storage path for blobs

This applies to the ZookeeperHaServices implementation.

commit 7d832919040059961940fc96d0cdb285bc9f77d3
Author: Nico Kruber 
Date:   2017-01-05T17:18:10Z

[FLINK-5129] unify duplicate code between the BlobServer and ZookeeperHaServices

(this was introduced by c64860677f)

commit 19879a01b99c4772a09627eb5f380f794f6c1e27
Author: Nico Kruber 
Date:   2016-11-30T13:52:12Z

[hotfix] add some more documentation in BlobStore-related classes

commit 80c17ef83104d1186c06d8f5d4cde11e4b05f2b8
Author: Nico Kruber 
Date:   2017-01-06T10:55:23Z

[hotfix] minor code beautifications when checking parameters

+ also check the blobService parameter in BlobLibraryCacheManager

commit ff920e48bd69acef280bdef2a12e5f5f9cca3a88
Author: Nico Kruber 
Date:   2017-01-06T13:21:42Z

[FLINK-5129] let BlobUtils#initStorageDirectory() throw a proper IOException

commit c8e2815787338f52e5ad369bcaedb1798284dd29
Author: Nico Kruber 
Date:   2017-01-06T13:59:51Z

[hotfix] simplify code in BlobCache#deleteGlobal()

Also, re-order the code so that a local delete is always tried before creating
a connection to the BlobServer. If that fails, the local file is deleted at
least.

commit 5cd1c20aa604a9556c069ab78d8e471fa058499e
Author: Nico Kruber 
Date:   2016-11-29T17:11:06Z

[hotfix] re-use some code in BlobServerDeleteTest

commit d39948a6baa0cd6f68c4dfd8daffdd65e573fbca
Author: Nico Kruber 
Date:   2016-11-30T13:35:38Z

[hotfix] improve some failure messages in the BlobService's HA unit tests

commit dc87ae36088cc48a4122351ebe5b09a31d7fba41
Author: Nico Kruber 
Date:   2017-01-06T14:06:30Z

[FLINK-5129] make the BlobCache also use a distributed file system in HA mode

If available (in HA mode), download the jar files from the distributed file
system directly instead of querying the BlobServer. This way the load is more
distributed among the nodes of the file system (depending on its implementation
of course) compared to putting all the burden on a single BlobServer.

commit 389eaa9779d4bf22cc3972208d4f35ac7a966f5c
Author: Nico Kruber 
Date:   2017-01-06T16:21:05Z

[FLINK-5129] add unit tests for the BlobCache accessing the distributed FS directly
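
One of the commits listed above includes the cluster id in the HA storage path 
for blobs, so that clusters sharing one storage directory do not collide. A 
rough sketch of such a path layout follows; the "<storage dir>/<cluster id>/blob" 
layout, the class name and the method name are assumptions made for 
illustration, not taken from the commit.

```java
public class HaBlobPathSketch {

    /** Joins the HA storage directory, the cluster id and a "blob" suffix (assumed layout). */
    static String blobStoragePath(String haStorageDir, String clusterId) {
        if (haStorageDir == null || haStorageDir.isEmpty()) {
            throw new IllegalArgumentException("HA storage directory must be set");
        }
        // Drop a single trailing slash so the joined path has no empty segment.
        String base = haStorageDir.endsWith("/")
                ? haStorageDir.substring(0, haStorageDir.length() - 1)
                : haStorageDir;
        return base + "/" + clusterId + "/blob";
    }

    public static void main(String[] args) {
        // Two clusters sharing one storage directory end up in separate sub-directories.
        System.out.println(blobStoragePath("hdfs:///flink/ha/", "cluster-a"));
        System.out.println(blobStoragePath("hdfs:///flink/ha/", "cluster-b"));
    }
}
```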

[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2017-01-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811563#comment-15811563
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

Github user NicoK closed the pull request at:

https://github.com/apache/flink/pull/3076




[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2017-01-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15804969#comment-15804969
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/3076

[FLINK-5129] make the BlobServer use a distributed file system

Make the BlobCache use the BlobServer's distributed file system in HA mode. 
Previously, even in HA mode and even if the cache had access to the file 
system, it would download BLOBs from one central BlobServer. By using the 
underlying distributed file system, we can leverage its scalability and remove 
a single point of (performance) failure. If the distributed file system is not 
accessible from the blob caches, the old behaviour is used.

@uce can you have a look?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NicoK/flink FLINK-5129a

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3076.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3076


commit 464f2c834688507c67acb3ad584827132ebe444e
Author: Nico Kruber 
Date:   2016-11-22T11:49:03Z

[hotfix] remove unused package-private BlobUtils#copyFromRecoveryPath

This was actually the same implementation as
FileSystemBlobStore#get(java.lang.String, java.io.File) and either of the two
could have been removed but the implementation makes most sense at the
concrete file system abstraction layer, i.e. in FileSystemBlobStore.

commit 2ebffd4c2d499b61f164b4d54dc86c9d44b9c0ea
Author: Nico Kruber 
Date:   2016-11-23T15:11:35Z

[hotfix] do not create intermediate strings inside String.format in BlobUtils

commit 36ab6121e336f63138e442ea48a751ede7fb04c3
Author: Nico Kruber 
Date:   2016-11-24T16:11:19Z

[hotfix] properly shut down the BlobServer in BlobServerRangeTest

commit c8c12c67ae875ca5c96db78375bef880cf2a3c59
Author: Nico Kruber 
Date:   2017-01-05T17:06:01Z

[hotfix] use JUnit's TemporaryFolder in BlobRecoveryITCase, too

This makes cleaning up simpler.

commit a078cb0c26071fe70e3668d23d0c8bef8550892f
Author: Nico Kruber 
Date:   2017-01-05T17:27:00Z

[hotfix] add a missing "'" to the BlobStore class

commit a643f0b989c640a81b112ad14ae27a2a2b1ab257
Author: Nico Kruber 
Date:   2017-01-05T17:07:13Z

[FLINK-5129] BlobServer: include the cluster id in the HA storage path for blobs

This applies to the ZookeeperHaServices implementation.

commit 7d832919040059961940fc96d0cdb285bc9f77d3
Author: Nico Kruber 
Date:   2017-01-05T17:18:10Z

[FLINK-5129] unify duplicate code between the BlobServer and ZookeeperHaServices

(this was introduced by c64860677f)

commit 19879a01b99c4772a09627eb5f380f794f6c1e27
Author: Nico Kruber 
Date:   2016-11-30T13:52:12Z

[hotfix] add some more documentation in BlobStore-related classes

commit 80c17ef83104d1186c06d8f5d4cde11e4b05f2b8
Author: Nico Kruber 
Date:   2017-01-06T10:55:23Z

[hotfix] minor code beautifications when checking parameters

+ also check the blobService parameter in BlobLibraryCacheManager

commit ff920e48bd69acef280bdef2a12e5f5f9cca3a88
Author: Nico Kruber 
Date:   2017-01-06T13:21:42Z

[FLINK-5129] let BlobUtils#initStorageDirectory() throw a proper IOException

commit c8e2815787338f52e5ad369bcaedb1798284dd29
Author: Nico Kruber 
Date:   2017-01-06T13:59:51Z

[hotfix] simplify code in BlobCache#deleteGlobal()

Also, re-order the code so that a local delete is always tried before creating
a connection to the BlobServer. If that fails, the local file is deleted at
least.

commit 38626a705fd0725a8e54f2ee1c3d0ec410184b8a
Author: Nico Kruber 
Date:   2017-01-06T14:06:30Z

[FLINK-5129] make the BlobCache also use a distributed file system in HA mode

If available (in HA mode), download the jar files from the distributed file
system directly instead of querying the BlobServer. This way the load is more
distributed among the nodes of the file system (depending on its implementation
of course) compared to putting all the burden on a single BlobServer.

commit 1e86c5c92f9ac35c26c1e707d2d840c4edbeefb1
Author: Nico Kruber 
Date:   2016-11-29T17:11:06Z

[hotfix] re-use some code in BlobServerDeleteTest

commit 68d2959b60f6b583cb48de8ed5aee3e18b163082
Author: Nico Kruber 
Date:   2016-11-30T13:35:38Z

[hotfix] improve some failure messages in the BlobService's HA unit tests

commit 7cfbeb7707329cad57604a58f44254d4f8b6c9b3
Author: Nico Kruber 
Date:   2017-01-06T16:21:05Z

[FLINK-5129] add unit tests for the BlobCache accessing the distributed FS directly





[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2017-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801970#comment-15801970
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/2891
  
I need to adapt a few things and choose a different approach - I'll re-open 
later




[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2017-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801971#comment-15801971
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

Github user NicoK closed the pull request at:

https://github.com/apache/flink/pull/2891




[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2016-12-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769592#comment-15769592
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/2891
  
Despite the tests completing successfully, I still need to check a few things:
- `BlobService#getURL()` may now return a URL for a distributed file system,
  however:
- related code, e.g. `java.io.File`, may not know how to handle HDFS URLs, for
  example :(
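
To illustrate the concern: java.io.File can only be constructed from "file" 
URIs, so a URL pointing into a distributed file system needs a guard before it 
reaches File-based code. A minimal sketch, in which the class name, the helper 
name and the example host are hypothetical:

```java
import java.io.File;
import java.net.URI;

public class UrlToFileSketch {

    /** Returns a local File for file: URIs, or null for any other scheme (e.g. hdfs). */
    static File toLocalFileOrNull(URI uri) {
        if (!"file".equalsIgnoreCase(uri.getScheme())) {
            // new File(uri) would throw IllegalArgumentException for non-file schemes.
            return null;
        }
        return new File(uri);
    }

    public static void main(String[] args) {
        System.out.println(toLocalFileOrNull(URI.create("file:///tmp/blob-1234")));
        System.out.println(toLocalFileOrNull(URI.create("hdfs://namenode:8020/flink/blob-1234")));
    }
}
```

Callers that receive null would have to read the blob through a file system 
client for that scheme instead of through java.io.File.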




[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2016-11-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708696#comment-15708696
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

Github user NicoK commented on the issue:

https://github.com/apache/flink/pull/2891
  
Sorry for the hassle, found a regression and added a fix plus an 
appropriate test for it. Should be fine now.




[jira] [Commented] (FLINK-5129) make the BlobServer use a distributed file system

2016-11-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702692#comment-15702692
 ] 

ASF GitHub Bot commented on FLINK-5129:
---

GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/2891

[FLINK-5129] make the BlobServer use a distributed file system

Previously, the BlobServer held a local copy and in case high availability (HA)
is set, it also copied jar files to a distributed file system. Upon restore,
these files were copied to local store from which they are used.

This PR abstracts the BlobServer's backing file system and makes it use the
distributed file system directly in HA mode, i.e. without the local file system
copy. Other than that the behaviour should not change.

Secondly, BlobCache instances at the task managers also make use of this
distributed file system and download files from there instead of bothering
the blob server. As before, however, distributed files may only be deleted
by the blob server. If the distributed file system is not accessible at the
blob caches, the old behaviour is used.

* BlobServer: include the cluster id in the HA storage path for blobs
* make the BlobServer use the HA filesystem back-end properly:
* make the BlobCache also use a distributed file system in HA mode

@uce can you have a look?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NicoK/flink FLINK-5129

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2891.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2891


commit b65e74dd92bdf74b2816a0d8a26a5ebaa25ca586
Author: Nico Kruber 
Date:   2016-11-22T11:49:03Z

[hotfix] remove unused package-private BlobUtils#copyFromRecoveryPath

This was actually the same implementation as
FileSystemBlobStore#get(java.lang.String, java.io.File) and either of the two
could have been removed but the implementation makes most sense at the
concrete file system abstraction layer, i.e. in FileSystemBlobStore.

commit 09bdd49e6282268fd9c1b2672f0ea6222e097ca2
Author: Nico Kruber 
Date:   2016-11-23T15:11:35Z

[hotfix] do not create intermediate strings inside String.format in BlobUtils

commit 93938ff97fef9e39c17ac795e1e89ca9de25e028
Author: Nico Kruber 
Date:   2016-11-24T16:11:19Z

[hotfix] properly shut down the BlobServer in BlobServerRangeTest

commit c0c9d2239a767154d6071171d4c33e762e01aa62
Author: Nico Kruber 
Date:   2016-11-24T17:50:43Z

[FLINK-5129] BlobServer: include the cluster id in the HA storage path for blobs

Also use JUnit's TemporaryFolder in BlobRecoveryITCase, too. This makes
cleaning up simpler.

commit 8b9c7d9fd6e1ab3c7f2175a31d0e29b41b01cc61
Author: Nico Kruber 
Date:   2016-11-23T18:50:52Z

[FLINK-5129] make the BlobCache use the HA filesystem back-end properly

Previously, the BlobServer holds a local copy and in case high availability (HA)
is set, it also copies jar files to a distributed file system. Upon restore,
these files are copied to local store from which they are used.

This commit abstracts the BlobServer's backing file system and makes it use the
distributed file system directly in HA mode, i.e. without the local file system
copy. Other than that the behaviour does not change.

commit 249b2ea48f19c54498faa56ad45d299efaad4521
Author: Nico Kruber 
Date:   2016-11-25T16:42:05Z

[FLINK-5129] make the BlobCache also use a distributed file system in HA mode

* re-factor the file system abstraction in FileSystemBlobStore so that it can
  be used by the task managers, too, which should not be able to delete files
  in a distributed file system shared among different nodes
* only download blobs from the blob server if not in HA mode or the distributed
  file system is not accessible by the BlobCache, e.g. at the task managers

commit dd69f65a47205eb55ac8cc2c8f3aa9f7232dc8ba
Author: Nico Kruber 
Date:   2016-11-28T10:42:13Z

[FLINK-5129] restore non-HA mode unique directory setup in the blob server and cache

If not in high availability mode, local (and now also distributed) file systems
again try to set up a unique directory structure so that other instances with
the same configuration file or storage path do not interfere.

This was lost in 8b9c7d9fd6.

commit 76ccc9ffaaa63d6e0bd55ba7f6c08f8c1cff98cb
Author: Nico Kruber 
Date:   2016-11-28T15:19:20Z

[hotfix] add a missing "'" to FileSystemBlobStore

commit 53702add38d1087062e84a7e804b08920dfc0c23
Author: Nico Kruber 
Date:   2016-11-28T15:41:11Z

[FLINK-5129] move path-related methods f