Re: [DISCUSS] Support configure remote flink jar

2019-11-23 Thread Rong Rong
Thanks @Tison for starting the discussion, and sorry for joining so late.

Yes, I think this is a very good idea. We have already tweaked the
flink-yarn package internally to support something similar to what @Thomas
mentioned: registering a jar that has already been uploaded to some DFS
(not necessarily the YARN public cache discussed in FLINK-13938).
The reason is that we provide internally packaged extension libraries to
our customers, and we have seen a good performance improvement in our YARN
cluster during the container localization phase after our customers
switched to pre-uploaded JARs instead of uploading them on every
deployment.

Looking forward to this feature!

--
Rong


Re: [DISCUSS] Support configure remote flink jar

2019-11-19 Thread tison
Thanks for your participation!

@Yang: Great to hear. I'd like to know whether or not a remote flink jar
path conflicts with FLINK-13938. IIRC, FLINK-13938 automatically excludes
the local flink jar from shipping, which possibly does not work for a
remote one.

@Thomas: The idea that a URL becomes the unified representation of a
resource is inspiring. I'm thinking about how to provide a single, uniform
process for fetching a resource from a URL that points to an artifact
repository or a distributed file system.

@ouywl & Stephan: Yes, this improvement can be carried over to environments
like k8s. IIRC, the k8s proposal has already discussed improvements using an
"init container" and other technologies. However, so far I regard it as an
improvement that differs from one storage system to another, so we should
implement them individually.


Best,
tison.


Re: [DISCUSS] Support configure remote flink jar

2019-11-19 Thread Stephan Ewen
Would that be a feature specific to Yarn? (and maybe standalone sessions)

For containerized setups, an init container seems like a nice way to solve
this. It is also more flexible when it comes to supporting authentication
mechanisms for the target storage system, etc.


Re: [DISCUSS] Support configure remote flink jar

2019-11-19 Thread ouywl
I have implemented this feature in our environment, using a Docker "Init
Container" to fetch the jar file from a URL. It seems like a good idea.

ouywl
ou...@139.com


Re: [DISCUSS] Support configure remote flink jar

2019-11-18 Thread Thomas Weise
There is a related use case (not specific to HDFS) that I came across:

It would be nice if the jar upload endpoint could accept the URL of a jar
file as an alternative to the jar file itself. Such a URL could point to an
artifactory or a distributed file system.
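
As a rough illustration of "URL instead of file", here is a minimal sketch
in Python. It is not how any Flink endpoint is implemented; the function
name `stage_jar` and its parameters are hypothetical. It only shows one
code path that accepts either a local path or a URL and ends up with the
jar in a target directory. It relies on `urllib`, so it covers schemes such
as http, https, and file, but not hdfs.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def stage_jar(source: str, target_dir: str) -> Path:
    """Place a jar into target_dir, given a local path or a URL (hypothetical helper)."""
    parsed = urlparse(source)
    # Use the last path segment as the file name in the target directory.
    target = Path(target_dir) / Path(parsed.path).name
    if parsed.scheme == "":
        # No scheme: treat the source as a plain local path and copy it.
        shutil.copyfile(source, target)
    else:
        # A scheme is present: let urllib fetch the bytes and stream them to disk.
        with urllib.request.urlopen(source) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

A caller would pass either something like `/tmp/job.jar` or a URL such as
`https://...` and get back the path of the staged copy in both cases.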

Thomas


Re: [DISCUSS] Support configure remote flink jar

2019-11-18 Thread Yang Wang
Hi tison,

Thanks for starting this discussion.
* For a user-customized flink-dist jar, this is a useful feature, since it
avoids uploading the flink-dist jar every time. Especially in production
environments, it could speed up the submission process.
* For the standard flink-dist jar, FLINK-13938[1] could solve the problem:
upload an official Flink release binary to distributed storage (HDFS)
first, and then all submissions can benefit from it. Users could also
upload a customized flink-dist jar to speed up their submissions.

If the flink-dist jar can be specified as a remote path, maybe the same
applies to the user jar.

[1]. https://issues.apache.org/jira/browse/FLINK-13938


[DISCUSS] Support configure remote flink jar

2019-11-18 Thread tison
Hi folks,

Recently, our customers asked for a feature to configure a remote flink
jar. I'd like to reach out to you to see whether or not it is a general
need.

ATM Flink only supports configuring a local file as the flink jar via the
`-yj` option. If we pass an HDFS file path, due to an implementation detail
it will fail with IllegalArgumentException. Once we support configuring a
remote flink jar, this limitation is eliminated. We can also make use of
YARN locality to reduce the uploading overhead: instead of uploading the
jar, we ask YARN to localize it when the AM container starts.
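
To illustrate the distinction, here is a minimal sketch in Python (not
Flink's actual client code). The function name `plan_flink_jar`, the
returned action labels, and the set of schemes treated as remote are all
hypothetical; it only shows the idea of branching on the scheme of the
configured path instead of rejecting anything that is not a local file.

```python
from urllib.parse import urlparse

# Hypothetical: schemes that would count as "already on a remote file system".
REMOTE_SCHEMES = {"hdfs"}


def plan_flink_jar(path: str) -> dict:
    """Decide how a configured flink jar would be handled (illustrative only)."""
    scheme = urlparse(path).scheme
    if scheme in ("", "file"):
        # Local file: the client uploads it as part of the submission.
        return {"action": "upload", "path": path}
    if scheme in REMOTE_SCHEMES:
        # Remote file: skip the upload and have YARN localize it for the AM container.
        return {"action": "register_remote", "path": path}
    raise ValueError(f"unsupported scheme for flink jar: {scheme!r}")
```

With this sketch, `/opt/flink/lib/flink-dist.jar` maps to the upload branch,
while `hdfs:///flink/flink-dist.jar` maps to the remote branch.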

Besides, it possibly overlaps with FLINK-13938. I'd like to have the
discussion on our mailing list first.

Are you looking forward to such a feature?

@Yang Wang: this feature is different from what we discussed offline; it
only focuses on the flink jar, not all ship files.

Best,
tison.