Re: Shipping Filesystem Plugins with YarnClusterDescriptor

2020-06-11 Thread Kostas Kloudas
Hi John,

I think that using different plugins is not going to be an issue,
assuming that the schemes of your file systems do not collide. This is
already the case for S3 within Flink, where we have two implementations,
one based on Presto and one based on Hadoop. For the former you can use
the scheme s3p, and for the latter s3a.
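
To illustrate why non-colliding schemes are enough, here is a toy sketch of
scheme-based dispatch. It is not Flink's actual plugin loader: the
SchemeRegistry class and the bucket paths are made up for illustration; only
the s3p/s3a schemes and the two S3 module names come from Flink.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Toy model of scheme-based dispatch; NOT Flink's real plugin loader. */
final class SchemeRegistry {
    private final Map<String, String> pluginByScheme = new HashMap<>();

    /** Registers a plugin for a scheme and rejects a second claim on the same scheme. */
    void register(String scheme, String pluginName) {
        String previous = pluginByScheme.putIfAbsent(scheme, pluginName);
        if (previous != null) {
            throw new IllegalStateException(
                "Scheme '" + scheme + "' is already claimed by " + previous);
        }
    }

    /** Picks the plugin responsible for a path based only on the URI scheme. */
    String resolve(String path) {
        String scheme = URI.create(path).getScheme();
        String plugin = pluginByScheme.get(scheme);
        if (plugin == null) {
            throw new IllegalArgumentException("No plugin registered for scheme: " + scheme);
        }
        return plugin;
    }

    public static void main(String[] args) {
        SchemeRegistry registry = new SchemeRegistry();
        registry.register("s3p", "flink-s3-fs-presto");
        registry.register("s3a", "flink-s3-fs-hadoop");
        // Hypothetical bucket paths; each one is routed by its scheme alone.
        System.out.println(registry.resolve("s3p://my-bucket/checkpoints"));
        System.out.println(registry.resolve("s3a://my-bucket/output"));
    }
}
```

Two plugins can coexist as long as each claims its own scheme; a second
plugin claiming an already registered scheme is where the trouble would start.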

Different versions of the same plugin, however, can be an issue if
all of them are present concurrently in your plugins directory.
Is that the case, or is only the latest version of a given
plugin present?

Keep in mind that after uploading, the "remote" plugins dir is not
shared among applications; it is "private" to each one of them.

Cheers,
Kostas



Re: Shipping Filesystem Plugins with YarnClusterDescriptor

2020-06-11 Thread John Mathews
So I think that will work, but it has some limitations. When launching
clusters through a service (which is our use case), multiple clients may
want clusters with different plugins or different versions of a given
plugin. Because the YarnClusterDescriptor currently reads the location of
the plugins to ship from an environment variable, there is a race
condition: that directory could contain plugins from multiple in-flight
requests to spin up a cluster.

I think a possible solution is to expose configuration on the
YarnClusterDescriptor that is similar to the shipFiles list but is instead
a shipPlugins list. That way, the plugins that get shipped are set per YARN
cluster request instead of at a global level.
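
To make the idea concrete, here is a rough sketch of what a per-request
plugin list could look like. Everything in it is hypothetical: the
ClusterRequestSketch class, its methods, and the staging paths do not exist
in Flink. It only models the intent that each request carries its own
plugins, which are mapped under "plugins/" on the remote side.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-request descriptor; none of these names exist in Flink. */
final class ClusterRequestSketch {
    private final List<Path> shipFiles = new ArrayList<>();
    private final List<Path> shipPlugins = new ArrayList<>();

    /** Analogous to the existing shipFiles list: arbitrary files to ship. */
    void addShipFile(Path file) {
        shipFiles.add(file);
    }

    /** Each entry is a local plugin directory owned by this request only. */
    void addShipPlugin(Path pluginDir) {
        shipPlugins.add(pluginDir);
    }

    /**
     * Maps each local plugin directory to the relative remote location it
     * would need: a subdirectory of "plugins/" named after the local directory.
     */
    Map<Path, String> remotePluginLayout() {
        Map<Path, String> layout = new LinkedHashMap<>();
        for (Path dir : shipPlugins) {
            layout.put(dir, "plugins/" + dir.getFileName());
        }
        return layout;
    }

    public static void main(String[] args) {
        // Two in-flight requests with different versions of the same plugin.
        ClusterRequestSketch requestA = new ClusterRequestSketch();
        requestA.addShipPlugin(Paths.get("/staging/request-a/my-fs-v1"));

        ClusterRequestSketch requestB = new ClusterRequestSketch();
        requestB.addShipPlugin(Paths.get("/staging/request-b/my-fs-v2"));

        // Neither request sees the other's plugins.
        System.out.println(requestA.remotePluginLayout());
        System.out.println(requestB.remotePluginLayout());
    }
}
```

Since the list lives on the request object rather than in a process-wide
environment variable, concurrent requests cannot pick up each other's plugins.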

Do you see any workarounds for the issue I described? Also, does the idea I
propose make sense as a solution?





Re: Shipping Filesystem Plugins with YarnClusterDescriptor

2020-06-10 Thread Yangze Guo
Hi, John,

AFAIK, Flink automatically ships the "plugins/" directory of your
Flink distribution to YARN [1]. So you just need to create a
subdirectory in "plugins/" and put your custom jar into it. Did you
run into any problems with this approach?

[1] 
https://github.com/apache/flink/blob/216f65fff10fb0957e324570662d075be66bacdf/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L770
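
As a small sketch of the layout described above (one subdirectory per plugin
under "plugins/"), the snippet below copies a jar into place. The
PluginInstaller class, the plugin name "my-custom-fs", and the temporary
directory standing in for the Flink distribution are all assumptions for
illustration; this is not part of Flink.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative helper, not part of Flink. */
final class PluginInstaller {

    /** Copies pluginJar into flinkHome/plugins/pluginName/ and returns the target path. */
    static Path install(Path flinkHome, String pluginName, Path pluginJar) throws IOException {
        Path pluginDir = flinkHome.resolve("plugins").resolve(pluginName);
        Files.createDirectories(pluginDir);
        Path target = pluginDir.resolve(pluginJar.getFileName());
        return Files.copy(pluginJar, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // A temp directory and an empty file stand in for a real distribution and jar.
        Path flinkHome = Files.createTempDirectory("flink-dist");
        Path jar = Files.createTempFile("my-custom-fs", ".jar");
        System.out.println(install(flinkHome, "my-custom-fs", jar));
    }
}
```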

Best,
Yangze Guo



Shipping Filesystem Plugins with YarnClusterDescriptor

2020-06-10 Thread John Mathews
Hello,

I have a custom filesystem that I am trying to migrate to the plugins model
described here:
https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/#adding-a-new-pluggable-file-system-implementation
However, it is unclear to me how to dynamically make the plugins directory
available when launching with a YarnClusterDescriptor. One thought was
to add the plugins to the shipFiles list, but I don't think that would
put the plugins in the correct directory for Flink to discover them.

Is there another way to get the plugins onto the host when launching the
cluster? Or is there a different recommended way of doing this? Happy to
answer any questions if something is unclear.

Thanks so much for your help!

John