Re: Detecting resources to stage

2019-11-27 Thread Gleb Kanterov
Agree, this makes sense.

On Wed, Nov 27, 2019 at 6:23 PM Luke Cwik  wrote:

> That looks good as well.
>
> I would suggest that we make the classpath scanning system pluggable using
> PipelineOptions. For example in GcpOptions[1], we use two default instance
> factories. The first one controls which class is used as the factory[2] and
> the second one instantiates an instance of that class and creates the
> credential[3]. The same strategy could be added where there is a default
> instance factory for the set of resources and another option which controls
> which class is instantiated to provide that default.
>
> Do you think that we could make the default always:
> new ClassGraph()
>   .addClassLoader(classLoader)
>   .getClasspathURLs();
>
> 1:
> https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159
> 2:
> https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L144
> 3:
> https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159
>
> On Wed, Nov 27, 2019 at 8:19 AM Gleb Kanterov  wrote:
>
>> I didn't think it through, but this is something I have in mind. Keep
>> existing implementation for URLClassLoader, and use URLClassLoader for
>> experimental support of Java 11.
>>
>> List urls;
>> if (classLoader instanceof URLClassLoader) {
>>   urls = Arrays.asList(((URLClassLoader) classLoader).getURLs());
>> } else {
>>   urls = new ClassGraph()
>>   .addClassLoader(classLoader)
>>   .getClasspathURLs();
>> }
>>
>> On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy 
>> wrote:
>>
>>> This looks promising. Do you think you could share your code as well?
>>>
>>> That part sounds very calming:
>>> "ClassGraph is fully compatible with the new JPMS module system (Project
>>> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
>>> module path. However, the code is also fully backwards compatible with JDK
>>> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
>>> all interaction with the module system is implemented via reflection for
>>> backwards compatibility)."
>>>
>>> I'm currently working on rebuilding the classpath detection mechanism so
>>> that it scans java.class.path when URLClassLoader cannot be used (as Luke
>>> suggested) but if we decide to use classgraph it should be relatively easy
>>> to do that instead. Moreover, I want to enable the possibility of injecting
>>> any algorithm implementation through pipeline options - this will enable
>>> third-party vendors to inject their custom implementations if needed (SPI
>>> pattern that was mentioned at some point in a jira ticket). I think I'm
>>> pretty close to finishing that.
>>>
>>> Thanks!
>>>
>>> śr., 27 lis 2019 o 15:24 Gleb Kanterov  napisał(a):
>>>
 Today I tried using classgraph [1] library to scan classpath in Java 11
 instead of using URLClassLoader, and after that, the job worked on
 Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
 classgraph doesn't have any dependencies. I'm wondering if we can relocate
 it to java-core jar and use in for non-URLClassLoaders?

 [1]: https://github.com/classgraph/classgraph
 [2]:
 https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java

 On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:

> I believe the closest suggestion[1] we had that worked for Java 11 and
> maintained backwards compatibility was to use the URLClassLoader to infer
> the resources and if we couldn't do that then look at the java.class.path
> system property to do the inference otherwise fail and force the users to
> tell us what. There are too many scenarios where we will do it wrong
> because of how people package and deploy their code whether it is an
> embedded application server or some other application container with a
> security manager that will prevent us from doing the right thing.
>
> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
> wrote:
>
>> Note that resources are more properly tied to specific operations and
>> stages, not to the entire pipeline. This is especially true in the
>> face of libraries (which should have the ability to declare their own
>> resources) and cross-language.
>>
>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy 
>> wrote:
>> >
>> > I figured that it would be good to bump this thread for greater
>> visibility even though I don't have a strong opinion about this (yet -
>> hopefully, I will know more later to 

Re: Detecting resources to stage

2019-11-27 Thread Luke Cwik
That looks good as well.

I would suggest that we make the classpath scanning system pluggable using
PipelineOptions. For example in GcpOptions[1], we use two default instance
factories. The first one controls which class is used as the factory[2] and
the second one instantiates an instance of that class and creates the
credential[3]. The same strategy could be added where there is a default
instance factory for the set of resources and another option which controls
which class is instantiated to provide that default.

Do you think that we could make the default always:
new ClassGraph()
  .addClassLoader(classLoader)
  .getClasspathURLs();

1:
https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159
2:
https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L144
3:
https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159

On Wed, Nov 27, 2019 at 8:19 AM Gleb Kanterov  wrote:

> I didn't think it through, but this is something I have in mind. Keep
> existing implementation for URLClassLoader, and use URLClassLoader for
> experimental support of Java 11.
>
> List urls;
> if (classLoader instanceof URLClassLoader) {
>   urls = Arrays.asList(((URLClassLoader) classLoader).getURLs());
> } else {
>   urls = new ClassGraph()
>   .addClassLoader(classLoader)
>   .getClasspathURLs();
> }
>
> On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy 
> wrote:
>
>> This looks promising. Do you think you could share your code as well?
>>
>> That part sounds very calming:
>> "ClassGraph is fully compatible with the new JPMS module system (Project
>> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
>> module path. However, the code is also fully backwards compatible with JDK
>> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
>> all interaction with the module system is implemented via reflection for
>> backwards compatibility)."
>>
>> I'm currently working on rebuilding the classpath detection mechanism so
>> that it scans java.class.path when URLClassLoader cannot be used (as Luke
>> suggested) but if we decide to use classgraph it should be relatively easy
>> to do that instead. Moreover, I want to enable the possibility of injecting
>> any algorithm implementation through pipeline options - this will enable
>> third-party vendors to inject their custom implementations if needed (SPI
>> pattern that was mentioned at some point in a jira ticket). I think I'm
>> pretty close to finishing that.
>>
>> Thanks!
>>
>> śr., 27 lis 2019 o 15:24 Gleb Kanterov  napisał(a):
>>
>>> Today I tried using classgraph [1] library to scan classpath in Java 11
>>> instead of using URLClassLoader, and after that, the job worked on
>>> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
>>> classgraph doesn't have any dependencies. I'm wondering if we can relocate
>>> it to java-core jar and use in for non-URLClassLoaders?
>>>
>>> [1]: https://github.com/classgraph/classgraph
>>> [2]:
>>> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java
>>>
>>> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:
>>>
 I believe the closest suggestion[1] we had that worked for Java 11 and
 maintained backwards compatibility was to use the URLClassLoader to infer
 the resources and if we couldn't do that then look at the java.class.path
 system property to do the inference otherwise fail and force the users to
 tell us what. There are too many scenarios where we will do it wrong
 because of how people package and deploy their code whether it is an
 embedded application server or some other application container with a
 security manager that will prevent us from doing the right thing.

 On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
 wrote:

> Note that resources are more properly tied to specific operations and
> stages, not to the entire pipeline. This is especially true in the
> face of libraries (which should have the ability to declare their own
> resources) and cross-language.
>
> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy 
> wrote:
> >
> > I figured that it would be good to bump this thread for greater
> visibility even though I don't have a strong opinion about this (yet -
> hopefully, I will know more later to be able to share ;) ).
> >
> > Answering the questions Luke asked will unblock this issue:
> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed
> for Java 11 

Re: Detecting resources to stage

2019-11-27 Thread Gleb Kanterov
I didn't think it through, but this is something I have in mind. Keep
existing implementation for URLClassLoader, and use URLClassLoader for
experimental support of Java 11.

List urls;
if (classLoader instanceof URLClassLoader) {
  urls = Arrays.asList(((URLClassLoader) classLoader).getURLs());
} else {
  urls = new ClassGraph()
  .addClassLoader(classLoader)
  .getClasspathURLs();
}

On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy 
wrote:

> This looks promising. Do you think you could share your code as well?
>
> That part sounds very calming:
> "ClassGraph is fully compatible with the new JPMS module system (Project
> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
> module path. However, the code is also fully backwards compatible with JDK
> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
> all interaction with the module system is implemented via reflection for
> backwards compatibility)."
>
> I'm currently working on rebuilding the classpath detection mechanism so
> that it scans java.class.path when URLClassLoader cannot be used (as Luke
> suggested) but if we decide to use classgraph it should be relatively easy
> to do that instead. Moreover, I want to enable the possibility of injecting
> any algorithm implementation through pipeline options - this will enable
> third-party vendors to inject their custom implementations if needed (SPI
> pattern that was mentioned at some point in a jira ticket). I think I'm
> pretty close to finishing that.
>
> Thanks!
>
> śr., 27 lis 2019 o 15:24 Gleb Kanterov  napisał(a):
>
>> Today I tried using classgraph [1] library to scan classpath in Java 11
>> instead of using URLClassLoader, and after that, the job worked on
>> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
>> classgraph doesn't have any dependencies. I'm wondering if we can relocate
>> it to java-core jar and use in for non-URLClassLoaders?
>>
>> [1]: https://github.com/classgraph/classgraph
>> [2]:
>> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java
>>
>> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:
>>
>>> I believe the closest suggestion[1] we had that worked for Java 11 and
>>> maintained backwards compatibility was to use the URLClassLoader to infer
>>> the resources and if we couldn't do that then look at the java.class.path
>>> system property to do the inference otherwise fail and force the users to
>>> tell us what. There are too many scenarios where we will do it wrong
>>> because of how people package and deploy their code whether it is an
>>> embedded application server or some other application container with a
>>> security manager that will prevent us from doing the right thing.
>>>
>>> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
>>> wrote:
>>>
 Note that resources are more properly tied to specific operations and
 stages, not to the entire pipeline. This is especially true in the
 face of libraries (which should have the ability to declare their own
 resources) and cross-language.

 On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy 
 wrote:
 >
 > I figured that it would be good to bump this thread for greater
 visibility even though I don't have a strong opinion about this (yet -
 hopefully, I will know more later to be able to share ;) ).
 >
 > Answering the questions Luke asked will unblock this issue:
 https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed
 for Java 11 migration (current detecting mechanism does not work with java
 > 8).
 >
 >
 >>
 >> That said letting the user resolve the jars to stage can be saner
 instead of assuming it is in the classpath/loader. I already have a few
 cases where it will fail cause the transforms load the jars from outside
 the app classloader (transforms are isolated).
 >
 >
 >
 > If I understand correctly, at least in Dataflow runner, if users want
 to provide custom resources to stage, they can use filesToStage pipeline
 option. Once the option is not null, the runner doesn't detect the
 resources automatically and stages resources enlisted in the option
 instead. I think this should be the approach common for all runners (if it
 is not the case already).

>>>
>>> Your understanding is correct and consistency across runners for a
>>> pipeline option is good for our users.
>>>
>>>
 >
 > Thanks,
 > Łukasz
 >
 >
 >

>>>
>>> 1: https://github.com/apache/beam/pull/8775
>>>
>>


Re: Detecting resources to stage

2019-11-27 Thread Łukasz Gajowy
This looks promising. Do you think you could share your code as well?

That part sounds very calming:
"ClassGraph is fully compatible with the new JPMS module system (Project
Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the
module path. However, the code is also fully backwards compatible with JDK
7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and
all interaction with the module system is implemented via reflection for
backwards compatibility)."

I'm currently working on rebuilding the classpath detection mechanism so
that it scans java.class.path when URLClassLoader cannot be used (as Luke
suggested) but if we decide to use classgraph it should be relatively easy
to do that instead. Moreover, I want to enable the possibility of injecting
any algorithm implementation through pipeline options - this will enable
third-party vendors to inject their custom implementations if needed (SPI
pattern that was mentioned at some point in a jira ticket). I think I'm
pretty close to finishing that.

Thanks!

śr., 27 lis 2019 o 15:24 Gleb Kanterov  napisał(a):

> Today I tried using classgraph [1] library to scan classpath in Java 11
> instead of using URLClassLoader, and after that, the job worked on
> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
> classgraph doesn't have any dependencies. I'm wondering if we can relocate
> it to java-core jar and use in for non-URLClassLoaders?
>
> [1]: https://github.com/classgraph/classgraph
> [2]:
> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java
>
> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:
>
>> I believe the closest suggestion[1] we had that worked for Java 11 and
>> maintained backwards compatibility was to use the URLClassLoader to infer
>> the resources and if we couldn't do that then look at the java.class.path
>> system property to do the inference otherwise fail and force the users to
>> tell us what. There are too many scenarios where we will do it wrong
>> because of how people package and deploy their code whether it is an
>> embedded application server or some other application container with a
>> security manager that will prevent us from doing the right thing.
>>
>> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
>> wrote:
>>
>>> Note that resources are more properly tied to specific operations and
>>> stages, not to the entire pipeline. This is especially true in the
>>> face of libraries (which should have the ability to declare their own
>>> resources) and cross-language.
>>>
>>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy 
>>> wrote:
>>> >
>>> > I figured that it would be good to bump this thread for greater
>>> visibility even though I don't have a strong opinion about this (yet -
>>> hopefully, I will know more later to be able to share ;) ).
>>> >
>>> > Answering the questions Luke asked will unblock this issue:
>>> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed
>>> for Java 11 migration (current detecting mechanism does not work with java
>>> > 8).
>>> >
>>> >
>>> >>
>>> >> That said letting the user resolve the jars to stage can be saner
>>> instead of assuming it is in the classpath/loader. I already have a few
>>> cases where it will fail cause the transforms load the jars from outside
>>> the app classloader (transforms are isolated).
>>> >
>>> >
>>> >
>>> > If I understand correctly, at least in Dataflow runner, if users want
>>> to provide custom resources to stage, they can use filesToStage pipeline
>>> option. Once the option is not null, the runner doesn't detect the
>>> resources automatically and stages resources enlisted in the option
>>> instead. I think this should be the approach common for all runners (if it
>>> is not the case already).
>>>
>>
>> Your understanding is correct and consistency across runners for a
>> pipeline option is good for our users.
>>
>>
>>> >
>>> > Thanks,
>>> > Łukasz
>>> >
>>> >
>>> >
>>>
>>
>> 1: https://github.com/apache/beam/pull/8775
>>
>


Re: Detecting resources to stage

2019-11-27 Thread Gleb Kanterov
Today I tried using classgraph [1] library to scan classpath in Java 11
instead of using URLClassLoader, and after that, the job worked on
Dataflow. The logic of scanning classpath is pretty sophisticated [2], and
classgraph doesn't have any dependencies. I'm wondering if we can relocate
it to java-core jar and use in for non-URLClassLoaders?

[1]: https://github.com/classgraph/classgraph
[2]:
https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java

On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik  wrote:

> I believe the closest suggestion[1] we had that worked for Java 11 and
> maintained backwards compatibility was to use the URLClassLoader to infer
> the resources and if we couldn't do that then look at the java.class.path
> system property to do the inference otherwise fail and force the users to
> tell us what. There are too many scenarios where we will do it wrong
> because of how people package and deploy their code whether it is an
> embedded application server or some other application container with a
> security manager that will prevent us from doing the right thing.
>
> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw 
> wrote:
>
>> Note that resources are more properly tied to specific operations and
>> stages, not to the entire pipeline. This is especially true in the
>> face of libraries (which should have the ability to declare their own
>> resources) and cross-language.
>>
>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy  wrote:
>> >
>> > I figured that it would be good to bump this thread for greater
>> visibility even though I don't have a strong opinion about this (yet -
>> hopefully, I will know more later to be able to share ;) ).
>> >
>> > Answering the questions Luke asked will unblock this issue:
>> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed
>> for Java 11 migration (current detecting mechanism does not work with java
>> > 8).
>> >
>> >
>> >>
>> >> That said letting the user resolve the jars to stage can be saner
>> instead of assuming it is in the classpath/loader. I already have a few
>> cases where it will fail cause the transforms load the jars from outside
>> the app classloader (transforms are isolated).
>> >
>> >
>> >
>> > If I understand correctly, at least in Dataflow runner, if users want
>> to provide custom resources to stage, they can use filesToStage pipeline
>> option. Once the option is not null, the runner doesn't detect the
>> resources automatically and stages resources enlisted in the option
>> instead. I think this should be the approach common for all runners (if it
>> is not the case already).
>>
>
> Your understanding is correct and consistency across runners for a
> pipeline option is good for our users.
>
>
>> >
>> > Thanks,
>> > Łukasz
>> >
>> >
>> >
>>
>
> 1: https://github.com/apache/beam/pull/8775
>


Re: Detecting resources to stage

2019-11-08 Thread Luke Cwik
I believe the closest suggestion[1] we had that worked for Java 11 and
maintained backwards compatibility was to use the URLClassLoader to infer
the resources and if we couldn't do that then look at the java.class.path
system property to do the inference otherwise fail and force the users to
tell us what. There are too many scenarios where we will do it wrong
because of how people package and deploy their code whether it is an
embedded application server or some other application container with a
security manager that will prevent us from doing the right thing.

On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw  wrote:

> Note that resources are more properly tied to specific operations and
> stages, not to the entire pipeline. This is especially true in the
> face of libraries (which should have the ability to declare their own
> resources) and cross-language.
>
> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy  wrote:
> >
> > I figured that it would be good to bump this thread for greater
> visibility even though I don't have a strong opinion about this (yet -
> hopefully, I will know more later to be able to share ;) ).
> >
> > Answering the questions Luke asked will unblock this issue:
> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed for
> Java 11 migration (current detecting mechanism does not work with java > 8).
> >
> >
> >>
> >> That said letting the user resolve the jars to stage can be saner
> instead of assuming it is in the classpath/loader. I already have a few
> cases where it will fail cause the transforms load the jars from outside
> the app classloader (transforms are isolated).
> >
> >
> >
> > If I understand correctly, at least in Dataflow runner, if users want to
> provide custom resources to stage, they can use filesToStage pipeline
> option. Once the option is not null, the runner doesn't detect the
> resources automatically and stages resources enlisted in the option
> instead. I think this should be the approach common for all runners (if it
> is not the case already).
>

Your understanding is correct and consistency across runners for a pipeline
option is good for our users.


> >
> > Thanks,
> > Łukasz
> >
> >
> >
>

1: https://github.com/apache/beam/pull/8775


Re: Detecting resources to stage

2019-11-08 Thread Robert Bradshaw
Note that resources are more properly tied to specific operations and
stages, not to the entire pipeline. This is especially true in the
face of libraries (which should have the ability to declare their own
resources) and cross-language.

On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy  wrote:
>
> I figured that it would be good to bump this thread for greater visibility 
> even though I don't have a strong opinion about this (yet - hopefully, I will 
> know more later to be able to share ;) ).
>
> Answering the questions Luke asked will unblock this issue: 
> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed for 
> Java 11 migration (current detecting mechanism does not work with java > 8).
>
>
>>
>> That said letting the user resolve the jars to stage can be saner instead of 
>> assuming it is in the classpath/loader. I already have a few cases where it 
>> will fail cause the transforms load the jars from outside the app 
>> classloader (transforms are isolated).
>
>
>
> If I understand correctly, at least in Dataflow runner, if users want to 
> provide custom resources to stage, they can use filesToStage pipeline option. 
> Once the option is not null, the runner doesn't detect the resources 
> automatically and stages resources enlisted in the option instead. I think 
> this should be the approach common for all runners (if it is not the case 
> already).
>
> Thanks,
> Łukasz
>
>
>


Re: Detecting resources to stage

2019-11-08 Thread Łukasz Gajowy
I figured that it would be good to bump this thread for greater visibility
even though I don't have a strong opinion about this (yet - hopefully, I
will know more later to be able to share ;) ).

Answering the questions Luke asked will unblock this issue:
https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed for
Java 11 migration (current detecting mechanism does not work with java >
8).



> That said letting the user resolve the jars to stage can be saner instead
> of assuming it is in the classpath/loader. I already have a few cases where
> it will fail cause the transforms load the jars from outside the app
> classloader (transforms are isolated).
>


If I understand correctly, at least in Dataflow runner, if users want to
provide custom resources to stage, they can use filesToStage pipeline
option. Once the option is not null, the runner doesn't detect the
resources automatically and stages resources enlisted in the option
instead. I think this should be the approach common for all runners (if it
is not the case already).

Thanks,
Łukasz