Re: Detecting resources to stage
Agree, this makes sense. On Wed, Nov 27, 2019 at 6:23 PM Luke Cwik wrote: > That looks good as well. > > I would suggest that we make the classpath scanning system pluggable using > PipelineOptions. For example in GcpOptions[1], we use two default instance > factories. The first one controls which class is used as the factory[2] and > the second one instantiates an instance of that class and creates the > credential[3]. The same strategy could be added where there is a default > instance factory for the set of resources and another option which controls > which class is instantiated to provide that default. > > Do you think that we could make the default always: > new ClassGraph() > .addClassLoader(classLoader) > .getClasspathURLs(); > > 1: > https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159 > 2: > https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L144 > 3: > https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159 > > On Wed, Nov 27, 2019 at 8:19 AM Gleb Kanterov wrote: > >> I didn't think it through, but this is something I have in mind. Keep >> existing implementation for URLClassLoader, and use URLClassLoader for >> experimental support of Java 11. >> >> List urls; >> if (classLoader instanceof URLClassLoader) { >> urls = Arrays.asList(((URLClassLoader) classLoader).getURLs()); >> } else { >> urls = new ClassGraph() >> .addClassLoader(classLoader) >> .getClasspathURLs(); >> } >> >> On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy >> wrote: >> >>> This looks promising. Do you think you could share your code as well? >>> >>> That part sounds very calming: >>> "ClassGraph is fully compatible with the new JPMS module system (Project >>> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the >>> module path. However, the code is also fully backwards compatible with JDK >>> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and >>> all interaction with the module system is implemented via reflection for >>> backwards compatibility)." >>> >>> I'm currently working on rebuilding the classpath detection mechanism so >>> that it scans java.class.path when URLClassLoader cannot be used (as Luke >>> suggested) but if we decide to use classgraph it should be relatively easy >>> to do that instead. Moreover, I want to enable the possibility of injecting >>> any algorithm implementation through pipeline options - this will enable >>> third-party vendors to inject their custom implementations if needed (SPI >>> pattern that was mentioned at some point in a jira ticket). I think I'm >>> pretty close to finishing that. >>> >>> Thanks! >>> >>> śr., 27 lis 2019 o 15:24 Gleb Kanterov napisał(a): >>> Today I tried using classgraph [1] library to scan classpath in Java 11 instead of using URLClassLoader, and after that, the job worked on Dataflow. The logic of scanning classpath is pretty sophisticated [2], and classgraph doesn't have any dependencies. I'm wondering if we can relocate it to java-core jar and use in for non-URLClassLoaders? [1]: https://github.com/classgraph/classgraph [2]: https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik wrote: > I believe the closest suggestion[1] we had that worked for Java 11 and > maintained backwards compatibility was to use the URLClassLoader to infer > the resources and if we couldn't do that then look at the java.class.path > system property to do the inference otherwise fail and force the users to > tell us what. There are too many scenarios where we will do it wrong > because of how people package and deploy their code whether it is an > embedded application server or some other application container with a > security manager that will prevent us from doing the right thing. > > On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw > wrote: > >> Note that resources are more properly tied to specific operations and >> stages, not to the entire pipeline. This is especially true in the >> face of libraries (which should have the ability to declare their own >> resources) and cross-language. >> >> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy >> wrote: >> > >> > I figured that it would be good to bump this thread for greater >> visibility even though I don't have a strong opinion about this (yet - >> hopefully, I will know more later to
Re: Detecting resources to stage
That looks good as well. I would suggest that we make the classpath scanning system pluggable using PipelineOptions. For example in GcpOptions[1], we use two default instance factories. The first one controls which class is used as the factory[2] and the second one instantiates an instance of that class and creates the credential[3]. The same strategy could be added where there is a default instance factory for the set of resources and another option which controls which class is instantiated to provide that default. Do you think that we could make the default always: new ClassGraph() .addClassLoader(classLoader) .getClasspathURLs(); 1: https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159 2: https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L144 3: https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L159 On Wed, Nov 27, 2019 at 8:19 AM Gleb Kanterov wrote: > I didn't think it through, but this is something I have in mind. Keep > existing implementation for URLClassLoader, and use URLClassLoader for > experimental support of Java 11. > > List urls; > if (classLoader instanceof URLClassLoader) { > urls = Arrays.asList(((URLClassLoader) classLoader).getURLs()); > } else { > urls = new ClassGraph() > .addClassLoader(classLoader) > .getClasspathURLs(); > } > > On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy > wrote: > >> This looks promising. Do you think you could share your code as well? >> >> That part sounds very calming: >> "ClassGraph is fully compatible with the new JPMS module system (Project >> Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the >> module path. However, the code is also fully backwards compatible with JDK >> 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and >> all interaction with the module system is implemented via reflection for >> backwards compatibility)." >> >> I'm currently working on rebuilding the classpath detection mechanism so >> that it scans java.class.path when URLClassLoader cannot be used (as Luke >> suggested) but if we decide to use classgraph it should be relatively easy >> to do that instead. Moreover, I want to enable the possibility of injecting >> any algorithm implementation through pipeline options - this will enable >> third-party vendors to inject their custom implementations if needed (SPI >> pattern that was mentioned at some point in a jira ticket). I think I'm >> pretty close to finishing that. >> >> Thanks! >> >> śr., 27 lis 2019 o 15:24 Gleb Kanterov napisał(a): >> >>> Today I tried using classgraph [1] library to scan classpath in Java 11 >>> instead of using URLClassLoader, and after that, the job worked on >>> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and >>> classgraph doesn't have any dependencies. I'm wondering if we can relocate >>> it to java-core jar and use in for non-URLClassLoaders? >>> >>> [1]: https://github.com/classgraph/classgraph >>> [2]: >>> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java >>> >>> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik wrote: >>> I believe the closest suggestion[1] we had that worked for Java 11 and maintained backwards compatibility was to use the URLClassLoader to infer the resources and if we couldn't do that then look at the java.class.path system property to do the inference otherwise fail and force the users to tell us what. There are too many scenarios where we will do it wrong because of how people package and deploy their code whether it is an embedded application server or some other application container with a security manager that will prevent us from doing the right thing. On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw wrote: > Note that resources are more properly tied to specific operations and > stages, not to the entire pipeline. This is especially true in the > face of libraries (which should have the ability to declare their own > resources) and cross-language. > > On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy > wrote: > > > > I figured that it would be good to bump this thread for greater > visibility even though I don't have a strong opinion about this (yet - > hopefully, I will know more later to be able to share ;) ). > > > > Answering the questions Luke asked will unblock this issue: > https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed > for Java 11
Re: Detecting resources to stage
I didn't think it through, but this is something I have in mind. Keep existing implementation for URLClassLoader, and use URLClassLoader for experimental support of Java 11. List urls; if (classLoader instanceof URLClassLoader) { urls = Arrays.asList(((URLClassLoader) classLoader).getURLs()); } else { urls = new ClassGraph() .addClassLoader(classLoader) .getClasspathURLs(); } On Wed, Nov 27, 2019 at 4:16 PM Łukasz Gajowy wrote: > This looks promising. Do you think you could share your code as well? > > That part sounds very calming: > "ClassGraph is fully compatible with the new JPMS module system (Project > Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the > module path. However, the code is also fully backwards compatible with JDK > 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and > all interaction with the module system is implemented via reflection for > backwards compatibility)." > > I'm currently working on rebuilding the classpath detection mechanism so > that it scans java.class.path when URLClassLoader cannot be used (as Luke > suggested) but if we decide to use classgraph it should be relatively easy > to do that instead. Moreover, I want to enable the possibility of injecting > any algorithm implementation through pipeline options - this will enable > third-party vendors to inject their custom implementations if needed (SPI > pattern that was mentioned at some point in a jira ticket). I think I'm > pretty close to finishing that. > > Thanks! > > śr., 27 lis 2019 o 15:24 Gleb Kanterov napisał(a): > >> Today I tried using classgraph [1] library to scan classpath in Java 11 >> instead of using URLClassLoader, and after that, the job worked on >> Dataflow. The logic of scanning classpath is pretty sophisticated [2], and >> classgraph doesn't have any dependencies. I'm wondering if we can relocate >> it to java-core jar and use in for non-URLClassLoaders? >> >> [1]: https://github.com/classgraph/classgraph >> [2]: >> https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java >> >> On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik wrote: >> >>> I believe the closest suggestion[1] we had that worked for Java 11 and >>> maintained backwards compatibility was to use the URLClassLoader to infer >>> the resources and if we couldn't do that then look at the java.class.path >>> system property to do the inference otherwise fail and force the users to >>> tell us what. There are too many scenarios where we will do it wrong >>> because of how people package and deploy their code whether it is an >>> embedded application server or some other application container with a >>> security manager that will prevent us from doing the right thing. >>> >>> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw >>> wrote: >>> Note that resources are more properly tied to specific operations and stages, not to the entire pipeline. This is especially true in the face of libraries (which should have the ability to declare their own resources) and cross-language. On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy wrote: > > I figured that it would be good to bump this thread for greater visibility even though I don't have a strong opinion about this (yet - hopefully, I will know more later to be able to share ;) ). > > Answering the questions Luke asked will unblock this issue: https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed for Java 11 migration (current detecting mechanism does not work with java > 8). > > >> >> That said letting the user resolve the jars to stage can be saner instead of assuming it is in the classpath/loader. I already have a few cases where it will fail cause the transforms load the jars from outside the app classloader (transforms are isolated). > > > > If I understand correctly, at least in Dataflow runner, if users want to provide custom resources to stage, they can use filesToStage pipeline option. Once the option is not null, the runner doesn't detect the resources automatically and stages resources enlisted in the option instead. I think this should be the approach common for all runners (if it is not the case already). >>> >>> Your understanding is correct and consistency across runners for a >>> pipeline option is good for our users. >>> >>> > > Thanks, > Łukasz > > > >>> >>> 1: https://github.com/apache/beam/pull/8775 >>> >>
Re: Detecting resources to stage
This looks promising. Do you think you could share your code as well? That part sounds very calming: "ClassGraph is fully compatible with the new JPMS module system (Project Jigsaw / JDK 9+), i.e. it can scan both the traditional classpath and the module path. However, the code is also fully backwards compatible with JDK 7 and JDK 8 (i.e. the code is compiled in Java 7 compatibility mode, and all interaction with the module system is implemented via reflection for backwards compatibility)." I'm currently working on rebuilding the classpath detection mechanism so that it scans java.class.path when URLClassLoader cannot be used (as Luke suggested) but if we decide to use classgraph it should be relatively easy to do that instead. Moreover, I want to enable the possibility of injecting any algorithm implementation through pipeline options - this will enable third-party vendors to inject their custom implementations if needed (SPI pattern that was mentioned at some point in a jira ticket). I think I'm pretty close to finishing that. Thanks! śr., 27 lis 2019 o 15:24 Gleb Kanterov napisał(a): > Today I tried using classgraph [1] library to scan classpath in Java 11 > instead of using URLClassLoader, and after that, the job worked on > Dataflow. The logic of scanning classpath is pretty sophisticated [2], and > classgraph doesn't have any dependencies. I'm wondering if we can relocate > it to java-core jar and use in for non-URLClassLoaders? > > [1]: https://github.com/classgraph/classgraph > [2]: > https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java > > On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik wrote: > >> I believe the closest suggestion[1] we had that worked for Java 11 and >> maintained backwards compatibility was to use the URLClassLoader to infer >> the resources and if we couldn't do that then look at the java.class.path >> system property to do the inference otherwise fail and force the users to >> tell us what. There are too many scenarios where we will do it wrong >> because of how people package and deploy their code whether it is an >> embedded application server or some other application container with a >> security manager that will prevent us from doing the right thing. >> >> On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw >> wrote: >> >>> Note that resources are more properly tied to specific operations and >>> stages, not to the entire pipeline. This is especially true in the >>> face of libraries (which should have the ability to declare their own >>> resources) and cross-language. >>> >>> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy >>> wrote: >>> > >>> > I figured that it would be good to bump this thread for greater >>> visibility even though I don't have a strong opinion about this (yet - >>> hopefully, I will know more later to be able to share ;) ). >>> > >>> > Answering the questions Luke asked will unblock this issue: >>> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed >>> for Java 11 migration (current detecting mechanism does not work with java >>> > 8). >>> > >>> > >>> >> >>> >> That said letting the user resolve the jars to stage can be saner >>> instead of assuming it is in the classpath/loader. I already have a few >>> cases where it will fail cause the transforms load the jars from outside >>> the app classloader (transforms are isolated). >>> > >>> > >>> > >>> > If I understand correctly, at least in Dataflow runner, if users want >>> to provide custom resources to stage, they can use filesToStage pipeline >>> option. Once the option is not null, the runner doesn't detect the >>> resources automatically and stages resources enlisted in the option >>> instead. I think this should be the approach common for all runners (if it >>> is not the case already). >>> >> >> Your understanding is correct and consistency across runners for a >> pipeline option is good for our users. >> >> >>> > >>> > Thanks, >>> > Łukasz >>> > >>> > >>> > >>> >> >> 1: https://github.com/apache/beam/pull/8775 >> >
Re: Detecting resources to stage
Today I tried using classgraph [1] library to scan classpath in Java 11 instead of using URLClassLoader, and after that, the job worked on Dataflow. The logic of scanning classpath is pretty sophisticated [2], and classgraph doesn't have any dependencies. I'm wondering if we can relocate it to java-core jar and use in for non-URLClassLoaders? [1]: https://github.com/classgraph/classgraph [2]: https://github.com/classgraph/classgraph/blob/master/src/main/java/io/github/classgraph/Scanner.java On Fri, Nov 8, 2019 at 11:40 PM Luke Cwik wrote: > I believe the closest suggestion[1] we had that worked for Java 11 and > maintained backwards compatibility was to use the URLClassLoader to infer > the resources and if we couldn't do that then look at the java.class.path > system property to do the inference otherwise fail and force the users to > tell us what. There are too many scenarios where we will do it wrong > because of how people package and deploy their code whether it is an > embedded application server or some other application container with a > security manager that will prevent us from doing the right thing. > > On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw > wrote: > >> Note that resources are more properly tied to specific operations and >> stages, not to the entire pipeline. This is especially true in the >> face of libraries (which should have the ability to declare their own >> resources) and cross-language. >> >> On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy wrote: >> > >> > I figured that it would be good to bump this thread for greater >> visibility even though I don't have a strong opinion about this (yet - >> hopefully, I will know more later to be able to share ;) ). >> > >> > Answering the questions Luke asked will unblock this issue: >> https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed >> for Java 11 migration (current detecting mechanism does not work with java >> > 8). >> > >> > >> >> >> >> That said letting the user resolve the jars to stage can be saner >> instead of assuming it is in the classpath/loader. I already have a few >> cases where it will fail cause the transforms load the jars from outside >> the app classloader (transforms are isolated). >> > >> > >> > >> > If I understand correctly, at least in Dataflow runner, if users want >> to provide custom resources to stage, they can use filesToStage pipeline >> option. Once the option is not null, the runner doesn't detect the >> resources automatically and stages resources enlisted in the option >> instead. I think this should be the approach common for all runners (if it >> is not the case already). >> > > Your understanding is correct and consistency across runners for a > pipeline option is good for our users. > > >> > >> > Thanks, >> > Łukasz >> > >> > >> > >> > > 1: https://github.com/apache/beam/pull/8775 >
Re: Detecting resources to stage
I believe the closest suggestion[1] we had that worked for Java 11 and maintained backwards compatibility was to use the URLClassLoader to infer the resources and if we couldn't do that then look at the java.class.path system property to do the inference otherwise fail and force the users to tell us what. There are too many scenarios where we will do it wrong because of how people package and deploy their code whether it is an embedded application server or some other application container with a security manager that will prevent us from doing the right thing. On Fri, Nov 8, 2019 at 10:31 AM Robert Bradshaw wrote: > Note that resources are more properly tied to specific operations and > stages, not to the entire pipeline. This is especially true in the > face of libraries (which should have the ability to declare their own > resources) and cross-language. > > On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy wrote: > > > > I figured that it would be good to bump this thread for greater > visibility even though I don't have a strong opinion about this (yet - > hopefully, I will know more later to be able to share ;) ). > > > > Answering the questions Luke asked will unblock this issue: > https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed for > Java 11 migration (current detecting mechanism does not work with java > 8). > > > > > >> > >> That said letting the user resolve the jars to stage can be saner > instead of assuming it is in the classpath/loader. I already have a few > cases where it will fail cause the transforms load the jars from outside > the app classloader (transforms are isolated). > > > > > > > > If I understand correctly, at least in Dataflow runner, if users want to > provide custom resources to stage, they can use filesToStage pipeline > option. Once the option is not null, the runner doesn't detect the > resources automatically and stages resources enlisted in the option > instead. I think this should be the approach common for all runners (if it > is not the case already). > Your understanding is correct and consistency across runners for a pipeline option is good for our users. > > > > Thanks, > > Łukasz > > > > > > > 1: https://github.com/apache/beam/pull/8775
Re: Detecting resources to stage
Note that resources are more properly tied to specific operations and stages, not to the entire pipeline. This is especially true in the face of libraries (which should have the ability to declare their own resources) and cross-language. On Fri, Nov 8, 2019 at 10:19 AM Łukasz Gajowy wrote: > > I figured that it would be good to bump this thread for greater visibility > even though I don't have a strong opinion about this (yet - hopefully, I will > know more later to be able to share ;) ). > > Answering the questions Luke asked will unblock this issue: > https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed for > Java 11 migration (current detecting mechanism does not work with java > 8). > > >> >> That said letting the user resolve the jars to stage can be saner instead of >> assuming it is in the classpath/loader. I already have a few cases where it >> will fail cause the transforms load the jars from outside the app >> classloader (transforms are isolated). > > > > If I understand correctly, at least in Dataflow runner, if users want to > provide custom resources to stage, they can use filesToStage pipeline option. > Once the option is not null, the runner doesn't detect the resources > automatically and stages resources enlisted in the option instead. I think > this should be the approach common for all runners (if it is not the case > already). > > Thanks, > Łukasz > > >
Re: Detecting resources to stage
I figured that it would be good to bump this thread for greater visibility even though I don't have a strong opinion about this (yet - hopefully, I will know more later to be able to share ;) ). Answering the questions Luke asked will unblock this issue: https://issues.apache.org/jira/browse/BEAM-5495. Solving it is needed for Java 11 migration (current detecting mechanism does not work with java > 8). > That said letting the user resolve the jars to stage can be saner instead > of assuming it is in the classpath/loader. I already have a few cases where > it will fail cause the transforms load the jars from outside the app > classloader (transforms are isolated). > If I understand correctly, at least in Dataflow runner, if users want to provide custom resources to stage, they can use filesToStage pipeline option. Once the option is not null, the runner doesn't detect the resources automatically and stages resources enlisted in the option instead. I think this should be the approach common for all runners (if it is not the case already). Thanks, Łukasz