Re: [Portability] Turn off artifact staging?
Ah didn't see your pull request yet Thomas. Will take a look later.

On Mon, Nov 25, 2019 at 10:23 AM Thomas Weise wrote:
> Thanks, I would prefer to solve this in a way where the user does not need
> to configure anything extra though.
Re: [Portability] Turn off artifact staging?
Thanks, I would prefer to solve this in a way where the user does not need to
configure anything extra though.

On Mon, Nov 25, 2019 at 10:21 AM Kyle Weaver wrote:
> When we added the class loader artifact stager, we introduced artifact
> retrieval service type as a pipeline option. It would make sense to put a
> "none" option there.
>
> https://github.com/apache/beam/blob/5fd93af49e6cb86ff52b20f103371df7e0447b7f/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L107
>
> RetrievalServiceType getRetrievalServiceType();
Re: [Portability] Turn off artifact staging?
When we added the class loader artifact stager, we introduced artifact
retrieval service type as a pipeline option. It would make sense to put a
"none" option there.

https://github.com/apache/beam/blob/5fd93af49e6cb86ff52b20f103371df7e0447b7f/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L107

RetrievalServiceType getRetrievalServiceType();

On Mon, Nov 25, 2019 at 10:05 AM Robert Bradshaw wrote:
> boot.go could be updated to recognize NO_ARTIFACTS_STAGED_TOKEN as
> well. (Should this constant be put in a common location?)
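
As a rough illustration of the "none" option suggested above, here is a minimal standalone sketch. It does not use Beam's classes: the enum only mirrors the idea of RetrievalServiceType, the existing values FILE_SYSTEM and CLASSLOADER are assumed from the linked source, and NONE as well as the helper names parse and needsSharedStagingDir are hypothetical.

```java
import java.util.Locale;

/** Standalone sketch; this is not Beam's PortablePipelineOptions. */
final class RetrievalTypeSketch {

  /** Mirrors the idea of the pipeline option's enum. */
  enum RetrievalServiceType {
    FILE_SYSTEM, // assumed existing value
    CLASSLOADER, // assumed existing value
    NONE // hypothetical: the "none" option proposed in this thread
  }

  /** Parses an option value case-insensitively, e.g. "none" becomes NONE. */
  static RetrievalServiceType parse(String value) {
    return RetrievalServiceType.valueOf(value.trim().toUpperCase(Locale.ROOT));
  }

  /** Assumption: only file-system retrieval needs a staging directory shared by all workers. */
  static boolean needsSharedStagingDir(RetrievalServiceType type) {
    switch (type) {
      case FILE_SYSTEM:
        return true;
      case CLASSLOADER:
      case NONE:
        return false;
      default:
        throw new IllegalArgumentException("Unknown retrieval type: " + type);
    }
  }

  public static void main(String[] args) {
    RetrievalServiceType type = parse("none");
    System.out.println(type + " needs shared staging dir: " + needsSharedStagingDir(type));
  }
}
```

The drawback of this approach is that the user has to set the option explicitly.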
Re: [Portability] Turn off artifact staging?
boot.go could be updated to recognize NO_ARTIFACTS_STAGED_TOKEN as well.
(Should this constant be put in a common location?)

On Sat, Nov 23, 2019 at 9:16 AM Thomas Weise wrote:
> JIRA: https://issues.apache.org/jira/browse/BEAM-8815
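
The check suggested for boot.go might look roughly like the sketch below. boot.go itself is written in Go; Java is used here only to match the other code in this thread, and this is not Beam's implementation. The token value is a placeholder, and the ArtifactSource interface and the artifactsToMaterialize helper are hypothetical names introduced for illustration.

```java
import java.util.Collections;
import java.util.List;

/** Standalone sketch of a worker boot step that honors a "nothing staged" token. */
final class BootTokenSketch {

  /** Placeholder value; the real constant and its common location are open questions here. */
  static final String NO_ARTIFACTS_STAGED_TOKEN = "__placeholder_no_artifacts_staged__";

  /** Hypothetical stand-in for a client of the artifact retrieval service. */
  interface ArtifactSource {
    List<String> fetchArtifactNames(String retrievalToken);
  }

  /** Returns the artifacts to materialize, calling the source only if something was staged. */
  static List<String> artifactsToMaterialize(String retrievalToken, ArtifactSource source) {
    if (NO_ARTIFACTS_STAGED_TOKEN.equals(retrievalToken)) {
      // Nothing was staged, so the retrieval call (and the manifest read) is skipped.
      return Collections.emptyList();
    }
    return source.fetchArtifactNames(retrievalToken);
  }

  public static void main(String[] args) {
    // This source fails on any call, showing that the sentinel token never reaches it.
    ArtifactSource failing = token -> {
      throw new IllegalStateException("retrieval should not be called");
    };
    List<String> artifacts = artifactsToMaterialize(NO_ARTIFACTS_STAGED_TOKEN, failing);
    System.out.println("artifacts to materialize: " + artifacts.size());
  }
}
```

Defining the constant in one shared place would keep the staging side and every SDK's boot code from drifting apart on its value.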
Re: [Portability] Turn off artifact staging?
JIRA: https://issues.apache.org/jira/browse/BEAM-8815

On Fri, Nov 22, 2019 at 5:31 PM Thomas Weise wrote:
> I'm running into the issue Kyle points out when I try to run a pipeline
> that does not use artifact staging:
>
> 2019-11-23 01:09:18,442 WARN org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService - GetManifest for /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST failed.
> java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST (No such file or directory)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196)
>     at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312)
>
> This happens when I use /opt/apache/beam/boot to start the worker in
> process environment, as it will attempt to retrieve artifacts. The same
> would be the case for worker pool also.
>
> Thomas
Re: [Portability] Turn off artifact staging?
I'm running into the issue Kyle points out when I try to run a pipeline that
does not use artifact staging:

2019-11-23 01:09:18,442 WARN org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService - GetManifest for /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST failed.
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST (No such file or directory)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312)

This happens when I use /opt/apache/beam/boot to start the worker in the
process environment, as it will attempt to retrieve artifacts. The same would
be the case for the worker pool.

Thomas

On Tue, Nov 12, 2019 at 5:07 PM Robert Bradshaw wrote:
> FWIW, there are also discussions of adding a preparation phase for sdk
> harness (docker) images, such that artifacts could be staged (and
> installed, compiled etc.) ahead of time and shipped as part of the sdk
> image rather than via a side channel (and on every worker). Anyone not
> using these images is probably shipping dependencies in another way
> anyways.
Re: [Portability] Turn off artifact staging?
FWIW, there are also discussions of adding a preparation phase for SDK harness
(Docker) images, such that artifacts could be staged (and installed, compiled,
etc.) ahead of time and shipped as part of the SDK image rather than via a side
channel (and on every worker). Anyone not using these images is probably
shipping dependencies in another way anyways.

On Tue, Nov 12, 2019 at 5:03 PM Robert Bradshaw wrote:
> Certainly there's a lot to be re-thought in terms of artifact staging,
> especially when it comes to cross-language pipelines. I think it would
> make sense to have a special retrieval token for the "empty"
> manifest, which would mean a staging directory would never have to be
> set up if no artifacts happened to be staged.
>
> The UberJar avoids any artifact staging overhead as well.
Re: [Portability] Turn off artifact staging?
Certainly there's a lot to be re-thought in terms of artifact staging,
especially when it comes to cross-language pipelines. I think it would make
sense to have a special retrieval token for the "empty" manifest, which would
mean a staging directory would never have to be set up if no artifacts
happened to be staged.

The UberJar avoids any artifact staging overhead as well.

On Tue, Nov 12, 2019 at 3:30 PM Kyle Weaver wrote:
> So I was wondering, do you all think it would be reasonable or useful to
> create an "off switch" for artifact staging?
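
To make the "special retrieval token" idea concrete, here is a minimal sketch under stated assumptions. It is not Beam's staging or retrieval service: the token value is a placeholder, the manifest is simplified to one artifact name per line, and commitManifest and getManifest are hypothetical helpers standing in for the two sides of the exchange.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

/** Standalone sketch of an "empty manifest" retrieval token. */
final class EmptyManifestTokenSketch {

  /** Placeholder sentinel meaning "no artifacts were staged". */
  static final String EMPTY_MANIFEST_TOKEN = "__placeholder_empty_manifest__";

  /** Staging side: writes a manifest only when there is something to list. */
  static String commitManifest(Path stagingDir, List<String> artifactNames) throws IOException {
    if (artifactNames.isEmpty()) {
      return EMPTY_MANIFEST_TOKEN; // the staging directory is never created
    }
    Files.createDirectories(stagingDir);
    Path manifest = stagingDir.resolve("MANIFEST");
    Files.write(manifest, artifactNames, StandardCharsets.UTF_8);
    return manifest.toString(); // in this sketch the token is just the manifest path
  }

  /** Retrieval side: answers the sentinel token without touching the file system. */
  static List<String> getManifest(String retrievalToken) throws IOException {
    if (EMPTY_MANIFEST_TOKEN.equals(retrievalToken)) {
      return Collections.emptyList();
    }
    return Files.readAllLines(Paths.get(retrievalToken), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path stagingDir = Files.createTempDirectory("sketch").resolve("job-staging");
    String token = commitManifest(stagingDir, Collections.emptyList());
    System.out.println("artifacts: " + getManifest(token).size()
        + ", staging dir created: " + Files.exists(stagingDir));
  }
}
```

Because the token is chosen at commit time based on what was actually staged, the user does not have to configure anything extra.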
[Portability] Turn off artifact staging?
Hi Beamers,

We can use artifact staging to make sure SDK workers have access to a
pipeline's dependencies. However, artifact staging is not always necessary.
For example, one can make sure that the environment contains all the
dependencies ahead of time. Regardless of whether artifacts are used, though,
my understanding is that an artifact manifest will be written and read anyway.
For example:

INFO AbstractArtifactRetrievalService: GetManifest for /tmp/beam-artifact-staging/.../MANIFEST -> 0 artifacts

This can be a hassle, because users must set up a staging directory that all
workers can access, even if it isn't used aside from the (empty) manifest [1].
Thomas mentioned that at Lyft they bypass artifact staging altogether [2]. So
I was wondering, do you all think it would be reasonable or useful to create
an "off switch" for artifact staging?

Thanks,
Kyle

[1] https://lists.apache.org/thread.html/d293b4158f266be1cb6c99c968535706f491fdfcd4bb20c4e30939bb@%3Cdev.beam.apache.org%3E
[2] https://issues.apache.org/jira/browse/BEAM-5187?focusedCommentId=16972715&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16972715