Re: [Portability] Turn off artifact staging?

2019-11-25 Thread Kyle Weaver
Ah, I hadn't seen your pull request yet, Thomas. I'll take a look later.


Re: [Portability] Turn off artifact staging?

2019-11-25 Thread Thomas Weise
Thanks, I would prefer to solve this in a way where the user does not need
to configure anything extra though.


Re: [Portability] Turn off artifact staging?

2019-11-25 Thread Kyle Weaver
When we added the class loader artifact stager, we introduced the artifact
retrieval service type as a pipeline option. It would make sense to add a
"none" option there.

https://github.com/apache/beam/blob/5fd93af49e6cb86ff52b20f103371df7e0447b7f/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L107

  RetrievalServiceType getRetrievalServiceType();
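
A rough sketch of how a "none" option could behave, written in Python for brevity rather than in Beam's Java. Everything here is an assumption for illustration: the enum members other than the proposed NONE, the helper names, and the choice of default are hypothetical and not Beam's actual API.

```python
from enum import Enum
from typing import Optional


class RetrievalServiceType(Enum):
    """Illustrative stand-in for the pipeline option's enum (members assumed)."""

    FILE_SYSTEM = "file_system"
    CLASSLOADER = "classloader"
    NONE = "none"  # proposed: no artifact retrieval at all


def parse_retrieval_service_type(value: Optional[str]) -> RetrievalServiceType:
    """Parse an option string; an unset option keeps an assumed file-system default."""
    if value is None:
        return RetrievalServiceType.FILE_SYSTEM
    return RetrievalServiceType[value.upper()]


def needs_staging_dir(service_type: RetrievalServiceType) -> bool:
    """Only file-system retrieval requires a staging directory shared by all workers."""
    return service_type is RetrievalServiceType.FILE_SYSTEM
```

With such an option, a runner could skip creating the staging directory whenever `needs_staging_dir` returns false, at the cost of the user having to set the option explicitly.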


Re: [Portability] Turn off artifact staging?

2019-11-25 Thread Robert Bradshaw
boot.go could be updated to recognize NO_ARTIFACTS_STAGED_TOKEN as
well. (Should this constant be put in a common location?)
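
A minimal sketch of the worker-side check, in Python rather than Go and with hypothetical names. The token value below is a placeholder, since the thread names the constant but not its value, and `fetch_manifest` merely stands in for the call to the artifact retrieval service.

```python
from typing import Callable, List

# Placeholder value; the thread names the constant but not its value.
NO_ARTIFACTS_STAGED_TOKEN = "__no_artifacts_staged__"


def materialize_artifacts(
    retrieval_token: str,
    fetch_manifest: Callable[[str], List[str]],
) -> List[str]:
    """Return the artifacts a worker should download before starting.

    The retrieval service is not contacted at all for the sentinel token,
    so a missing staging directory cannot cause a failure at boot.
    """
    if retrieval_token == NO_ARTIFACTS_STAGED_TOKEN:
        return []
    return fetch_manifest(retrieval_token)
```

Because the boot code and the job server would both compare against the same string, keeping the constant in one shared place would avoid the two definitions drifting apart.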


Re: [Portability] Turn off artifact staging?

2019-11-23 Thread Thomas Weise
JIRA: https://issues.apache.org/jira/browse/BEAM-8815


Re: [Portability] Turn off artifact staging?

2019-11-22 Thread Thomas Weise
I'm running into the issue Kyle points out when I try to run a pipeline
that does not use artifact staging:

2019-11-23 01:09:18,442 WARN org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService - GetManifest for /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST failed.
java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/beam-artifact-staging/job_53cad419-a8c0-472c-8486-f795cc88a80f/MANIFEST (No such file or directory)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312)

This happens when I use /opt/apache/beam/boot to start the worker in the
process environment, because boot always attempts to retrieve artifacts.
The same would apply to the worker pool.

Thomas


Re: [Portability] Turn off artifact staging?

2019-11-12 Thread Robert Bradshaw
FWIW, there are also discussions of adding a preparation phase for SDK
harness (Docker) images, such that artifacts could be staged (and
installed, compiled, etc.) ahead of time and shipped as part of the SDK
image rather than via a side channel (and on every worker). Anyone not
using these images is probably shipping dependencies in another way
anyway.


Re: [Portability] Turn off artifact staging?

2019-11-12 Thread Robert Bradshaw
Certainly there's a lot to be rethought in terms of artifact staging,
especially when it comes to cross-language pipelines. I think it would
make sense to have a special retrieval token for the "empty"
manifest, which would mean a staging directory would never have to be
set up if no artifacts happened to be staged.
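
A minimal sketch of that idea on the retrieval-service side, in Python with hypothetical names. The token value is a placeholder, and treating any other token as a path to a JSON list of artifact names is an invented manifest format used only to keep the example self-contained.

```python
import json
from pathlib import Path
from typing import List

# Placeholder sentinel; the actual token value is an assumption.
NO_ARTIFACTS_STAGED_TOKEN = "__no_artifacts_staged__"


def get_manifest(retrieval_token: str) -> List[str]:
    """Return the artifact names for a retrieval token.

    The sentinel token short-circuits to an empty manifest, so neither a
    staging directory nor a MANIFEST file has to exist. Any other token is
    read as a path to a JSON file holding a list of artifact names.
    """
    if retrieval_token == NO_ARTIFACTS_STAGED_TOKEN:
        return []
    with Path(retrieval_token).open() as manifest_file:
        return json.load(manifest_file)
```

A job that stages nothing would hand workers the sentinel instead of a manifest path, so the file lookup never happens.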

The UberJar avoids any artifact staging overhead as well.


[Portability] Turn off artifact staging?

2019-11-12 Thread Kyle Weaver
Hi Beamers,

We can use artifact staging to make sure SDK workers have access to a
pipeline's dependencies. However, artifact staging is not always necessary:
for example, one can make sure that the environment contains all the
dependencies ahead of time. Yet regardless of whether artifacts are used,
my understanding is that an artifact manifest is written and read anyway.
For example:

INFO AbstractArtifactRetrievalService: GetManifest for /tmp/beam-artifact-staging/.../MANIFEST -> 0 artifacts

This can be a hassle, because users must set up a staging directory that
all workers can access, even if it isn't used aside from the (empty)
manifest [1]. Thomas mentioned that at Lyft they bypass artifact staging
altogether [2]. So I was wondering, do you all think it would be reasonable
or useful to create an "off switch" for artifact staging?

Thanks,
Kyle

[1] https://lists.apache.org/thread.html/d293b4158f266be1cb6c99c968535706f491fdfcd4bb20c4e30939bb@%3Cdev.beam.apache.org%3E
[2] https://issues.apache.org/jira/browse/BEAM-5187?focusedCommentId=16972715&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16972715