Re: Artifact staging in cross-language pipelines

2019-04-18 Thread Thomas Weise
Good discussion :)

Initially the expansion service was considered a user responsibility, but I
think that isn't necessarily the case. I can also see the expansion service
provided as part of the infrastructure, with the user not wanting to deal
with it at all. For example, users may want to write Python transforms and
use external IOs without being concerned with how these IOs are provided. In
such a scenario it would be good if:

* Expansion service(s) can be auto-discovered via the job service endpoint
* Available external transforms can be discovered via the expansion
service(s)
* Dependencies for external transforms are part of the metadata returned by
the expansion service

Dependencies could then be staged either by the SDK client or the expansion
service. The expansion service could provide the staging locations to the
SDK; either way it would remain transparent to the user.
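
As a purely hypothetical illustration of the kind of metadata such discovery
could return (none of the names below exist in Beam today; they only sketch
the flow described above, in Python):

    # Hypothetical shape of the metadata an expansion service could expose.
    # The URN, address, and jar names are placeholders, not real Beam APIs.
    external_transform_catalog = {
        'beam:external:java:kafka:read:v1': {
            'expansion_service': 'localhost:8097',
            'artifacts': [
                'beam-sdks-java-io-kafka.jar',
                'kafka-clients.jar',
            ],
        },
    }

    # The SDK client (or the expansion service itself) could then stage each
    # artifact listed above before submitting the pipeline to the job service.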

I also agree with Luke regarding the environments. Docker is the choice for
generic deployment. Other environments are used when the flexibility
offered by Docker isn't needed (or gets in the way), and then the
dependencies are provided in different ways. Whether these are Python
packages or jar files, by opting out of Docker the decision is made to
manage dependencies externally.

Thomas



On Thu, Apr 18, 2019 at 6:01 PM Chamikara Jayalath 
wrote:

>
>
> On Thu, Apr 18, 2019 at 5:21 PM Chamikara Jayalath 
> wrote:
>
>> Thanks for raising the concern about credentials Ankur, I agree that this
>> is a significant issue.
>>
>> On Thu, Apr 18, 2019 at 4:23 PM Lukasz Cwik  wrote:
>>
>>> I can understand the concern about credentials, the same access concern
>>> will exist for several cross language transforms (mostly IOs) since some
>>> will need access to credentials to read/write to an external service.
>>>
>>> Are there any ideas on how credential propagation could work to these
>>> IOs?
>>>
>>
>> There are some cases where existing IO transforms need credentials to
>> access remote resources, for example, size estimation, validation, etc. But
>> usually these are optional (or transform can be configured to not perform
>> these functions).
>>
>
> To clarify, I'm only talking about transform expansion here. Many IO
> transforms need read/write access to remote services at run time. So
> probably we need to figure out a way to propagate these credentials anyways.
>
>
>>
>>
>>> Can we use these mechanisms for staging?
>>>
>>
>> I think we'll have to find a way to do one of (1) propagate credentials
>> to other SDKs (2) allow users to configure SDK containers to have necessary
>> credentials (3) do the artifact staging from the pipeline SDK environment
>> which already have credentials. I prefer (1) or (2) since this will given a
>> transform same feature set whether used directly (in the same SDK language
>> as the transform) or remotely but it might be hard to do this for an
>> arbitrary service that a transform might connect to considering the number
>> of ways users can configure credentials (after an offline discussion with
>> Ankur).
>>
>>
>>>
>>>
>>
>>> On Thu, Apr 18, 2019 at 3:47 PM Ankur Goenka  wrote:
>>>
 I agree that the Expansion service knows about the artifacts required
 for a cross language transform and having a prepackage folder/Zip for
 transforms based on language makes sense.

 One think to note here is that expansion service might not have the
 same access privilege as the pipeline author and hence might not be able to
 stage artifacts by itself.
 Keeping this in mind I am leaning towards making Expansion service
 provide all the required artifacts to the user and let the user stage the
 artifacts as regular artifacts.
 At this time, we only have Beam File System based artifact staging
 which users local credentials to access different file systems. Even a
 docker based expansion service running on local machine might not have the
 same access privileges.

 In brief this is what I am leaning toward.
 User call for pipeline submission -> Expansion service provide cross
 language transforms and relevant artifacts to the Sdk -> Sdk Submits the
 pipeline to Jobserver and Stages user and cross language artifacts to
 artifacts staging service


 On Thu, Apr 18, 2019 at 2:33 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

>
>
> On Thu, Apr 18, 2019 at 2:12 PM Lukasz Cwik  wrote:
>
>> Note that Max did ask whether making the expansion service do the
>> staging made sense, and my first line was agreeing with that direction 
>> and
>> expanding on how it could be done (so this is really Max's idea or from
>> whomever he got the idea from).
>>
>
> +1 to what Max said then :)
>
>
>>
>> I believe a lot of the value of the expansion service is not having
>> users need to be aware of all the SDK specific dependencies when they are
>> trying to create a pipeline, 

Re: investigating python precommit wordcount_it failure

2019-04-18 Thread Valentyn Tymofieiev
I am working on a postcommit wordcount_it failure in BEAM-7063.

On Thu, Apr 18, 2019 at 6:05 PM Udi Meiri  wrote:

> Correction: it's a postcommit failure
>
> On Thu, Apr 18, 2019 at 5:43 PM Udi Meiri  wrote:
>
>> in https://issues.apache.org/jira/browse/BEAM-7111
>>
>> If anyone has state please lmk
>>
>


Re: investigating python precommit wordcount_it failure

2019-04-18 Thread Udi Meiri
Correction: it's a postcommit failure

On Thu, Apr 18, 2019 at 5:43 PM Udi Meiri  wrote:

> in https://issues.apache.org/jira/browse/BEAM-7111
>
> If anyone has state please lmk
>




Re: Artifact staging in cross-language pipelines

2019-04-18 Thread Chamikara Jayalath
On Thu, Apr 18, 2019 at 5:21 PM Chamikara Jayalath 
wrote:

> Thanks for raising the concern about credentials Ankur, I agree that this
> is a significant issue.
>
> On Thu, Apr 18, 2019 at 4:23 PM Lukasz Cwik  wrote:
>
>> I can understand the concern about credentials, the same access concern
>> will exist for several cross language transforms (mostly IOs) since some
>> will need access to credentials to read/write to an external service.
>>
>> Are there any ideas on how credential propagation could work to these IOs?
>>
>
> There are some cases where existing IO transforms need credentials to
> access remote resources, for example, size estimation, validation, etc. But
> usually these are optional (or transform can be configured to not perform
> these functions).
>

To clarify, I'm only talking about transform expansion here. Many IO
transforms need read/write access to remote services at run time, so we
probably need to figure out a way to propagate these credentials anyway.


>
>
>> Can we use these mechanisms for staging?
>>
>
> I think we'll have to find a way to do one of (1) propagate credentials to
> other SDKs (2) allow users to configure SDK containers to have necessary
> credentials (3) do the artifact staging from the pipeline SDK environment
> which already have credentials. I prefer (1) or (2) since this will given a
> transform same feature set whether used directly (in the same SDK language
> as the transform) or remotely but it might be hard to do this for an
> arbitrary service that a transform might connect to considering the number
> of ways users can configure credentials (after an offline discussion with
> Ankur).
>
>
>>
>>
>
>> On Thu, Apr 18, 2019 at 3:47 PM Ankur Goenka  wrote:
>>
>>> I agree that the Expansion service knows about the artifacts required
>>> for a cross language transform and having a prepackage folder/Zip for
>>> transforms based on language makes sense.
>>>
>>> One think to note here is that expansion service might not have the same
>>> access privilege as the pipeline author and hence might not be able to
>>> stage artifacts by itself.
>>> Keeping this in mind I am leaning towards making Expansion service
>>> provide all the required artifacts to the user and let the user stage the
>>> artifacts as regular artifacts.
>>> At this time, we only have Beam File System based artifact staging which
>>> users local credentials to access different file systems. Even a docker
>>> based expansion service running on local machine might not have the same
>>> access privileges.
>>>
>>> In brief this is what I am leaning toward.
>>> User call for pipeline submission -> Expansion service provide cross
>>> language transforms and relevant artifacts to the Sdk -> Sdk Submits the
>>> pipeline to Jobserver and Stages user and cross language artifacts to
>>> artifacts staging service
>>>
>>>
>>> On Thu, Apr 18, 2019 at 2:33 PM Chamikara Jayalath 
>>> wrote:
>>>


 On Thu, Apr 18, 2019 at 2:12 PM Lukasz Cwik  wrote:

> Note that Max did ask whether making the expansion service do the
> staging made sense, and my first line was agreeing with that direction and
> expanding on how it could be done (so this is really Max's idea or from
> whomever he got the idea from).
>

 +1 to what Max said then :)


>
> I believe a lot of the value of the expansion service is not having
> users need to be aware of all the SDK specific dependencies when they are
> trying to create a pipeline, only the "user" who is launching the 
> expansion
> service may need to. And in that case we can have a prepackaged expansion
> service application that does what most users would want (e.g. expansion
> service as a docker container, a single bundled jar, ...). We (the Apache
> Beam community) could choose to host a default implementation of the
> expansion service as well.
>

 I'm not against this. But I think this is a secondary more advanced
 use-case. For a Beam users that needs to use a Java transform that they
 already have in a Python pipeline, we should provide a way to allow
 starting up a expansion service (with dependencies needed for that) and
 running a pipeline that uses this external Java transform (with
 dependencies that are needed at runtime). Probably, it'll be enough to
 allow providing all dependencies when starting up the expansion service and
 allow expansion service to do the staging of jars are well. I don't see a
 need to include the list of jars in the ExpansionResponse sent to the
 Python SDK.


>
> On Thu, Apr 18, 2019 at 2:02 PM Chamikara Jayalath <
> chamik...@google.com> wrote:
>
>> I think there are two kind of dependencies we have to consider.
>>
>> (1) Dependencies that are needed to expand the transform.
>>
>> These have to be provided when we start the expansion service so that
>> available 

investigating python precommit wordcount_it failure

2019-04-18 Thread Udi Meiri
in https://issues.apache.org/jira/browse/BEAM-7111

If anyone has state please lmk




Re: Artifact staging in cross-language pipelines

2019-04-18 Thread Chamikara Jayalath
Thanks for raising the concern about credentials, Ankur. I agree that this
is a significant issue.

On Thu, Apr 18, 2019 at 4:23 PM Lukasz Cwik  wrote:

> I can understand the concern about credentials, the same access concern
> will exist for several cross language transforms (mostly IOs) since some
> will need access to credentials to read/write to an external service.
>
> Are there any ideas on how credential propagation could work to these IOs?
>

There are some cases where existing IO transforms need credentials to
access remote resources, for example for size estimation, validation, etc.
But usually these operations are optional (or the transform can be
configured not to perform them).


> Can we use these mechanisms for staging?
>

I think we'll have to find a way to do one of: (1) propagate credentials to
other SDKs, (2) allow users to configure SDK containers with the necessary
credentials, or (3) do the artifact staging from the pipeline SDK
environment, which already has the credentials. I prefer (1) or (2) since
this will give a transform the same feature set whether it is used directly
(in the same SDK language as the transform) or remotely, but it might be
hard to do this for an arbitrary service that a transform might connect to,
considering the number of ways users can configure credentials (after an
offline discussion with Ankur).
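
A minimal sketch of option (3), staging a jar from the pipeline SDK
environment using Beam's FileSystems abstraction (which picks up the
submitter's local credentials). The paths are placeholders and the helper is
illustrative, not an existing Beam API:

    from apache_beam.io.filesystems import FileSystems

    def stage_artifact(local_jar, staging_dir):
        # Copy a locally available jar to the staging location using the
        # credentials already configured on the submitter's machine.
        dest = FileSystems.join(staging_dir, local_jar.split('/')[-1])
        with open(local_jar, 'rb') as src:
            handle = FileSystems.create(dest)
            handle.write(src.read())
            handle.close()
        return dest

    # stage_artifact('/tmp/beam-sdks-java-io-kafka.jar', 'gs://my-bucket/staging')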


>
>

> On Thu, Apr 18, 2019 at 3:47 PM Ankur Goenka  wrote:
>
>> I agree that the Expansion service knows about the artifacts required for
>> a cross language transform and having a prepackage folder/Zip for
>> transforms based on language makes sense.
>>
>> One think to note here is that expansion service might not have the same
>> access privilege as the pipeline author and hence might not be able to
>> stage artifacts by itself.
>> Keeping this in mind I am leaning towards making Expansion service
>> provide all the required artifacts to the user and let the user stage the
>> artifacts as regular artifacts.
>> At this time, we only have Beam File System based artifact staging which
>> users local credentials to access different file systems. Even a docker
>> based expansion service running on local machine might not have the same
>> access privileges.
>>
>> In brief this is what I am leaning toward.
>> User call for pipeline submission -> Expansion service provide cross
>> language transforms and relevant artifacts to the Sdk -> Sdk Submits the
>> pipeline to Jobserver and Stages user and cross language artifacts to
>> artifacts staging service
>>
>>
>> On Thu, Apr 18, 2019 at 2:33 PM Chamikara Jayalath 
>> wrote:
>>
>>>
>>>
>>> On Thu, Apr 18, 2019 at 2:12 PM Lukasz Cwik  wrote:
>>>
 Note that Max did ask whether making the expansion service do the
 staging made sense, and my first line was agreeing with that direction and
 expanding on how it could be done (so this is really Max's idea or from
 whomever he got the idea from).

>>>
>>> +1 to what Max said then :)
>>>
>>>

 I believe a lot of the value of the expansion service is not having
 users need to be aware of all the SDK specific dependencies when they are
 trying to create a pipeline, only the "user" who is launching the expansion
 service may need to. And in that case we can have a prepackaged expansion
 service application that does what most users would want (e.g. expansion
 service as a docker container, a single bundled jar, ...). We (the Apache
 Beam community) could choose to host a default implementation of the
 expansion service as well.

>>>
>>> I'm not against this. But I think this is a secondary more advanced
>>> use-case. For a Beam users that needs to use a Java transform that they
>>> already have in a Python pipeline, we should provide a way to allow
>>> starting up a expansion service (with dependencies needed for that) and
>>> running a pipeline that uses this external Java transform (with
>>> dependencies that are needed at runtime). Probably, it'll be enough to
>>> allow providing all dependencies when starting up the expansion service and
>>> allow expansion service to do the staging of jars are well. I don't see a
>>> need to include the list of jars in the ExpansionResponse sent to the
>>> Python SDK.
>>>
>>>

 On Thu, Apr 18, 2019 at 2:02 PM Chamikara Jayalath <
 chamik...@google.com> wrote:

> I think there are two kind of dependencies we have to consider.
>
> (1) Dependencies that are needed to expand the transform.
>
> These have to be provided when we start the expansion service so that
> available external transforms are correctly registered with the expansion
> service.
>
> (2) Dependencies that are not needed at expansion but may be needed at
> runtime.
>
> I think in both cases, users have to provide these dependencies either
> when expansion service is started or when a pipeline is being executed.
>
> Max, I'm not sure why expansion service will need to provide

Re: Artifact staging in cross-language pipelines

2019-04-18 Thread Lukasz Cwik
I can understand the concern about credentials, the same access concern
will exist for several cross language transforms (mostly IOs) since some
will need access to credentials to read/write to an external service.

Are there any ideas on how credential propagation could work to these IOs?
Can we use these mechanisms for staging?

On Thu, Apr 18, 2019 at 3:47 PM Ankur Goenka  wrote:

> I agree that the Expansion service knows about the artifacts required for
> a cross language transform and having a prepackage folder/Zip for
> transforms based on language makes sense.
>
> One think to note here is that expansion service might not have the same
> access privilege as the pipeline author and hence might not be able to
> stage artifacts by itself.
> Keeping this in mind I am leaning towards making Expansion service provide
> all the required artifacts to the user and let the user stage the artifacts
> as regular artifacts.
> At this time, we only have Beam File System based artifact staging which
> users local credentials to access different file systems. Even a docker
> based expansion service running on local machine might not have the same
> access privileges.
>
> In brief this is what I am leaning toward.
> User call for pipeline submission -> Expansion service provide cross
> language transforms and relevant artifacts to the Sdk -> Sdk Submits the
> pipeline to Jobserver and Stages user and cross language artifacts to
> artifacts staging service
>
>
> On Thu, Apr 18, 2019 at 2:33 PM Chamikara Jayalath 
> wrote:
>
>>
>>
>> On Thu, Apr 18, 2019 at 2:12 PM Lukasz Cwik  wrote:
>>
>>> Note that Max did ask whether making the expansion service do the
>>> staging made sense, and my first line was agreeing with that direction and
>>> expanding on how it could be done (so this is really Max's idea or from
>>> whomever he got the idea from).
>>>
>>
>> +1 to what Max said then :)
>>
>>
>>>
>>> I believe a lot of the value of the expansion service is not having
>>> users need to be aware of all the SDK specific dependencies when they are
>>> trying to create a pipeline, only the "user" who is launching the expansion
>>> service may need to. And in that case we can have a prepackaged expansion
>>> service application that does what most users would want (e.g. expansion
>>> service as a docker container, a single bundled jar, ...). We (the Apache
>>> Beam community) could choose to host a default implementation of the
>>> expansion service as well.
>>>
>>
>> I'm not against this. But I think this is a secondary more advanced
>> use-case. For a Beam users that needs to use a Java transform that they
>> already have in a Python pipeline, we should provide a way to allow
>> starting up a expansion service (with dependencies needed for that) and
>> running a pipeline that uses this external Java transform (with
>> dependencies that are needed at runtime). Probably, it'll be enough to
>> allow providing all dependencies when starting up the expansion service and
>> allow expansion service to do the staging of jars are well. I don't see a
>> need to include the list of jars in the ExpansionResponse sent to the
>> Python SDK.
>>
>>
>>>
>>> On Thu, Apr 18, 2019 at 2:02 PM Chamikara Jayalath 
>>> wrote:
>>>
 I think there are two kind of dependencies we have to consider.

 (1) Dependencies that are needed to expand the transform.

 These have to be provided when we start the expansion service so that
 available external transforms are correctly registered with the expansion
 service.

 (2) Dependencies that are not needed at expansion but may be needed at
 runtime.

 I think in both cases, users have to provide these dependencies either
 when expansion service is started or when a pipeline is being executed.

 Max, I'm not sure why expansion service will need to provide
 dependencies to the user since user will already be aware of these. Are you
 talking about a expansion service that is readily available that will be
 used by many Beam users ? I think such a (possibly long running) service
 will have to maintain a repository of transforms and should have mechanism
 for registering new transforms and discovering already registered
 transforms etc. I think there's more design work needed to make transform
 expansion service support such use-cases. Currently, I think allowing
 pipeline author to provide the jars when starting the expansion service and
 when executing the pipeline will be adequate.

 Regarding the entity that will perform the staging, I like Luke's idea
 of allowing expansion service to do the staging (of jars provided by the
 user). Notion of artifacts and how they are extracted/represented is SDK
 dependent. So if the pipeline SDK tries to do this we have to add n x (n
 -1) configurations (for n SDKs).

 - Cham

 On Thu, Apr 18, 2019 at 11:45 AM Lukasz Cwik  wrote:

> 

Re: Artifact staging in cross-language pipelines

2019-04-18 Thread Ankur Goenka
I agree that the expansion service knows about the artifacts required for a
cross-language transform, and having a prepackaged folder/zip of transforms
per language makes sense.

One thing to note here is that the expansion service might not have the same
access privileges as the pipeline author and hence might not be able to
stage artifacts by itself.
Keeping this in mind, I am leaning towards making the expansion service
provide all the required artifacts to the user and letting the user stage
them as regular artifacts.
At this time, we only have Beam FileSystem-based artifact staging, which
uses local credentials to access different file systems. Even a docker-based
expansion service running on the local machine might not have the same
access privileges.

In brief, this is what I am leaning toward:
user calls for pipeline submission -> expansion service provides
cross-language transforms and relevant artifacts to the SDK -> SDK submits
the pipeline to the job server and stages the user and cross-language
artifacts to the artifact staging service.
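
A hypothetical sketch of that SDK-side flow in Python; ExpansionResponse
does not carry an artifact list today, so the field and helper names below
are assumptions, not existing APIs:

    def submit_pipeline(pipeline, expansion_responses, user_artifacts, stager):
        external_artifacts = []
        for response in expansion_responses:
            # Hypothetical field: the files each external transform needs.
            external_artifacts.extend(response.artifacts)
        # The SDK stages everything with the submitter's own credentials
        # before handing the pipeline to the job server.
        stager.stage(user_artifacts + external_artifacts)
        return pipeline.run()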


On Thu, Apr 18, 2019 at 2:33 PM Chamikara Jayalath 
wrote:

>
>
> On Thu, Apr 18, 2019 at 2:12 PM Lukasz Cwik  wrote:
>
>> Note that Max did ask whether making the expansion service do the staging
>> made sense, and my first line was agreeing with that direction and
>> expanding on how it could be done (so this is really Max's idea or from
>> whomever he got the idea from).
>>
>
> +1 to what Max said then :)
>
>
>>
>> I believe a lot of the value of the expansion service is not having users
>> need to be aware of all the SDK specific dependencies when they are trying
>> to create a pipeline, only the "user" who is launching the expansion
>> service may need to. And in that case we can have a prepackaged expansion
>> service application that does what most users would want (e.g. expansion
>> service as a docker container, a single bundled jar, ...). We (the Apache
>> Beam community) could choose to host a default implementation of the
>> expansion service as well.
>>
>
> I'm not against this. But I think this is a secondary more advanced
> use-case. For a Beam users that needs to use a Java transform that they
> already have in a Python pipeline, we should provide a way to allow
> starting up a expansion service (with dependencies needed for that) and
> running a pipeline that uses this external Java transform (with
> dependencies that are needed at runtime). Probably, it'll be enough to
> allow providing all dependencies when starting up the expansion service and
> allow expansion service to do the staging of jars are well. I don't see a
> need to include the list of jars in the ExpansionResponse sent to the
> Python SDK.
>
>
>>
>> On Thu, Apr 18, 2019 at 2:02 PM Chamikara Jayalath 
>> wrote:
>>
>>> I think there are two kind of dependencies we have to consider.
>>>
>>> (1) Dependencies that are needed to expand the transform.
>>>
>>> These have to be provided when we start the expansion service so that
>>> available external transforms are correctly registered with the expansion
>>> service.
>>>
>>> (2) Dependencies that are not needed at expansion but may be needed at
>>> runtime.
>>>
>>> I think in both cases, users have to provide these dependencies either
>>> when expansion service is started or when a pipeline is being executed.
>>>
>>> Max, I'm not sure why expansion service will need to provide
>>> dependencies to the user since user will already be aware of these. Are you
>>> talking about a expansion service that is readily available that will be
>>> used by many Beam users ? I think such a (possibly long running) service
>>> will have to maintain a repository of transforms and should have mechanism
>>> for registering new transforms and discovering already registered
>>> transforms etc. I think there's more design work needed to make transform
>>> expansion service support such use-cases. Currently, I think allowing
>>> pipeline author to provide the jars when starting the expansion service and
>>> when executing the pipeline will be adequate.
>>>
>>> Regarding the entity that will perform the staging, I like Luke's idea
>>> of allowing expansion service to do the staging (of jars provided by the
>>> user). Notion of artifacts and how they are extracted/represented is SDK
>>> dependent. So if the pipeline SDK tries to do this we have to add n x (n
>>> -1) configurations (for n SDKs).
>>>
>>> - Cham
>>>
>>> On Thu, Apr 18, 2019 at 11:45 AM Lukasz Cwik  wrote:
>>>
 We can expose the artifact staging endpoint and artifact token to allow
 the expansion service to upload any resources its environment may need. For
 example, the expansion service for the Beam Java SDK would be able to
 upload jars.

 In the "docker" environment, the Apache Beam Java SDK harness container
 would fetch the relevant artifacts for itself and be able to execute the
 pipeline. (Note that a docker environment could skip all this artifact
 staging if the docker 

Re: Artifact staging in cross-language pipelines

2019-04-18 Thread Chamikara Jayalath
On Thu, Apr 18, 2019 at 2:12 PM Lukasz Cwik  wrote:

> Note that Max did ask whether making the expansion service do the staging
> made sense, and my first line was agreeing with that direction and
> expanding on how it could be done (so this is really Max's idea or from
> whomever he got the idea from).
>

+1 to what Max said then :)


>
> I believe a lot of the value of the expansion service is not having users
> need to be aware of all the SDK specific dependencies when they are trying
> to create a pipeline, only the "user" who is launching the expansion
> service may need to. And in that case we can have a prepackaged expansion
> service application that does what most users would want (e.g. expansion
> service as a docker container, a single bundled jar, ...). We (the Apache
> Beam community) could choose to host a default implementation of the
> expansion service as well.
>

I'm not against this. But I think this is a secondary, more advanced
use-case. For a Beam user who needs to use a Java transform they already
have in a Python pipeline, we should provide a way to start up an expansion
service (with the dependencies needed for expansion) and run a pipeline that
uses this external Java transform (with the dependencies needed at runtime).
Probably it'll be enough to allow providing all dependencies when starting
up the expansion service and to let the expansion service do the staging of
jars as well. I don't see a need to include the list of jars in the
ExpansionResponse sent to the Python SDK.
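
For that simple case, the Python side might look roughly like the following.
This is a sketch: it assumes a Java expansion service with the Kafka IO jars
on its classpath is already running at the given address, and the wrapper's
parameter names may differ slightly from the pending PR:

    import apache_beam as beam
    from apache_beam.io.external.kafka import ReadFromKafka

    with beam.Pipeline() as p:
        (p
         | ReadFromKafka(
             consumer_config={'bootstrap.servers': 'localhost:9092'},
             topics=['my_topic'],
             expansion_service='localhost:8097')
         | beam.Map(print))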


>
> On Thu, Apr 18, 2019 at 2:02 PM Chamikara Jayalath 
> wrote:
>
>> I think there are two kind of dependencies we have to consider.
>>
>> (1) Dependencies that are needed to expand the transform.
>>
>> These have to be provided when we start the expansion service so that
>> available external transforms are correctly registered with the expansion
>> service.
>>
>> (2) Dependencies that are not needed at expansion but may be needed at
>> runtime.
>>
>> I think in both cases, users have to provide these dependencies either
>> when expansion service is started or when a pipeline is being executed.
>>
>> Max, I'm not sure why expansion service will need to provide dependencies
>> to the user since user will already be aware of these. Are you talking
>> about a expansion service that is readily available that will be used by
>> many Beam users ? I think such a (possibly long running) service will have
>> to maintain a repository of transforms and should have mechanism for
>> registering new transforms and discovering already registered transforms
>> etc. I think there's more design work needed to make transform expansion
>> service support such use-cases. Currently, I think allowing pipeline author
>> to provide the jars when starting the expansion service and when executing
>> the pipeline will be adequate.
>>
>> Regarding the entity that will perform the staging, I like Luke's idea of
>> allowing expansion service to do the staging (of jars provided by the
>> user). Notion of artifacts and how they are extracted/represented is SDK
>> dependent. So if the pipeline SDK tries to do this we have to add n x (n
>> -1) configurations (for n SDKs).
>>
>> - Cham
>>
>> On Thu, Apr 18, 2019 at 11:45 AM Lukasz Cwik  wrote:
>>
>>> We can expose the artifact staging endpoint and artifact token to allow
>>> the expansion service to upload any resources its environment may need. For
>>> example, the expansion service for the Beam Java SDK would be able to
>>> upload jars.
>>>
>>> In the "docker" environment, the Apache Beam Java SDK harness container
>>> would fetch the relevant artifacts for itself and be able to execute the
>>> pipeline. (Note that a docker environment could skip all this artifact
>>> staging if the docker environment contained all necessary artifacts).
>>>
>>> For the existing "external" environment, it should already come with all
>>> the resources prepackaged wherever "external" points to. The "process"
>>> based environment could choose to use the artifact staging service to fetch
>>> those resources associated with its process or it could follow the same
>>> pattern that "external" would do and already contain all the prepackaged
>>> resources. Note that both "external" and "process" will require the
>>> instance of the expansion service to be specialized for those environments
>>> which is why the default should for the expansion service to be the
>>> "docker" environment.
>>>
>>> Note that a major reason for going with docker containers as the
>>> environment that all runners should support is that containers provides a
>>> solution for this exact issue. Both the "process" and "external"
>>> environments are explicitly limiting and expanding their capabilities will
>>> quickly have us building something like a docker container because we'll
>>> quickly find ourselves solving the same problems that docker containers
>>> provide (resources, file layout, permissions, ...)
>>>
>>>
>>>

Re: Artifact staging in cross-language pipelines

2019-04-18 Thread Lukasz Cwik
Note that Max did ask whether making the expansion service do the staging
made sense, and my first line was agreeing with that direction and
expanding on how it could be done (so this is really Max's idea or from
whomever he got the idea from).

I believe a lot of the value of the expansion service is not having users
need to be aware of all the SDK specific dependencies when they are trying
to create a pipeline, only the "user" who is launching the expansion
service may need to. And in that case we can have a prepackaged expansion
service application that does what most users would want (e.g. expansion
service as a docker container, a single bundled jar, ...). We (the Apache
Beam community) could choose to host a default implementation of the
expansion service as well.

On Thu, Apr 18, 2019 at 2:02 PM Chamikara Jayalath 
wrote:

> I think there are two kind of dependencies we have to consider.
>
> (1) Dependencies that are needed to expand the transform.
>
> These have to be provided when we start the expansion service so that
> available external transforms are correctly registered with the expansion
> service.
>
> (2) Dependencies that are not needed at expansion but may be needed at
> runtime.
>
> I think in both cases, users have to provide these dependencies either
> when expansion service is started or when a pipeline is being executed.
>
> Max, I'm not sure why expansion service will need to provide dependencies
> to the user since user will already be aware of these. Are you talking
> about a expansion service that is readily available that will be used by
> many Beam users ? I think such a (possibly long running) service will have
> to maintain a repository of transforms and should have mechanism for
> registering new transforms and discovering already registered transforms
> etc. I think there's more design work needed to make transform expansion
> service support such use-cases. Currently, I think allowing pipeline author
> to provide the jars when starting the expansion service and when executing
> the pipeline will be adequate.
>
> Regarding the entity that will perform the staging, I like Luke's idea of
> allowing expansion service to do the staging (of jars provided by the
> user). Notion of artifacts and how they are extracted/represented is SDK
> dependent. So if the pipeline SDK tries to do this we have to add n x (n
> -1) configurations (for n SDKs).
>
> - Cham
>
> On Thu, Apr 18, 2019 at 11:45 AM Lukasz Cwik  wrote:
>
>> We can expose the artifact staging endpoint and artifact token to allow
>> the expansion service to upload any resources its environment may need. For
>> example, the expansion service for the Beam Java SDK would be able to
>> upload jars.
>>
>> In the "docker" environment, the Apache Beam Java SDK harness container
>> would fetch the relevant artifacts for itself and be able to execute the
>> pipeline. (Note that a docker environment could skip all this artifact
>> staging if the docker environment contained all necessary artifacts).
>>
>> For the existing "external" environment, it should already come with all
>> the resources prepackaged wherever "external" points to. The "process"
>> based environment could choose to use the artifact staging service to fetch
>> those resources associated with its process or it could follow the same
>> pattern that "external" would do and already contain all the prepackaged
>> resources. Note that both "external" and "process" will require the
>> instance of the expansion service to be specialized for those environments
>> which is why the default should for the expansion service to be the
>> "docker" environment.
>>
>> Note that a major reason for going with docker containers as the
>> environment that all runners should support is that containers provides a
>> solution for this exact issue. Both the "process" and "external"
>> environments are explicitly limiting and expanding their capabilities will
>> quickly have us building something like a docker container because we'll
>> quickly find ourselves solving the same problems that docker containers
>> provide (resources, file layout, permissions, ...)
>>
>>
>>
>>
>> On Thu, Apr 18, 2019 at 11:21 AM Maximilian Michels 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> We have previously merged support for configuring transforms across
>>> languages. Please see Cham's summary on the discussion [1]. There is
>>> also a design document [2].
>>>
>>> Subsequently, we've added wrappers for cross-language transforms to the
>>> Python SDK, i.e. GenerateSequence, ReadFromKafka, and there is a pending
>>> PR [1] for WriteToKafka. All of them utilize Java transforms via
>>> cross-language configuration.
>>>
>>> That is all pretty exciting :)
>>>
>>> We still have some issues to solve, one being how to stage artifact from
>>> a foreign environment. When we run external transforms which are part of
>>> Beam's core (e.g. GenerateSequence), we have them available in the SDK
>>> Harness. However, when they 

Re: Artifact staging in cross-language pipelines

2019-04-18 Thread Chamikara Jayalath
I think there are two kinds of dependencies we have to consider.

(1) Dependencies that are needed to expand the transform.

These have to be provided when we start the expansion service so that
available external transforms are correctly registered with the expansion
service.

(2) Dependencies that are not needed at expansion but may be needed at
runtime.

I think in both cases, users have to provide these dependencies either when
the expansion service is started or when a pipeline is executed.

Max, I'm not sure why the expansion service will need to provide
dependencies to the user, since the user will already be aware of these. Are
you talking about a readily available expansion service that will be used by
many Beam users? I think such a (possibly long-running) service will have to
maintain a repository of transforms and should have a mechanism for
registering new transforms, discovering already registered transforms, etc.
I think there's more design work needed to make the transform expansion
service support such use-cases. Currently, I think allowing the pipeline
author to provide the jars when starting the expansion service and when
executing the pipeline will be adequate.

Regarding the entity that will perform the staging, I like Luke's idea of
allowing the expansion service to do the staging (of jars provided by the
user). The notion of artifacts and how they are extracted/represented is
SDK-dependent, so if the pipeline SDK tries to do this we would have to add
n x (n - 1) configurations for n SDKs (e.g., with Java, Python, and Go that
is already six pairings).

- Cham

On Thu, Apr 18, 2019 at 11:45 AM Lukasz Cwik  wrote:

> We can expose the artifact staging endpoint and artifact token to allow
> the expansion service to upload any resources its environment may need. For
> example, the expansion service for the Beam Java SDK would be able to
> upload jars.
>
> In the "docker" environment, the Apache Beam Java SDK harness container
> would fetch the relevant artifacts for itself and be able to execute the
> pipeline. (Note that a docker environment could skip all this artifact
> staging if the docker environment contained all necessary artifacts).
>
> For the existing "external" environment, it should already come with all
> the resources prepackaged wherever "external" points to. The "process"
> based environment could choose to use the artifact staging service to fetch
> those resources associated with its process or it could follow the same
> pattern that "external" would do and already contain all the prepackaged
> resources. Note that both "external" and "process" will require the
> instance of the expansion service to be specialized for those environments
> which is why the default should for the expansion service to be the
> "docker" environment.
>
> Note that a major reason for going with docker containers as the
> environment that all runners should support is that containers provides a
> solution for this exact issue. Both the "process" and "external"
> environments are explicitly limiting and expanding their capabilities will
> quickly have us building something like a docker container because we'll
> quickly find ourselves solving the same problems that docker containers
> provide (resources, file layout, permissions, ...)
>
>
>
>
> On Thu, Apr 18, 2019 at 11:21 AM Maximilian Michels 
> wrote:
>
>> Hi everyone,
>>
>> We have previously merged support for configuring transforms across
>> languages. Please see Cham's summary on the discussion [1]. There is
>> also a design document [2].
>>
>> Subsequently, we've added wrappers for cross-language transforms to the
>> Python SDK, i.e. GenerateSequence, ReadFromKafka, and there is a pending
>> PR [1] for WriteToKafka. All of them utilize Java transforms via
>> cross-language configuration.
>>
>> That is all pretty exciting :)
>>
>> We still have some issues to solve, one being how to stage artifact from
>> a foreign environment. When we run external transforms which are part of
>> Beam's core (e.g. GenerateSequence), we have them available in the SDK
>> Harness. However, when they are not (e.g. KafkaIO) we need to stage the
>> necessary files.
>>
>> For my PR [3] I've naively added ":beam-sdks-java-io-kafka" to the SDK
>> Harness which caused dependency problems [4]. Those could be resolved
>> but the bigger question is how to stage artifacts for external
>> transforms programmatically?
>>
>> Heejong has solved this by adding a "--jar_package" option to the Python
>> SDK to stage Java files [5]. I think that is a better solution than
>> adding required Jars to the SDK Harness directly, but it is not very
>> convenient for users.
>>
>> I've discussed this today with Thomas and we both figured that the
>> expansion service needs to provide a list of required Jars with the
>> ExpansionResponse it provides. It's not entirely clear, how we determine
>> which artifacts are necessary for an external transform. We could just
>> dump the entire classpath like we do in PipelineResources for Java
>> pipelines. This provides many 

Hazelcast Jet Runner

2019-04-18 Thread Jozsef Bartok
Hi. We at Hazelcast Jet have been working for a while now to implement a
Java Beam Runner (non-portable) based on Hazelcast Jet (
https://jet.hazelcast.org/). The process is still ongoing (
https://github.com/hazelcast/hazelcast-jet-beam-runner), but we are aiming
for a fully functional, reliable Runner which can proudly join the
Capability Matrix. For that purpose I would like to ask what’s your process
of validating runners? We are already running the @ValidatesRunner tests
and the Nexmark test suite, but beyond that what other steps do we need to
take to get our Runner to the level it needs to be at?


Re: Artifact staging in cross-language pipelines

2019-04-18 Thread Lukasz Cwik
We can expose the artifact staging endpoint and artifact token to allow the
expansion service to upload any resources its environment may need. For
example, the expansion service for the Beam Java SDK would be able to
upload jars.

In the "docker" environment, the Apache Beam Java SDK harness container
would fetch the relevant artifacts for itself and be able to execute the
pipeline. (Note that a docker environment could skip all this artifact
staging if the docker environment contained all necessary artifacts).

For the existing "external" environment, it should already come with all
the resources prepackaged wherever "external" points to. The "process"
based environment could choose to use the artifact staging service to fetch
those resources associated with its process or it could follow the same
pattern that "external" would do and already contain all the prepackaged
resources. Note that both "external" and "process" will require the
instance of the expansion service to be specialized for those environments
which is why the default should for the expansion service to be the
"docker" environment.

Note that a major reason for going with docker containers as the
environment that all runners should support is that containers provide a
solution for this exact issue. Both the "process" and "external"
environments are explicitly limited, and expanding their capabilities would
quickly have us building something like a docker container, because we'd
quickly find ourselves solving the same problems that docker containers
already solve (resources, file layout, permissions, ...).
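
For reference, a hedged sketch of how a pipeline author selects one of these
environments from the Python SDK via the portable pipeline options; the
image name and process command below are placeholders, and exact flag values
may differ by version:

    from apache_beam.options.pipeline_options import PipelineOptions

    # Default: a docker container image per SDK (image name is a placeholder).
    docker_opts = PipelineOptions([
        '--environment_type=DOCKER',
        '--environment_config=apachebeam/java_sdk:latest',
    ])

    # Opting out of docker: the user manages dependencies themselves.
    process_opts = PipelineOptions([
        '--environment_type=PROCESS',
        '--environment_config={"command": "/opt/apache/beam/boot"}',
    ])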




On Thu, Apr 18, 2019 at 11:21 AM Maximilian Michels  wrote:

> Hi everyone,
>
> We have previously merged support for configuring transforms across
> languages. Please see Cham's summary on the discussion [1]. There is
> also a design document [2].
>
> Subsequently, we've added wrappers for cross-language transforms to the
> Python SDK, i.e. GenerateSequence, ReadFromKafka, and there is a pending
> PR [1] for WriteToKafka. All of them utilize Java transforms via
> cross-language configuration.
>
> That is all pretty exciting :)
>
> We still have some issues to solve, one being how to stage artifact from
> a foreign environment. When we run external transforms which are part of
> Beam's core (e.g. GenerateSequence), we have them available in the SDK
> Harness. However, when they are not (e.g. KafkaIO) we need to stage the
> necessary files.
>
> For my PR [3] I've naively added ":beam-sdks-java-io-kafka" to the SDK
> Harness which caused dependency problems [4]. Those could be resolved
> but the bigger question is how to stage artifacts for external
> transforms programmatically?
>
> Heejong has solved this by adding a "--jar_package" option to the Python
> SDK to stage Java files [5]. I think that is a better solution than
> adding required Jars to the SDK Harness directly, but it is not very
> convenient for users.
>
> I've discussed this today with Thomas and we both figured that the
> expansion service needs to provide a list of required Jars with the
> ExpansionResponse it provides. It's not entirely clear, how we determine
> which artifacts are necessary for an external transform. We could just
> dump the entire classpath like we do in PipelineResources for Java
> pipelines. This provides many unneeded classes but would work.
>
> Do you think it makes sense for the expansion service to provide the
> artifacts? Perhaps you have a better idea how to resolve the staging
> problem in cross-language pipelines?
>
> Thanks,
> Max
>
> [1]
>
> https://lists.apache.org/thread.html/b99ba8527422e31ec7bb7ad9dc3a6583551ea392ebdc5527b5fb4a67@%3Cdev.beam.apache.org%3E
>
> [2] https://s.apache.org/beam-cross-language-io
>
> [3] https://github.com/apache/beam/pull/8322#discussion_r276336748
>
> [4] Dependency graph for beam-runners-direct-java:
>
> beam-runners-direct-java -> sdks-java-harness -> beam-sdks-java-io-kafka
> -> beam-runners-direct-java ... the cycle continues
>
> Beam-runners-direct-java depends on sdks-java-harness due
> to the infamous Universal Local Runner. Beam-sdks-java-io-kafka depends
> on beam-runners-direct-java for running tests.
>
> [5] https://github.com/apache/beam/pull/8340
>


Artifact staging in cross-language pipelines

2019-04-18 Thread Maximilian Michels

Hi everyone,

We have previously merged support for configuring transforms across 
languages. Please see Cham's summary on the discussion [1]. There is 
also a design document [2].


Subsequently, we've added wrappers for cross-language transforms to the 
Python SDK, i.e. GenerateSequence, ReadFromKafka, and there is a pending 
PR [1] for WriteToKafka. All of them utilize Java transforms via 
cross-language configuration.


That is all pretty exciting :)
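
As a concrete illustration of one of these wrappers, a minimal sketch using
the external GenerateSequence from Python, assuming the wrapper's current
location in the SDK; the expansion service address is a placeholder and must
point at a running Java expansion service:

    import apache_beam as beam
    from apache_beam.io.external.generate_sequence import GenerateSequence

    with beam.Pipeline() as p:
        (p
         | GenerateSequence(start=1, stop=10,
                            expansion_service='localhost:8097')
         | beam.Map(print))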

We still have some issues to solve, one being how to stage artifacts from 
a foreign environment. When we run external transforms which are part of 
Beam's core (e.g. GenerateSequence), we have them available in the SDK 
Harness. However, when they are not (e.g. KafkaIO), we need to stage the 
necessary files.


For my PR [3] I've naively added ":beam-sdks-java-io-kafka" to the SDK 
Harness, which caused dependency problems [4]. Those could be resolved, 
but the bigger question is: how do we stage artifacts for external 
transforms programmatically?


Heejong has solved this by adding a "--jar_package" option to the Python 
SDK to stage Java files [5]. I think that is a better solution than 
adding required Jars to the SDK Harness directly, but it is not very 
convenient for users.
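
A sketch of what that looks like from the pipeline author's side, assuming
the flag name from the pending PR [5]; the jar path and endpoints are
placeholders:

    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        '--runner=PortableRunner',
        '--job_endpoint=localhost:8099',
        '--jar_package=/path/to/beam-sdks-java-io-kafka.jar',
    ])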


I've discussed this today with Thomas and we both figured that the 
expansion service needs to provide a list of required jars with the 
ExpansionResponse it returns. It's not entirely clear how we determine 
which artifacts are necessary for an external transform. We could just 
dump the entire classpath like we do in PipelineResources for Java 
pipelines. That brings in many unneeded classes but would work.


Do you think it makes sense for the expansion service to provide the 
artifacts? Perhaps you have a better idea how to resolve the staging 
problem in cross-language pipelines?


Thanks,
Max

[1] 
https://lists.apache.org/thread.html/b99ba8527422e31ec7bb7ad9dc3a6583551ea392ebdc5527b5fb4a67@%3Cdev.beam.apache.org%3E


[2] https://s.apache.org/beam-cross-language-io

[3] https://github.com/apache/beam/pull/8322#discussion_r276336748

[4] Dependency graph for beam-runners-direct-java:

beam-runners-direct-java -> sdks-java-harness -> beam-sdks-java-io-kafka 
-> beam-runners-direct-java ... the cycle continues


Beam-runners-direct-java depends on sdks-java-harness due
to the infamous Universal Local Runner. Beam-sdks-java-io-kafka depends 
on beam-runners-direct-java for running tests.


[5] https://github.com/apache/beam/pull/8340


Re: SNAPSHOTS have not been updated since february

2019-04-18 Thread Yifan Zou
The original build nodes were updated on Jan 24 and the Nexus credentials
were removed from the filesystem because they are not supposed to be on
external build nodes (nodes Infra does not own). We now need to set up the
role account on the new Beam JNLP nodes. I am still working with Infra to
bring the snapshots back.

Yifan

On Thu, Apr 18, 2019 at 10:09 AM Lukasz Cwik  wrote:

> The permissions issue is that the credentials needed to publish to the
> maven repository are only deployed on machines managed by Apache Infra. Now
> that the machines have been given back to each project to manage Yifan was
> investigating some other way to get the permissions on to the machine.
>
> On Thu, Apr 18, 2019 at 10:06 AM Boyuan Zhang  wrote:
>
>> There is a test target
>> https://builds.apache.org/job/beam_Release_NightlySnapshot/ in beam,
>> which builds and pushes snapshot to maven every day. Current failure is
>> like, the jenkin machine cannot publish artifacts into maven owing to some
>> weird permission issue. I think +Yifan Zou   is
>> working on it actively.
>>
>> On Thu, Apr 18, 2019 at 9:44 AM Ismaël Mejía  wrote:
>>
>>> And is there a way we can detect SNAPSHOTS not been published daily in
>>> the future?
>>>
>>> On Thu, Apr 18, 2019 at 6:37 PM Ismaël Mejía  wrote:
>>> >
>>> > Any progress on this?
>>> >
>>> > On Wed, Mar 27, 2019 at 5:38 AM Daniel Oliveira <
>>> danolive...@google.com> wrote:
>>> > >
>>> > > I made a bug for this specific issue (artifacts not publishing to
>>> the Apache Maven repo): https://issues.apache.org/jira/browse/BEAM-6919
>>> > >
>>> > > While I was gathering info for the bug report I also noticed +Yifan
>>> Zou has an experimental PR testing a fix:
>>> https://github.com/apache/beam/pull/8148
>>> > >
>>> > > On Tue, Mar 26, 2019 at 11:42 AM Boyuan Zhang 
>>> wrote:
>>> > >>
>>> > >> +Daniel Oliveira
>>> > >>
>>> > >> On Tue, Mar 26, 2019 at 9:57 AM Boyuan Zhang 
>>> wrote:
>>> > >>>
>>> > >>> Sorry for the typo. Ideally, the snapshot publish is independent
>>> from postrelease_snapshot.
>>> > >>>
>>> > >>> On Tue, Mar 26, 2019 at 9:55 AM Boyuan Zhang 
>>> wrote:
>>> > 
>>> >  Hey,
>>> > 
>>> >  I'm trying to publish the artifacts by commenting "Run Gradle
>>> Publish" in my PR, but there are several errors saying "cannot write
>>> artifacts into dir", anyone has idea on it? Ideally, the snapshot publish
>>> is dependent from postrelease_snapshot. The publish task is to build and
>>> publish artifacts and the postrelease_snapshot is to verify whether the
>>> snapshot works.
>>> > 
>>> >  On Tue, Mar 26, 2019 at 8:45 AM Ahmet Altay 
>>> wrote:
>>> > >
>>> > > I believe this is related to
>>> https://issues.apache.org/jira/browse/BEAM-6840 and +Boyuan Zhang has a
>>> fix in progress https://github.com/apache/beam/pull/8132
>>> > >
>>> > > On Tue, Mar 26, 2019 at 7:09 AM Ismaël Mejía 
>>> wrote:
>>> > >>
>>> > >> I was trying to validate a fix on the Spark runner and realized
>>> that
>>> > >> Beam SNAPSHOTS have not been updated since February 24 !
>>> > >>
>>> > >>
>>> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.12.0-SNAPSHOT/
>>> > >>
>>> > >> Can somebody please take a look at why this is not been updated?
>>> > >>
>>> > >> Thanks,
>>> > >> Ismaël
>>>
>>


Re: SNAPSHOTS have not been updated since february

2019-04-18 Thread Lukasz Cwik
The permissions issue is that the credentials needed to publish to the
Maven repository are only deployed on machines managed by Apache Infra. Now
that the machines have been given back to each project to manage, Yifan was
investigating some other way to get the credentials onto the machines.

On Thu, Apr 18, 2019 at 10:06 AM Boyuan Zhang  wrote:

> There is a test target
> https://builds.apache.org/job/beam_Release_NightlySnapshot/ in beam,
> which builds and pushes snapshot to maven every day. Current failure is
> like, the jenkin machine cannot publish artifacts into maven owing to some
> weird permission issue. I think +Yifan Zou   is
> working on it actively.
>
> On Thu, Apr 18, 2019 at 9:44 AM Ismaël Mejía  wrote:
>
>> And is there a way we can detect SNAPSHOTS not been published daily in
>> the future?
>>
>> On Thu, Apr 18, 2019 at 6:37 PM Ismaël Mejía  wrote:
>> >
>> > Any progress on this?
>> >
>> > On Wed, Mar 27, 2019 at 5:38 AM Daniel Oliveira 
>> wrote:
>> > >
>> > > I made a bug for this specific issue (artifacts not publishing to the
>> Apache Maven repo): https://issues.apache.org/jira/browse/BEAM-6919
>> > >
>> > > While I was gathering info for the bug report I also noticed +Yifan
>> Zou has an experimental PR testing a fix:
>> https://github.com/apache/beam/pull/8148
>> > >
>> > > On Tue, Mar 26, 2019 at 11:42 AM Boyuan Zhang 
>> wrote:
>> > >>
>> > >> +Daniel Oliveira
>> > >>
>> > >> On Tue, Mar 26, 2019 at 9:57 AM Boyuan Zhang 
>> wrote:
>> > >>>
>> > >>> Sorry for the typo. Ideally, the snapshot publish is independent
>> from postrelease_snapshot.
>> > >>>
>> > >>> On Tue, Mar 26, 2019 at 9:55 AM Boyuan Zhang 
>> wrote:
>> > 
>> >  Hey,
>> > 
>> >  I'm trying to publish the artifacts by commenting "Run Gradle
>> Publish" in my PR, but there are several errors saying "cannot write
>> artifacts into dir", anyone has idea on it? Ideally, the snapshot publish
>> is dependent from postrelease_snapshot. The publish task is to build and
>> publish artifacts and the postrelease_snapshot is to verify whether the
>> snapshot works.
>> > 
>> >  On Tue, Mar 26, 2019 at 8:45 AM Ahmet Altay 
>> wrote:
>> > >
>> > > I believe this is related to
>> https://issues.apache.org/jira/browse/BEAM-6840 and +Boyuan Zhang has a
>> fix in progress https://github.com/apache/beam/pull/8132
>> > >
>> > > On Tue, Mar 26, 2019 at 7:09 AM Ismaël Mejía 
>> wrote:
>> > >>
>> > >> I was trying to validate a fix on the Spark runner and realized
>> that
>> > >> Beam SNAPSHOTS have not been updated since February 24 !
>> > >>
>> > >>
>> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.12.0-SNAPSHOT/
>> > >>
>> > >> Can somebody please take a look at why this is not been updated?
>> > >>
>> > >> Thanks,
>> > >> Ismaël
>>
>


Re: SNAPSHOTS have not been updated since february

2019-04-18 Thread Boyuan Zhang
There is a test target,
https://builds.apache.org/job/beam_Release_NightlySnapshot/, in Beam which
builds and pushes a snapshot to Maven every day. The current failure is that
the Jenkins machine cannot publish artifacts to Maven owing to some
weird permission issue. I think +Yifan Zou is
working on it actively.

On Thu, Apr 18, 2019 at 9:44 AM Ismaël Mejía  wrote:

> And is there a way we can detect SNAPSHOTS not been published daily in
> the future?
>
> On Thu, Apr 18, 2019 at 6:37 PM Ismaël Mejía  wrote:
> >
> > Any progress on this?
> >
> > On Wed, Mar 27, 2019 at 5:38 AM Daniel Oliveira 
> wrote:
> > >
> > > I made a bug for this specific issue (artifacts not publishing to the
> Apache Maven repo): https://issues.apache.org/jira/browse/BEAM-6919
> > >
> > > While I was gathering info for the bug report I also noticed +Yifan
> Zou has an experimental PR testing a fix:
> https://github.com/apache/beam/pull/8148
> > >
> > > On Tue, Mar 26, 2019 at 11:42 AM Boyuan Zhang 
> wrote:
> > >>
> > >> +Daniel Oliveira
> > >>
> > >> On Tue, Mar 26, 2019 at 9:57 AM Boyuan Zhang 
> wrote:
> > >>>
> > >>> Sorry for the typo. Ideally, the snapshot publish is independent
> from postrelease_snapshot.
> > >>>
> > >>> On Tue, Mar 26, 2019 at 9:55 AM Boyuan Zhang 
> wrote:
> > 
> >  Hey,
> > 
> >  I'm trying to publish the artifacts by commenting "Run Gradle
> Publish" in my PR, but there are several errors saying "cannot write
> artifacts into dir", anyone has idea on it? Ideally, the snapshot publish
> is dependent from postrelease_snapshot. The publish task is to build and
> publish artifacts and the postrelease_snapshot is to verify whether the
> snapshot works.
> > 
> >  On Tue, Mar 26, 2019 at 8:45 AM Ahmet Altay 
> wrote:
> > >
> > > I believe this is related to
> https://issues.apache.org/jira/browse/BEAM-6840 and +Boyuan Zhang has a
> fix in progress https://github.com/apache/beam/pull/8132
> > >
> > > On Tue, Mar 26, 2019 at 7:09 AM Ismaël Mejía 
> wrote:
> > >>
> > >> I was trying to validate a fix on the Spark runner and realized
> that
> > >> Beam SNAPSHOTS have not been updated since February 24 !
> > >>
> > >>
> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.12.0-SNAPSHOT/
> > >>
> > >> Can somebody please take a look at why this is not been updated?
> > >>
> > >> Thanks,
> > >> Ismaël
>


Re: SNAPSHOTS have not been updated since february

2019-04-18 Thread Ismaël Mejía
And is there a way we can detect, in the future, when SNAPSHOTS have not been
published daily?

On Thu, Apr 18, 2019 at 6:37 PM Ismaël Mejía  wrote:
>
> Any progress on this?
>
> On Wed, Mar 27, 2019 at 5:38 AM Daniel Oliveira  
> wrote:
> >
> > I made a bug for this specific issue (artifacts not publishing to the 
> > Apache Maven repo): https://issues.apache.org/jira/browse/BEAM-6919
> >
> > While I was gathering info for the bug report I also noticed +Yifan Zou has 
> > an experimental PR testing a fix: https://github.com/apache/beam/pull/8148
> >
> > On Tue, Mar 26, 2019 at 11:42 AM Boyuan Zhang  wrote:
> >>
> >> +Daniel Oliveira
> >>
> >> On Tue, Mar 26, 2019 at 9:57 AM Boyuan Zhang  wrote:
> >>>
> >>> Sorry for the typo. Ideally, the snapshot publish is independent from 
> >>> postrelease_snapshot.
> >>>
> >>> On Tue, Mar 26, 2019 at 9:55 AM Boyuan Zhang  wrote:
> 
>  Hey,
> 
>  I'm trying to publish the artifacts by commenting "Run Gradle Publish" 
>  in my PR, but there are several errors saying "cannot write artifacts 
>  into dir", anyone has idea on it? Ideally, the snapshot publish is 
>  dependent from postrelease_snapshot. The publish task is to build and 
>  publish artifacts and the postrelease_snapshot is to verify whether the 
>  snapshot works.
> 
>  On Tue, Mar 26, 2019 at 8:45 AM Ahmet Altay  wrote:
> >
> > I believe this is related to 
> > https://issues.apache.org/jira/browse/BEAM-6840 and +Boyuan Zhang has a 
> > fix in progress https://github.com/apache/beam/pull/8132
> >
> > On Tue, Mar 26, 2019 at 7:09 AM Ismaël Mejía  wrote:
> >>
> >> I was trying to validate a fix on the Spark runner and realized that
> >> Beam SNAPSHOTS have not been updated since February 24 !
> >>
> >> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.12.0-SNAPSHOT/
> >>
> >> Can somebody please take a look at why this is not been updated?
> >>
> >> Thanks,
> >> Ismaël


Re: SNAPSHOTS have not been updated since february

2019-04-18 Thread Ismaël Mejía
Any progress on this?

On Wed, Mar 27, 2019 at 5:38 AM Daniel Oliveira  wrote:
>
> I made a bug for this specific issue (artifacts not publishing to the Apache 
> Maven repo): https://issues.apache.org/jira/browse/BEAM-6919
>
> While I was gathering info for the bug report I also noticed +Yifan Zou has 
> an experimental PR testing a fix: https://github.com/apache/beam/pull/8148
>
> On Tue, Mar 26, 2019 at 11:42 AM Boyuan Zhang  wrote:
>>
>> +Daniel Oliveira
>>
>> On Tue, Mar 26, 2019 at 9:57 AM Boyuan Zhang  wrote:
>>>
>>> Sorry for the typo. Ideally, the snapshot publish is independent from 
>>> postrelease_snapshot.
>>>
>>> On Tue, Mar 26, 2019 at 9:55 AM Boyuan Zhang  wrote:

 Hey,

 I'm trying to publish the artifacts by commenting "Run Gradle Publish" in 
 my PR, but there are several errors saying "cannot write artifacts into 
 dir", anyone has idea on it? Ideally, the snapshot publish is dependent 
 from postrelease_snapshot. The publish task is to build and publish 
 artifacts and the postrelease_snapshot is to verify whether the snapshot 
 works.

 On Tue, Mar 26, 2019 at 8:45 AM Ahmet Altay  wrote:
>
> I believe this is related to 
> https://issues.apache.org/jira/browse/BEAM-6840 and +Boyuan Zhang has a 
> fix in progress https://github.com/apache/beam/pull/8132
>
> On Tue, Mar 26, 2019 at 7:09 AM Ismaël Mejía  wrote:
>>
>> I was trying to validate a fix on the Spark runner and realized that
>> Beam SNAPSHOTS have not been updated since February 24 !
>>
>> https://repository.apache.org/content/repositories/snapshots/org/apache/beam/beam-sdks-java-core/2.12.0-SNAPSHOT/
>>
>> Can somebody please take a look at why this is not been updated?
>>
>> Thanks,
>> Ismaël


Re: CassandraIO breakage

2019-04-18 Thread Reuven Lax
Interesting. Let me try a full rebuild; maybe some dependency is not
getting rebuilt. I was getting errors like this:

/Users/relax/beam/sdks/java/io/cassandra/src/main/java/org/apache/beam/sdk/io/cassandra/CassandraServiceImpl.java:74:
error: incompatible types: ValueProvider<List<String>> cannot be converted
to List<String>

  source.spec.hosts(),

On Thu, Apr 18, 2019 at 8:36 AM Jean-Baptiste Onofré 
wrote:

> It builds fine on my machine.
>
> Let me check on Jenkins.
>
> Regards
> JB
>
> On 17/04/2019 21:48, Reuven Lax wrote:
> > Did something break with CassandraIO? It no longer seems to compile.
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: CassandraIO breakage

2019-04-18 Thread Jean-Baptiste Onofré
It builds fine on my machine.

Let me check on Jenkins.

Regards
JB

On 17/04/2019 21:48, Reuven Lax wrote:
> Did something break with CassandraIO? It no longer seems to compile.

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: CassandraIO breakage

2019-04-18 Thread Jean-Baptiste Onofré
Let me check if it works on my machine.

Regards
JB

On 17/04/2019 21:48, Reuven Lax wrote:
> Did something break with CassandraIO? It no longer seems to compile.

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Go SDK status

2019-04-18 Thread Thomas Weise
Hi Robert,

Thanks a bunch for providing this comprehensive update. This is exactly the
kind of perspective I was looking for, even if overall it means that, for
potential users, the Go SDK is at an even earlier stage than I had hoped.

For more context, my interest was primarily on the streaming side. Of the
missing features you listed, State + Timers + Triggers would probably be the
highest priority. Unfortunately I won't be able to contribute to the Go SDK
anytime soon, so this is mostly FYI in case anyone else can.

On improving the IOs, I think it would make a lot of sense to focus on the
cross-language route. There has been some work lately to make existing Beam
Java IOs available on the Flink runner (Max would be able to share more
details on that).

Thanks!
Thomas


On Wed, Apr 17, 2019 at 9:56 PM Robert Burke  wrote:

> Oh dang. Thanks for mentioning that! Here's an open copy of the versioning
> thoughts doc, though there shouldn't be any surprises from the points I
> mentioned above.
>
>
> https://docs.google.com/document/d/1ZjP30zNLWTu_WzkWbgY8F_ZXlA_OWAobAD9PuohJxPg/edit#heading=h.drpipq762xi7
>
> On Wed, 17 Apr 2019 at 21:20, Nathan Fisher 
> wrote:
>
>> Hi Robert,
>>
>> Great summary on the current state of play. FYI the referenced G doc
>> doesn't appear to people outside the org as a default.
>>
>> Great to hear the Go SDK is still getting love. I last looked at in
>> September-October of last year.
>>
>> Cheers,
>> Nathan
>>
>> On Wed, 17 Apr 2019 at 20:27, Lukasz Cwik  wrote:
>>
>>> Thanks for the indepth summary.
>>>
>>> On Mon, Apr 15, 2019 at 4:19 PM Robert Burke  wrote:
>>>
 Hi Thomas! I'm so glad you asked!

 The status of the Go SDK is complicated, so this email can't be brief.
 There are several dimensions to consider: as a Go Open Source Project,
 User Libraries and Experience, and Beam Features.

 I'm going to be updating the roadmap later this month when I have a
 spare moment.

 *tl;dr;*
 I would *love* help in improving the Go SDK, especially around
 interactions with Java/Python/Flink. Java and I do not have a good working
 relationship for operational purposes, and the last time I used Python, I
 had to re-image my machine. There's lots to do, but shouting out tasks to
 the void is rarely as productive as it is cathartic. If there's an offer to
 help, and a preference for/experience with  something to work on, I'm
 willing to find something useful to get started on for you.

 (Note: The following are simply my opinion as someone who works with
 the project weekly as a Go programmer, and should not be treated as demands
 or gospel. I just don't have anyone to talk about Go SDK issues with, and
 my previous discussions have largely seemed to fall on uninterested ears.)

 *The SDK can be considered Alpha when all of the following are true:*
 * The SDK is tested by the Beam project on a ULR and on Flink as well
 as Dataflow.
 * The IOs have received some love to ensure they can scale (either
 through SDF or reshuffles), and be portable to different environments (e.g.
 using the Go Cloud Development Kit (CDK) libraries).
* Cross-Language IO support would also be acceptable.
 * The SDK is using Go Modules for dependency management, marking it as
 version 0.Minor (where Minor should probably track the mainline Beam minor
 version for now).

 *We can move to calling it Beta when all of the following are true:*
 * All implemented Beam features are meaningfully tested on the
 portable runners (e.g. a proper "Validates Runner" suite exists in Go)
 * The SDK is properly documented on the Beam site, and in its Go Docs.
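
As a rough illustration of the "Validates Runner" point above, here is the
shape of a small Go test using the SDK's ptest and passert helpers; the
specific pipeline under test is made up for the example, and a real suite
would run many such cases against each portable runner:

package beam_test

import (
    "testing"

    "github.com/apache/beam/sdks/go/pkg/beam"
    "github.com/apache/beam/sdks/go/pkg/beam/testing/passert"
    "github.com/apache/beam/sdks/go/pkg/beam/testing/ptest"
)

// TestWordLengths builds a tiny pipeline, asserts on its output, and runs it
// on the test runner.
func TestWordLengths(t *testing.T) {
    p, s := beam.NewPipelineWithRoot()

    words := beam.Create(s, "a", "bb", "ccc")
    lengths := beam.ParDo(s, func(w string) int { return len(w) }, words)
    passert.Equals(s, lengths, 1, 2, 3)

    if err := ptest.Run(p); err != nil {
        t.Fatalf("pipeline failed: %v", err)
    }
}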

 After this, I'll be more comfortable recommending it as something folks
 can use for production.
 That said, there are happy paths that are usable today in batch
 situations.
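
As a concrete sketch of such a happy path, here is roughly what a small batch
pipeline looks like in Go (essentially the shape of the SDK's wordcount
example); the input/output paths and the use of beamx to select a runner are
assumptions about a typical local setup:

package main

import (
    "context"
    "flag"
    "fmt"
    "strings"

    "github.com/apache/beam/sdks/go/pkg/beam"
    "github.com/apache/beam/sdks/go/pkg/beam/io/textio"
    "github.com/apache/beam/sdks/go/pkg/beam/log"
    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
    "github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
)

var (
    input  = flag.String("input", "/tmp/input.txt", "File(s) to read.")
    output = flag.String("output", "/tmp/counts.txt", "Output file.")
)

func main() {
    flag.Parse()
    beam.Init()

    p := beam.NewPipeline()
    s := p.Root()

    // Read lines, split into words, count occurrences, format, and write.
    lines := textio.Read(s, *input)
    words := beam.ParDo(s, func(line string, emit func(string)) {
        for _, w := range strings.Fields(line) {
            emit(w)
        }
    }, lines)
    counts := stats.Count(s, words)
    formatted := beam.ParDo(s, func(w string, c int) string {
        return fmt.Sprintf("%s: %d", w, c)
    }, counts)
    textio.Write(s, *output, formatted)

    ctx := context.Background()
    if err := beamx.Run(ctx, p); err != nil {
        log.Exitf(ctx, "Failed to execute job: %v", err)
    }
}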

 *Intro*
 The Go SDK is a purely Beam Portable SDK. If it runs on a distributed
 system at all, it's being run portably. Currently it's regularly tested on
 Google Cloud Dataflow (though Dataflow doesn't officially support the SDK
 at this time), and on its own single-bundle Direct Runner (intended for
 unit testing purposes). In addition, it's being tested at scale within
 Google, on an internal runner, where it presently satisfies our performance
 benchmarks, and correctness tests.

 I've been working on cases to make the SDK suitable for data processing
 within Google. This unfortunately makes my contributions more towards
 general SDK usability, documentation, and performance, rather than "making
 it usable outside Google". Note this also precludes necessary work to
 resolve issues with running Go SDK pipelines on Google Cloud Dataflow. I
 believe that the SDK must become a good member of the Go ecosystem and the
 Beam ecosystem.

Re: CassandraIO breakage

2019-04-18 Thread Alexey Romanenko
How does it fail? No issue for me with a local build against master.

> On 17 Apr 2019, at 21:48, Reuven Lax  wrote:
> 
> Did something break with CassandraIO? It no longer seems to compile.