Re: Go SDK: How are re-starts handled?

2018-06-27 Thread Eduardo Morales
BEAM-4665 created.


On Wed, Jun 27, 2018 at 4:32 PM Ismaël Mejía  wrote:

> Eduardo, can you please create a JIRA for the Go SDK to track this issue?
> Thanks.
>
> On Mon, Jun 25, 2018 at 10:22 PM Lukasz Cwik  wrote:
>
>> Ah, sorry for the confusion. The SDK is meant to handle that for you as I
>> described. You'll want to use the fact that the 409 was returned until that
>> is implemented within the Go SDK.
>>
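
A minimal Go sketch of that interim workaround, assuming the error returned by
beamx.Run carries the underlying *googleapi.Error (exactly how the Dataflow
runner surfaces or wraps it may vary by SDK version); the pipeline
construction is elided:

package main

import (
	"context"
	"errors"
	"log"
	"net/http"

	"github.com/apache/beam/sdks/go/pkg/beam"
	"github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
	"google.golang.org/api/googleapi"
)

func main() {
	beam.Init()
	p := beam.NewPipeline()
	// ... build the pipeline here ...

	if err := beamx.Run(context.Background(), p); err != nil {
		// Assumption: a 409 Conflict means a job with this name is already
		// active, so treat it as "pipeline already running" rather than a failure.
		var gerr *googleapi.Error
		if errors.As(err, &gerr) && gerr.Code == http.StatusConflict {
			log.Printf("job already running, not resubmitting: %v", err)
			return
		}
		log.Fatalf("failed to execute job: %v", err)
	}
}
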
>> On Mon, Jun 25, 2018 at 1:13 PM eduardo.mora...@gmail.com <
>> eduardo.mora...@gmail.com> wrote:
>>
>>> Nope. It returns an error.
>>>
>>> 2018/06/25 20:10:46 Failed to execute job: googleapi: Error 409:
>>> (acbae89877e14d87): The workflow could not be created. Causes:
>>> (590297f494c27357): There is already an active job named xxx-yyy_zzz. If
>>> you want to  submit a second job, try again by setting a different name.,
>>> alreadyExists
>>>
>>> On 2018/06/25 16:18:55, Lukasz Cwik  wrote:
>>> > It should be that beamx.Run won't return an error if a job already
>>> > exists with the same job name.
>>> >
>>> > On Mon, Jun 25, 2018 at 9:15 AM eduardo.mora...@gmail.com <
>>> > eduardo.mora...@gmail.com> wrote:
>>> >
>>> > > I am sorry. I am not expressing myself correctly. Let me do it
>>> > > through code:
>>> > >
>>> > > if err := beamx.Run(ctx, pipe); err != nil {
>>> > >   // How do I know 'err' means the pipeline is already running,
>>> > >   // as opposed to some other problem that may need special attention?
>>> > > }
>>> > >
>>> > > On 2018/06/22 23:10:13, Lukasz Cwik  wrote:
>>> > > > The job name is a user-chosen value[1]. If you don't specify
>>> > > > something, a job name is generated for you automatically[2].
>>> > > >
>>> > > > 1: https://github.com/apache/beam/blob/c1927cd339c57125e29a651e614fb5105abf6d33/sdks/go/pkg/beam/options/jobopts/options.go#L38
>>> > > > 2: https://github.com/apache/beam/blob/c1927cd339c57125e29a651e614fb5105abf6d33/sdks/go/pkg/beam/options/jobopts/options.go#L71
>>> > > >
>>> > > > On Fri, Jun 22, 2018 at 3:28 PM eduardo.mora...@gmail.com <
>>> > > > eduardo.mora...@gmail.com> wrote:
>>> > > >
>>> > > > >
>>> > > > >
>>> > > > > On 2018/06/22 21:35:29, Lukasz Cwik  wrote:
>>> > > > > > There can only be one pipeline in Dataflow with the same job
>>> > > > > > name, so if you attempt to submit another job with the same job
>>> > > > > > name you'll get back an identifier for the currently executing
>>> > > > > > pipeline.
>>> > > > >
>>> > > > > But beam.Run() only returns an error. How do I get the job name
>>> > > > > back? My guess is that I have to use a different API
>>> > > > > (https://godoc.org/google.golang.org/api/dataflow/v1b3). Is that
>>> > > > > the correct way to detect job name collisions?
>>> > > > >
>>> > > > > Thanks again.
>>> > > > >
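
A sketch of that approach, assuming the generated
google.golang.org/api/dataflow/v1b3 client is used to list ACTIVE jobs and
match on the job name; pagination handling is omitted, and the project and
job names below are placeholders:

package main

import (
	"context"
	"fmt"
	"log"

	dataflow "google.golang.org/api/dataflow/v1b3"
)

// findActiveJob returns the ID of an active Dataflow job with the given
// name, or "" if none is found.
func findActiveJob(ctx context.Context, project, name string) (string, error) {
	svc, err := dataflow.NewService(ctx)
	if err != nil {
		return "", err
	}
	// Only list jobs that are currently active.
	resp, err := svc.Projects.Jobs.List(project).Filter("ACTIVE").Context(ctx).Do()
	if err != nil {
		return "", err
	}
	for _, job := range resp.Jobs {
		if job.Name == name {
			return job.Id, nil
		}
	}
	return "", nil
}

func main() {
	// Placeholder project and job name.
	id, err := findActiveJob(context.Background(), "my-project", "my-job-name")
	if err != nil {
		log.Fatal(err)
	}
	if id != "" {
		fmt.Println("a job with this name is already running, id:", id)
	}
}
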
>>> > > > > > On Fri, Jun 22, 2018 at 2:27 PM eduardo.mora...@gmail.com <
>>> > > > > > eduardo.mora...@gmail.com> wrote:
>>> > > > > >
>>> > > > > > > If I have a k8s process launching Dataflow pipelines, what
>>> > > > > > > happens when the process is restarted? Can Apache Beam detect
>>> > > > > > > a running pipeline and join it accordingly, or will the
>>> > > > > > > pipeline be duplicated?
>>> > > > > > >
>>> > > > > > > Thanks in advance.
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>


Re: Go SDK: How are re-starts handled?

2018-06-27 Thread Ismaël Mejía
Eduardo, can you please create a JIRA for the Go SDK to track this issue?
Thanks.

On Mon, Jun 25, 2018 at 10:22 PM Lukasz Cwik  wrote:

> Ah, sorry for the confusion. The SDK is meant to handle that for you as I
> described. You'll want to use the fact that the 409 was returned until that
> is implemented within the Go SDK.


Re: Using a user-developed source in streaming Python

2018-06-27 Thread Ahmet Altay
Hi Sébastien,

Currently there is no work in progress on write transforms for the sinks you
listed. You could develop your own version if interested; please see the
WriteToBigquery transform [1] for reference.

Ahmet

[1]
https://github.com/apache/beam/blob/375bd3a6a53ba3ba7c965278dcb322875e1b4dca/sdks/python/apache_beam/io/gcp/bigquery.py#L1287

On Wed, Jun 27, 2018 at 2:48 AM, Sebastien Morand <
sebastien.mor...@veolia.com> wrote:

> Hi,
>
> Thanks for your answer.
>
> OK, looking forward to it and ready to test an alpha, because we actually
> have some use cases that send data to CloudSQL, Spanner, BigTable or
> Firestore. As far as I can tell from the documentation, there is no native
> support for them, so we have already implemented custom source support.
>
> So, is there any work in progress on native support in the Python SDK for
> any of these four sinks?
>
> Regards,
>
> Sébastien MORAND
> Team Lead Solution Architect
> Technology & Operations / Digital Factory
> Veolia - Group Information Systems & Technology (IS&T)
> Cell.: +33 6 12 03 41 15 / Direct: +33 1 85 57 71 08
> Bureau 0144C (Ouest)
> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
> www.veolia.com
>
>
> On Thu, 21 Jun 2018 at 16:53, Lukasz Cwik  wrote:
>
>> +d...@beam.apache.org
>>
>> Python streaming custom source support will be available via
>> SplittableDoFn. It is actively being worked on by a few contributors, but
>> to my knowledge there is no roadmap yet for supporting it on Dataflow.
>>
>>
>> On Thu, Jun 21, 2018 at 1:19 AM Sebastien Morand <
>> sebastien.mor...@veolia.com> wrote:
>>
>>> Hi,
>>>
>>> We need to set up a streaming Dataflow pipeline in Python using a
>>> user-developed source (as opposed to a native source) for Cloud SQL and
>>> Firestore integration.
>>>
>>> This is currently not supported, as far as I understand from the
>>> documentation; can you confirm? Is there any roadmap on the topic?
>>>
>>> Thanks in advance,
>>> Regards,
>>>
>>> Sébastien MORAND

Re: Using a user-developed source in streaming Python

2018-06-27 Thread Sebastien Morand
Hi,

Thanks for your answer.

OK, looking forward to it and ready to test an alpha, because we actually
have some use cases that send data to CloudSQL, Spanner, BigTable or
Firestore. As far as I can tell from the documentation, there is no native
support for them, so we have already implemented custom source support.

So, is there any work in progress on native support in the Python SDK for
any of these four sinks?

Regards,

Sébastien MORAND
Team Lead Solution Architect
Technology & Operations / Digital Factory
Veolia - Group Information Systems & Technology (IS&T)
Cell.: +33 6 12 03 41 15 / Direct: +33 1 85 57 71 08
Bureau 0144C (Ouest)
30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
www.veolia.com

