Re: Checkpointing on Google Cloud Dataflow Runner

2022-08-29 Thread Reuven Lax via user
Google Cloud Dataflow does support snapshots
. Is this
what you were looking for?

On Mon, Aug 29, 2022 at 4:04 PM Kenneth Knowles  wrote:

> Hi Will, David,
>
> I think you'll find the best source of answer for this sort of question on
> the user@beam list. I've put that in the To: line with a BCC: to the
> dev@beam list so everyone knows they can find the thread there. If I have
> misunderstood, and your question has to do with building Beam itself, feel
> free to move it back.
>
> Kenn
>
> On Mon, Aug 29, 2022 at 2:24 PM Will Baker  wrote:
>
>> Hello!
>>
>> I am wondering about using checkpoints with Beam running on Google
>> Cloud Dataflow.
>>
>> The docs indicate that checkpoints are not supported by Google Cloud
>> Dataflow:
>> https://beam.apache.org/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/
>>
>> Is there a recommended approach to handling checkpointing on Google
>> Cloud Dataflow when using streaming sources like Kinesis and Kafka, so
>> that a pipeline could be resumed from where it left off if it needs to
>> be stopped or crashes for some reason?
>>
>> Thanks!
>> Will Baker
>>
>


Re: Checkpointing on Google Cloud Dataflow Runner

2022-08-29 Thread Kenneth Knowles
Hi Will, David,

I think you'll find the best source of answer for this sort of question on
the user@beam list. I've put that in the To: line with a BCC: to the
dev@beam list so everyone knows they can find the thread there. If I have
misunderstood, and your question has to do with building Beam itself, feel
free to move it back.

Kenn

On Mon, Aug 29, 2022 at 2:24 PM Will Baker  wrote:

> Hello!
>
> I am wondering about using checkpoints with Beam running on Google
> Cloud Dataflow.
>
> The docs indicate that checkpoints are not supported by Google Cloud
> Dataflow:
> https://beam.apache.org/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/
>
> Is there a recommended approach to handling checkpointing on Google
> Cloud Dataflow when using streaming sources like Kinesis and Kafka, so
> that a pipeline could be resumed from where it left off if it needs to
> be stopped or crashes for some reason?
>
> Thanks!
> Will Baker
>


[question] Good Course to learn beam

2022-08-29 Thread Leandro Nahabedian via user
Hi community!

I'm looking for a good course to learn apache beam and I saw this one

which I believe is good, since it has very good feedback from students.
What do you think?

Thanks in advance for your help

Cheers,
Leandro

-- 
[image: dialpad] 
__
*Leandro Nahabedian *
Data Engineer
O: 908.883.4369


Re: How to run expansion service using go sdk in local development environment ?

2022-08-29 Thread Yu Watanabe
Hello Danny.

Ah . I see . Thank you for your advice.

Thanks,
Yu Watanabe

On Mon, Aug 29, 2022 at 9:26 AM Danny McCormick via user
 wrote:
>
> Hey Yu, as the error you posted suggests, the Go direct runner which you're 
> using in your local development environment doesn't support external 
> transforms using an expansion service. If you're going to do a x-lang 
> transform using an expansion service you should use a different runner like 
> Dataflow, Flink, Spark, or one of the other runners listed here - 
> https://beam.apache.org/documentation/runners/capability-matrix/
>
> Thanks,
> Danny
>
> On Sun, Aug 28, 2022 at 7:50 AM Yu Watanabe  wrote:
>>
>> Hello.
>>
>> I would like to ask a question about expansion service. I'm currently
>> testing my expansion service in my local development environment.
>> I have read notes about kafka in advance,
>>
>> https://github.com/apache/beam/blob/master/sdks/go/examples/kafka/taxi.go#L93
>>
>> I have prepared sdk containers .
>>
>> [ywatanabe@laptop-archlinux Development]$ docker image ls | grep apache
>> apache/beam_java8_sdk   2.42.0.dev
>> f7e9d38b01fe   11 days ago 643MB
>> apache/beam_go_sdk  latest
>> 8a87ea45255b   11 days ago 149MB
>>
>> However, when I run the code in my local environment, I get an error.
>>
>> [ywatanabe@laptop-archlinux go]$ go run ./examples/elasticsearch/sample.go \
>>   --runner direct \
>>   --sdk_harness_container_image_override
>> ".*java.*,apache/beam_java8_sdk:2.42.0.dev"
>> Hello world.
>> 2022/08/28 20:39:01 Executing pipeline with the direct runner.
>> 2022/08/28 20:39:01 Pipeline:
>> 2022/08/28 20:39:01 Nodes: {1: []uint8/bytes GLO}
>> {2: string/string GLO}
>> {3: []uint8/bytes GLO}
>> {4: []uint8/bytes GLO}
>> Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/bytes GLO}]
>> 2: ParDo [In(Main): []uint8 <- {1: []uint8/bytes GLO}] -> [Out: T ->
>> {2: string/string GLO}]
>> 3: External [In(Main): string <- {2: string/string GLO}] -> [Out:
>> []uint8 -> {3: []uint8/bytes GLO} Out: []uint8 -> {4: []uint8/bytes
>> GLO}]
>> Pipeline failed: translation failed
>> caused by:
>> external transforms like 3: External [In(Main): string <- {2:
>> string/string GLO}] -> [Out: []uint8 -> {3: []uint8/bytes GLO} Out:
>> []uint8 -> {4: []uint8/bytes GLO}] are not supported in the Go direct
>> runner, please execute your pipel[ywatanabe@laptop-archlinux go]$
>>
>> Am I missing something ?
>>
>> My main and io code can be found below.
>>
>> https://gist.github.com/yuwtennis/dec3bf3bfc0c4fa54d9d3565c98d008e
>>
>> Thanks,
>> Yu
>>
>>
>> --
>> Yu Watanabe
>>
>> linkedin: www.linkedin.com/in/yuwatanabe1/
>> twitter:   twitter.com/yuwtennis



-- 
Yu Watanabe

linkedin: www.linkedin.com/in/yuwatanabe1/
twitter:   twitter.com/yuwtennis