Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-15 Thread Rui Wang
I didn't find code in `AvroUtils.toBeamRowStrict` that converts long to Joda time. `AvroUtils.toBeamRowStrict` retrieves objects from the GenericRecord and tries to cast each object based on its type (for "timestamp-millis", it casts the object to long). See [1]. So in order to use
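The failure mode described here can be reproduced in plain Java: casting an `Object` that holds anything other than a `java.lang.Long` to `Long` throws `ClassCastException`. The sketch below is a hypothetical stand-in that only models that cast. It is not the `AvroUtils` source, and it uses `java.time.Instant` in place of a Joda time object so that it needs no dependencies.

```java
/**
 * Hypothetical model of a converter that reads a field value as Object and
 * casts it to Long for the "timestamp-millis" logical type. Not Beam code.
 */
final class TimestampMillisCast {
    private TimestampMillisCast() {}

    /** Casts the raw field value to Long and unboxes it. */
    static long readTimestampMillis(Object rawValue) {
        // Throws ClassCastException if rawValue is not a java.lang.Long
        // (for example an Instant, or even an Integer).
        return (Long) rawValue;
    }

    /** Returns true if the cast succeeds, false if it throws ClassCastException. */
    static boolean castSucceeds(Object rawValue) {
        try {
            readTimestampMillis(rawValue);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }
}
```

A boxed `Long` passes, while a time object in the same field fails with the class cast exception reported in this thread.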

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-15 Thread Vishwas Bm
Hi Rui, I agree that converting it to long avoids the error. But KafkaIO gives me a GenericRecord with an attribute of type JodaTime. I convert it to long, and then AvroUtils.toBeamRowStrict converts it back to JodaTime. I used the avro-tools 1.8.2 jar for the schema below

Re: Hi, some sample about Extracting data from Xlsx ?

2019-04-15 Thread Henrique Molina
Hi Pablo, thanks for your attention. I'm so sorry, I wrote "Cs extension" by mistake; I meant the .csv extension! The example is like this one: load-csv-file-from-google-cloud-storage

Re: Removing :beam-website:testWebsite from gradle build target

2019-04-15 Thread Kyle Weaver
I agree with Andrew that the external links checks are ultra-flaky and seldom strictly needed, so I filed a PR to make checking external links optional and disabled by default: https://github.com/apache/beam/pull/8318. Let me know what you all think. Kyle Weaver | Software Engineer

Comparison of Beam on X vs X

2019-04-15 Thread Mikhail Gryzykhin
Hi everyone, I've recently become curious about the benefits/drawbacks of Beam on X vs. X, where X is a relevant runner (Spark, Hadoop, etc.). I wonder if anyone has already done similar research and might have some documents/tables/references available? Sample topics of curiosity: * performance of

Re: Go SDK status

2019-04-15 Thread Robert Burke
Hi Thomas! I'm so glad you asked! The status of the Go SDK is complicated, so this email can't be brief. There are several dimensions to consider: as a Go open source project, user libraries and experience, and Beam features. I'm going to be updating the roadmap later this month when I have

[VOTE] Release 2.12.0, release candidate #4

2019-04-15 Thread Andrew Pilloud
Hi everyone, Please review and vote on the release candidate #4 for the version 2.12.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1],

Re: [EXT] Re: [DOC] Portable Spark Runner

2019-04-15 Thread Ankur Goenka
Thanks for sharing. This looks great! On Mon, Apr 15, 2019 at 2:54 PM Kenneth Knowles wrote: > Great. Thanks for sharing! > > On Mon, Apr 15, 2019 at 2:38 PM Lei Xu wrote: > >> This is super nice! Really looking forward to using this. >> >> On Mon, Apr 15, 2019 at 2:34 PM Thomas Weise wrote: >>

Re: [EXT] Re: [DOC] Portable Spark Runner

2019-04-15 Thread Kenneth Knowles
Great. Thanks for sharing! On Mon, Apr 15, 2019 at 2:38 PM Lei Xu wrote: > This is super nice! Really looking forward to using this. > > On Mon, Apr 15, 2019 at 2:34 PM Thomas Weise wrote: > >> Great to see the portable Spark runner taking shape. Thanks for the >> update! >> >> >> On Mon, Apr 15,

Re: [EXT] Re: [DOC] Portable Spark Runner

2019-04-15 Thread Lei Xu
This is super nice! Really looking forward to using this. On Mon, Apr 15, 2019 at 2:34 PM Thomas Weise wrote: > Great to see the portable Spark runner taking shape. Thanks for the update! > > > On Mon, Apr 15, 2019 at 10:53 AM Pablo Estrada wrote: > >> This is very cool Kyle. Thanks for moving it

Re: [DOC] Portable Spark Runner

2019-04-15 Thread Thomas Weise
Great to see the portable Spark runner taking shape. Thanks for the update! On Mon, Apr 15, 2019 at 10:53 AM Pablo Estrada wrote: > This is very cool Kyle. Thanks for moving it forward! > Best > -P. > > On Fri, Apr 12, 2019 at 1:21 PM Lukasz Cwik wrote: > >> Thanks for the doc. >> >> On Fri,

Re: Hi, some sample about Extracting data from Xlsx ?

2019-04-15 Thread Pablo Estrada
Hello Henrique, I am not aware of existing Beam transforms specifically for reading XLSX data. Can you share what you mean by "examples related with Cs extension"? I am aware of some Python libraries for this sort of thing [1]. You could use the FileIO transforms in the Python SDK to
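Any spreadsheet library used for this has to deal with the fact that an XLSX workbook is a ZIP package of XML parts, with worksheets conventionally stored as `xl/worksheets/sheetN.xml`. As a dependency-free illustration of that first step, the hypothetical Java helper below lists the worksheet parts in a workbook's raw bytes; in a pipeline, something like it could run inside a ParDo over each file's contents. `XlsxSheetLister` and `listWorksheetParts` are illustrative names, not Beam APIs, and the sketch does not parse cell values; a dedicated library (such as the Python libraries Pablo points to, or Apache POI on the JVM) would do that.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: lists worksheet XML parts inside an XLSX (ZIP) file. */
final class XlsxSheetLister {
    private XlsxSheetLister() {}

    static List<String> listWorksheetParts(byte[] xlsxBytes) throws IOException {
        List<String> parts = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(xlsxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                // Keep worksheet parts; relationship files end in ".rels" and are skipped.
                if (name.startsWith("xl/worksheets/") && name.endsWith(".xml")) {
                    parts.add(name);
                }
            }
        }
        return parts;
    }
}
```

Working on bytes rather than a file path keeps the helper independent of where the file was read from (local disk, Google Cloud Storage, and so on).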

Hi, some sample about Extracting data from Xlsx ?

2019-04-15 Thread Henrique Molina
Hello, I would like to follow Apache Beam best practices to read XLSX files. However, I found only examples related to the "Cs extension". Is there a sample that uses a ParDo to collect all columns and sheets from an Excel XLSX file? Afterwards I will load the data into Google BigQuery. Thanks & Regards

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-15 Thread Rui Wang
From reading the code, it seems that, as the name of the logical type "timestamp-millis" suggests, it expects values of this logical type to be millis as a Long. So if you convert the Joda time to millis before calling "AvroUtils.toBeamRowStrict(genericRecord, this.beamSchema)", the exception should go away. -Rui
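A minimal sketch of this suggestion: normalize "timestamp-millis" values to epoch millis before the strict conversion. To stay dependency-free, it assumes the record is modeled as a `Map<String, Object>` plus a separate map from field name to logical-type name, and it uses `java.time.Instant` as a stand-in for Joda time. Against a real Avro GenericRecord, the same idea would read the field value and, for a Joda `ReadableInstant`, call `getMillis()`. `TimestampMillisNormalizer` and `normalize` are hypothetical names, not Beam or Avro APIs.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: converts "timestamp-millis" fields to epoch millis (Long)
 * so that a strict converter casting those fields to Long does not fail.
 */
final class TimestampMillisNormalizer {
    private TimestampMillisNormalizer() {}

    static Map<String, Object> normalize(
            Map<String, Object> record, Map<String, String> logicalTypes) {
        // Copy the record so the caller's map is left unchanged.
        Map<String, Object> out = new LinkedHashMap<>(record);
        for (Map.Entry<String, Object> field : out.entrySet()) {
            if (!"timestamp-millis".equals(logicalTypes.get(field.getKey()))) {
                continue; // fields with other (or no) logical types are untouched
            }
            Object value = field.getValue();
            if (value instanceof Instant) {
                // With Joda time this would be ((ReadableInstant) value).getMillis().
                field.setValue(((Instant) value).toEpochMilli());
            }
        }
        return out;
    }
}
```

The helper returns a copy rather than mutating its input, so the original time value remains available to any later step that still needs it.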

Re: [DOC] Portable Spark Runner

2019-04-15 Thread Pablo Estrada
This is very cool Kyle. Thanks for moving it forward! Best -P. On Fri, Apr 12, 2019 at 1:21 PM Lukasz Cwik wrote: > Thanks for the doc. > > On Fri, Apr 12, 2019 at 11:34 AM Kyle Weaver wrote: > >> Hi everyone, >> >> As some of you know, I've been piggybacking on the existing Spark and >> Flink

Re: DynamoDB Sink Contribution - Contributor Right Request

2019-04-15 Thread Lukasz Cwik
Welcome, I have added you as a contributor to the project and assigned BEAM-7043 to you. On Mon, Apr 15, 2019 at 10:42 AM cm...@godaddy.com wrote: > Hello everyone, > > > > I am a software engineer at Godaddy. Our team is working with and > supporting Beam. I just opened a Jira ticket to build

Re: Go SDK status

2019-04-15 Thread Robert Burke
Give me another hour. It's not a brief email to write. On Mon, 15 Apr 2019 at 10:43, Pablo Estrada wrote: > +Robert Burke ; ) thoughts? > > - AFAIK, we have wordcount running on Flink > > On Sat, Apr 13, 2019 at 11:31 AM Thomas Weise wrote: > >> How "experimental" is the Go SDK? What are

Re: Go SDK status

2019-04-15 Thread Pablo Estrada
+Robert Burke ; ) thoughts? - AFAIK, we have wordcount running on Flink On Sat, Apr 13, 2019 at 11:31 AM Thomas Weise wrote: > How "experimental" is the Go SDK? What are the major work items to reach > MVP? How close are we to being able to run, let's say, wordcount on the portable > Flink

DynamoDB Sink Contribution - Contributor Right Request

2019-04-15 Thread cm...@godaddy.com
Hello everyone, I am a software engineer at Godaddy. Our team is working with and supporting Beam. I just opened a Jira ticket to build a new component, a DynamoDB sink (DynamoSink). Here is the ticket: https://issues.apache.org/jira/browse/BEAM-7043 I would love to become a contributor in this repo, or

Re: AvroUtils converting generic record to Beam Row causes class cast exception

2019-04-15 Thread Lukasz Cwik
+dev On Sun, Apr 14, 2019 at 10:29 PM Vishwas Bm wrote: > Hi, > > Below is my pipeline: > > KafkaSource (KafkaIO.read) --> ParDo --> BeamSql --> KafkaSink (KafkaIO.write) > > > The avro schema of the topic has a field of logical type > timestamp-millis.

Re: [PROPOSAL] Custom JVM initialization for Beam workers

2019-04-15 Thread Ahmet Altay
On Mon, Apr 15, 2019 at 9:35 AM Udi Meiri wrote: > Is this like the way Python SDK allows for a custom setup.py? > example: > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py > custom setup.py is slightly different. It will execute a custom

Re: [PROPOSAL] Custom JVM initialization for Beam workers

2019-04-15 Thread Udi Meiri
Is this like the way Python SDK allows for a custom setup.py? example: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py On Fri, Apr 12, 2019 at 10:51 AM Lukasz Cwik wrote: > +1 on the use cases that Ahmet pointed out and the solution that

Re: Contributor Request

2019-04-15 Thread Maximilian Michels
Hi Thinh, Sounds great. Would be interesting to hear more about King's use case. I've added you as a JIRA contributor. Cheers, Max On 15.04.19 16:06, Thinh Ha wrote: Hi there, My name is Thinh. I am a strategic cloud engineer in the Google Cloud professional services team. I specialise in

Contributor Request

2019-04-15 Thread Thinh Ha
Hi there, My name is Thinh. I am a strategic cloud engineer in the Google Cloud professional services team. I specialise in data and am currently working with a customer who is a heavy user of Beam/Dataflow (King). I wanted to have a go, as a contributor, at implementing a ticket that they requested.

Beam Dependency Check Report (2019-04-15)

2019-04-15 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK:
Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
future | 0.16.0 | 0.17.1 | 2016-10-27