Re: kafka client interoperability

2019-05-02 Thread Juan Carlos Garcia
Downgrade only the KafkaIO module to the version that works for you (also excluding any of its transitive dependencies); that works for us. JC. Lukasz Cwik wrote on Thu., May 2, 2019, 20:05: > +dev > > On Thu, May 2, 2019 at 10:34 AM Moorhead,Richard < > richard.moorhe...@cerner.com> wrote: > >>

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Jan Lukavský
Hi Max, comments inline. On 5/2/19 3:29 PM, Maximilian Michels wrote: Couple of comments: * Flink transforms It wouldn't be hard to add a way to run arbitrary Flink operators through the Beam API. Like you said, once you go down that road, you lose the ability to run the pipeline on a

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Pablo Estrada
An example that I can think of as a feature that Beam could provide to other runners is SQL. Beam SQL expands into Beam transforms, and it can run on other runners. Flink and Spark do have SQL support because they've invested in it, but think of smaller runners e.g. Nemo. Of course, not all of
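
For concreteness, a minimal sketch of the point being made: a Beam SQL query expands into ordinary Beam transforms and can therefore run on any runner that supports the core model. The schema, field names, and query below are hypothetical, for illustration only.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.extensions.sql.SqlTransform;
    import org.apache.beam.sdk.schemas.Schema;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.Row;

    Pipeline pipeline = Pipeline.create();

    // Hypothetical schema: one row per purchase.
    Schema schema = Schema.builder()
        .addStringField("user_id")
        .addDoubleField("amount")
        .build();

    PCollection<Row> purchases = pipeline
        .apply(Create.of(
            Row.withSchema(schema).addValues("alice", 3.0).build(),
            Row.withSchema(schema).addValues("alice", 2.0).build()))
        .setRowSchema(schema);

    // The query expands into regular Beam transforms (GroupByKey, Combine, ...),
    // so it runs wherever those primitives run.
    PCollection<Row> totals = purchases.apply(
        SqlTransform.query(
            "SELECT user_id, SUM(amount) AS total FROM PCOLLECTION GROUP BY user_id"));

Because SqlTransform expands to the core primitives, a smaller runner gets SQL support essentially for free once it implements the Beam model.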

Re: Beam's HCatalogIO for Hive Parquet Data

2019-05-02 Thread utkarsh . khare
Thanks Ismael. On Thu, May 2, 2019 at 7:29 PM Ismaël Mejía wrote: > Hello, > > Support for Parquet in HCatalog (Hive) started on version 3.0.0 > HIVE-8838 Support Parquet through HCatalog > https://issues.apache.org/jira/browse/HIVE-8838?attachmentSortBy=dateTime > > The current version used on

Re: kafka client interoperability

2019-05-02 Thread Lukasz Cwik
+dev On Thu, May 2, 2019 at 10:34 AM Moorhead,Richard < richard.moorhe...@cerner.com> wrote: > In Beam 2.9.0, this check was made: > > >

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread kant kodali
If people don't want to use it because crucial libraries are written in only some language but not available in others, that makes some sense; otherwise I would think it is biased (which is what happens most of the time). A lot of the language arguments are biased anyway, since most of them just

kafka client interoperability

2019-05-02 Thread Moorhead,Richard
In Beam 2.9.0, this check was made: https://github.com/apache/beam/blob/2ba00576e3a708bb961a3c64a2241d9ab32ab5b3/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaRecordCoder.java#L132 However, this logic was removed in 2.10+ in the newer ProducerRecordCoder class:
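
For illustration only (this is a sketch, not the actual Beam code): the kind of guard being discussed can be written as a reflective probe, since the Kafka Headers API only exists in kafka-clients 0.11.0.0 and later.

    import org.apache.kafka.clients.producer.ProducerRecord;

    class HeaderSupport {
      // kafka-clients added ProducerRecord#headers() in 0.11.0.0; on older
      // clients the method is absent, so a coder that touches headers
      // unconditionally fails at runtime.
      static final boolean HAS_HEADERS = detect();

      private static boolean detect() {
        try {
          ProducerRecord.class.getMethod("headers");
          return true;
        } catch (NoSuchMethodException e) {
          return false;
        }
      }
    }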

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Lukasz Cwik
On Thu, May 2, 2019 at 6:29 AM Maximilian Michels wrote: > Couple of comments: > > * Flink transforms > > It wouldn't be hard to add a way to run arbitrary Flink operators > through the Beam API. Like you said, once you go down that road, you > lose the ability to run the pipeline on a

Re: cancel job

2019-05-02 Thread Lukasz Cwik
+user@beam.apache.org On Thu, May 2, 2019 at 9:51 AM Lukasz Cwik wrote: > ... build pipeline ... > pipeline_result = p.run() > if job_taking_too_long: > pipeline_result.cancel() > > Python: >
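
For reference, a rough Java-SDK equivalent of the quoted Python pattern; jobTakingTooLong and options are placeholders for your own timeout logic and pipeline options.

    import java.io.IOException;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.PipelineResult;

    Pipeline p = Pipeline.create(options);  // options: your PipelineOptions
    // ... build pipeline ...
    PipelineResult result = p.run();
    if (jobTakingTooLong) {  // placeholder for your own timeout check
      try {
        result.cancel();  // asks the runner to cancel the running job
      } catch (IOException e) {
        // cancel() is best-effort; the runner may be unreachable
      }
    }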

Re: Transform a PCollection<KV<String, String>> into PCollection<String> (Java)

2019-05-02 Thread Andres Angel
Now it runs well, guys. What I have done is read the stream payload with KafkaIO using the withoutMetadata parameter and then applying Values.<String>create(), which returns a PCollection<String>, so I can process my payload as simple records. Thanks so much. On Wed, May 1, 2019 at 1:38 PM
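
A minimal sketch of that approach (the broker address, topic name, and String deserializers are assumptions for illustration):

    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.transforms.Values;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.kafka.common.serialization.StringDeserializer;

    PCollection<String> payloads = p
        .apply(KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092")           // assumption
            .withTopic("events")                              // assumption
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // drops Kafka metadata (partition, offset, ...):
            // yields PCollection<KV<String, String>>
            .withoutMetadata())
        // keeps only the record values: PCollection<String>
        .apply(Values.<String>create());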

Re: Beam's HCatalogIO for Hive Parquet Data

2019-05-02 Thread Ismaël Mejía
Hello, Support for Parquet in HCatalog (Hive) started in version 3.0.0: HIVE-8838 Support Parquet through HCatalog https://issues.apache.org/jira/browse/HIVE-8838?attachmentSortBy=dateTime The current version used in Beam is 2.1.0. I filed a new JIRA to tackle it: BEAM-7209 - Update HCatalogIO to

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Maximilian Michels
Couple of comments: * Flink transforms It wouldn't be hard to add a way to run arbitrary Flink operators through the Beam API. Like you said, once you go down that road, you lose the ability to run the pipeline on a different Runner. And that's precisely one of the selling points of Beam.

Beam's HCatalogIO for Hive Parquet Data

2019-05-02 Thread utkarsh . khare
Hi, I am trying to read a table in CDH's Hive which is in Parquet format. I am able to see the data using beeline. However, using HCatalogIO, all I get during read are NULL values for all rows. I see this problem only with Parquet format; other file formats work as expected. Has someone else
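
For context, a minimal HCatalogIO read sketch; the metastore URI, database, and table name below are placeholders:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.beam.sdk.io.hcatalog.HCatalogIO;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.hive.hcatalog.data.HCatRecord;

    Map<String, String> configProperties = new HashMap<>();
    // placeholder metastore address
    configProperties.put("hive.metastore.uris", "thrift://metastore-host:9083");

    PCollection<HCatRecord> records = p.apply(
        HCatalogIO.read()
            .withConfigProperties(configProperties)
            .withDatabase("default")             // placeholder
            .withTable("my_parquet_table"));     // placeholder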

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Jan Lukavský
Hi Robert, yes, that is correct. What I'm suggesting is a discussion whether Beam should or should not support this type of pipeline customization without hard forking code. But maybe this discussion should proceed in dev@. Jan On 5/2/19 1:44 PM, Robert Bradshaw wrote: Correct, there's no

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Robert Bradshaw
Correct, there's no out-of-the-box way to do this. As mentioned, this would also result in non-portable pipelines. However, even the portability framework is set up such that runners can recognize particular transforms and provide their own implementations thereof (which is how translations are

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Robert Bradshaw
Sorry if I wasn't clear; I mean it in the sense that A is neither the intersection nor the union of B and C in the Venn diagram below. As Max says, what really matters is the set of features you care about. [Venn diagram: https://www.benfrederickson.com/images/venn.png] On Thu, May 2, 2019 at

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Jan Lukavský
Just to clarify - the code I posted is just a proposal, it is not actually possible currently. On 5/2/19 11:05 AM, Jan Lukavský wrote: Hi, I'd say that what Pankaj meant could be rephrased as "What if I want to manually tune or tweak my Pipeline for specific runner? Do I have any options

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread kant kodali
I don't understand the following statement: "Beam's feature set is neither the intersection nor union of the feature sets of those runners it has available as execution engines." If Beam is neither the intersection nor the union, how can Beam abstract anything? Generally speaking, if there is nothing in

Re: Will Beam add any overhead or lack certain API/functions available in Spark/Flink?

2019-05-02 Thread Robert Bradshaw
On Thu, May 2, 2019 at 12:13 AM Pankaj Chand wrote: > > I thought by choosing Beam, I would get the benefits of both (Spark and > Flink), one at a time. Now, I'm understanding that I might not get the full > potential from either of the two. You get the benefit of being able to choose, without