RE: A windows/trigger Question with kafkaIO over Spark Runner

2018-07-30 Thread linrick
Dear Nicolas, Yes, I have set this configuration, as: Pipeline p = Pipeline.create(options); options.setMaxRecordsPerBatch(1000L); options.setBatchIntervalMillis(1000L); options.setSparkMaster("local[*]"); … PCollection<KV<String, String>> readData1 = readData.apply(Window.<KV<String, String>>into(FixedWindows.of(Duration.standardSec
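A minimal Java sketch of that windowing step (the KV<String, String> element type and the 1-second window size are assumptions inferred from the truncated snippet, not confirmed by it):

    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    // Assign each Kafka record to a 1-second fixed window.
    static PCollection<KV<String, String>> applyFixedWindows(
        PCollection<KV<String, String>> readData) {
      return readData.apply(
          Window.<KV<String, String>>into(
              FixedWindows.of(Duration.standardSeconds(1))));
    }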

Re: delayed emit (timer) in py-beam?

2018-07-30 Thread Austin Bennett
Fantastic; thanks, Charles! On Mon, Jul 30, 2018 at 3:49 PM, Charles Chen wrote: > Hey Austin, > > This API is not yet implemented in the Python SDK. I am working on this > feature: the next step from my end is to finish a reference implementation > in the local DirectRunner. As you note, t

Re: delayed emit (timer) in py-beam?

2018-07-30 Thread Charles Chen
Hey Austin, This API is not yet implemented in the Python SDK. I am working on this feature: the next step from my end is to finish a reference implementation in the local DirectRunner. As you note, the doc at https://s.apache.org/beam-python-user-state-and-timers describes the design. You can

delayed emit (timer) in py-beam?

2018-07-30 Thread Austin Bennett
What's going on with timers and python? Am looking at building a pipeline (assuming another group in my company will grant access to the Kafka topic): Kafka -> beam -> have beam wait 24 hours -> do transform(s) and emit a record. If I read things correctly that's not currently possible in python
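For comparison while the Python implementation lands, the Java SDK already exposes state and timers. A rough Java sketch of the 24-hour delayed emit described above (element types and the per-key buffering policy are assumptions):

    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.state.BagState;
    import org.apache.beam.sdk.state.StateSpec;
    import org.apache.beam.sdk.state.StateSpecs;
    import org.apache.beam.sdk.state.TimeDomain;
    import org.apache.beam.sdk.state.Timer;
    import org.apache.beam.sdk.state.TimerSpec;
    import org.apache.beam.sdk.state.TimerSpecs;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;
    import org.joda.time.Duration;

    // Buffers elements per key and emits them 24 hours later.
    class DelayedEmitFn extends DoFn<KV<String, String>, String> {

      @StateId("buffer")
      private final StateSpec<BagState<String>> bufferSpec =
          StateSpecs.bag(StringUtf8Coder.of());

      @TimerId("emit")
      private final TimerSpec emitSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

      @ProcessElement
      public void process(
          ProcessContext c,
          @StateId("buffer") BagState<String> buffer,
          @TimerId("emit") Timer emitTimer) {
        buffer.add(c.element().getValue());
        // (Re)arm the timer to fire 24 hours after the latest element.
        emitTimer.offset(Duration.standardHours(24)).setRelative();
      }

      @OnTimer("emit")
      public void onEmit(OnTimerContext c, @StateId("buffer") BagState<String> buffer) {
        for (String value : buffer.read()) {
          c.output(value); // the deferred transform/emit would happen here
        }
        buffer.clear();
      }
    }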

Re: Live coding & reviewing adventures

2018-07-30 Thread Holden Karau
So, some small schedule changes. I'll be doing some poking at the Go SDK at 2pm today - https://www.youtube.com/watch?v=9UAu1DOZJhM - and the one with Gris setting up Beam on a new machine will be moved to Friday because her laptop got delayed - https://www.youtube.com/watch?v=x8Wg7qCDA5k On Tue, Jul 24,

Re: Filtering data using external source.

2018-07-30 Thread Jose Bermeo
Hi JB. I'm not sure; I could create two PCollections, but the question is how do I make the PCollection from Redshift reflect the changes in the table? To rephrase my initial question: each element in my PCollection has a foreing_key_id, and I need to check if the row associated with the foreing_key_id in

Re: Filtering data using external source.

2018-07-30 Thread Chamikara Jayalath
One solution would be to stabilize the data read from the Redshift DB. To this end, sending your side input through a Reshuffle transform [1] should work for some runners. Robin is working on a more portable solution for supporting stable input [2]. Thanks, Cham [1] https://github.com/apache/beam/blob/mas
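A sketch of that suggestion, assuming the Redshift rows arrive as hypothetical (foreign_key_id, allowed) pairs; Reshuffle acts as a checkpoint on some runners, so the side input sees a stable snapshot of what was read:

    import java.util.Map;
    import org.apache.beam.sdk.transforms.Reshuffle;
    import org.apache.beam.sdk.transforms.View;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionView;

    // Stabilize the Redshift read, then materialize it as a map-shaped side input.
    static PCollectionView<Map<Long, Boolean>> stableFilterView(
        PCollection<KV<Long, Boolean>> filterTable) {
      return filterTable
          .apply(Reshuffle.<Long, Boolean>of())
          .apply(View.<Long, Boolean>asMap());
    }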

Regression of (BEAM-2277) - IllegalArgumentException when using Hadoop file system for WordCount example.

2018-07-30 Thread Juan Carlos Garcia
Hi Folks, I experienced the issue described in BEAM-2277, which shows it was fixed in v2.0.0. However, versions 2.4.0 and 2.6.0 (another user reported it) show the same error. So either it was not 100% fixed, or the bug has appeared again. Th
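For reference, the usual way the Hadoop filesystem gets wired into the WordCount example is via HadoopFileSystemOptions; a sketch, where the namenode address and paths are placeholders rather than values from the report:

    import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Parse the HDFS configuration from the command line, e.g.:
    //   --hdfsConfiguration='[{"fs.defaultFS": "hdfs://namenode:8020"}]'
    //   --inputFile=hdfs://namenode:8020/input/*
    //   --output=hdfs://namenode:8020/counts
    static HadoopFileSystemOptions hdfsOptions(String[] args) {
      return PipelineOptionsFactory.fromArgs(args)
          .withValidation()
          .as(HadoopFileSystemOptions.class);
    }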

Re: Filtering data using external source.

2018-07-30 Thread Jean-Baptiste Onofré
Hi Jose, so basically, you create two PCollections with the same keys and then you join/filter/flatten? Regards JB On 30/07/2018 15:09, Jose Bermeo wrote: > Hi, question guys. > > I have to filter an unbounded collection based on data from a Redshift > DB. I cannot use a side input as Redshift
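A sketch of that join/filter/flatten shape using CoGroupByKey (the key type, tags, and the boolean "allowed" column are made up for illustration; an unbounded input would need to be windowed before the join):

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.join.CoGbkResult;
    import org.apache.beam.sdk.transforms.join.CoGroupByKey;
    import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TupleTag;

    // Join events with the Redshift filter rows on the same key, then keep
    // only the events whose key is marked as allowed.
    static PCollection<String> filterByJoin(
        PCollection<KV<Long, String>> events,
        PCollection<KV<Long, Boolean>> filterRows) {
      final TupleTag<String> eventTag = new TupleTag<String>() {};
      final TupleTag<Boolean> filterTag = new TupleTag<Boolean>() {};
      return KeyedPCollectionTuple.of(eventTag, events)
          .and(filterTag, filterRows)
          .apply(CoGroupByKey.<Long>create())
          .apply(ParDo.of(
              new DoFn<KV<Long, CoGbkResult>, String>() {
                @ProcessElement
                public void process(ProcessContext c) {
                  boolean allowed = false;
                  for (Boolean b : c.element().getValue().getAll(filterTag)) {
                    allowed = allowed || b;
                  }
                  if (allowed) {
                    for (String event : c.element().getValue().getAll(eventTag)) {
                      c.output(event);
                    }
                  }
                }
              }));
    }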

Re: Beam SparkRunner and Spark KryoSerializer problem

2018-07-30 Thread Juan Carlos Garcia
Hi Jean, Thanks for taking a look. On Mon, Jul 30, 2018 at 2:49 PM, Jean-Baptiste Onofré wrote: > Hi Juan, > > it seems that it has been introduced by the metrics layer in the core runner > API. > > Let me check. > > Regards > JB > > On 30/07/2018 14:47, Juan Carlos Garcia wrote: > > Bump! > > > >

Filtering data using external source.

2018-07-30 Thread Jose Bermeo
Hi, a question guys. I have to filter an unbounded collection based on data from a Redshift DB. I cannot use a side input as the Redshift data could change. One way to do it would be to group common elements, make a query to filter each group, and finally flatten the pipe again. Do you know if this is the be
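One possible shape of that group, query, flatten idea, assuming a hypothetical existsInRedshift() lookup and a runner that supports GroupIntoBatches:

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.GroupIntoBatches;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    // Batch elements per key, check the key against Redshift once per batch,
    // and re-emit the elements whose key passes, flattening the pipe again.
    static PCollection<KV<Long, String>> filterViaQuery(
        PCollection<KV<Long, String>> events) {
      return events
          .apply(GroupIntoBatches.<Long, String>ofSize(100))
          .apply(ParDo.of(
              new DoFn<KV<Long, Iterable<String>>, KV<Long, String>>() {
                @ProcessElement
                public void process(ProcessContext c) {
                  Long key = c.element().getKey();
                  if (existsInRedshift(key)) {
                    for (String value : c.element().getValue()) {
                      c.output(KV.of(key, value));
                    }
                  }
                }
              }));
    }

    // Hypothetical lookup; in practice a JDBC query against Redshift.
    static boolean existsInRedshift(Long foreignKeyId) {
      return true; // stub
    }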

Re: Beam SparkRunner and Spark KryoSerializer problem

2018-07-30 Thread Jean-Baptiste Onofré
Hi Juan, it seems that it has been introduced by the metrics layer in the core runner API. Let me check. Regards JB On 30/07/2018 14:47, Juan Carlos Garcia wrote: > Bump! > > Do any of the core devs roam around here? > > Can someone provide feedback on BEAM-4597 >

Re: Beam SparkRunner and Spark KryoSerializer problem

2018-07-30 Thread Juan Carlos Garcia
Bump! Do any of the core devs roam around here? Can someone provide feedback on BEAM-4597? Thanks and regards, On Thu, Jul 19, 2018 at 3:41 PM, Juan Carlos Garcia wrote: > Folks, > > Is someone using the SparkRunner out there with the Sp
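Pending a fix, one common-shaped workaround is to pre-register the offending class with Kryo. A sketch, where the class name is a guess based on JB's pointer to the metrics layer, and whether the runner sees this conf depends on how the SparkContext is provided (spark-submit --conf settings work the same way):

    import org.apache.spark.SparkConf;

    // Enable Kryo and pre-register the Beam metrics class up front.
    static SparkConf kryoConf() {
      return new SparkConf()
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.kryo.classesToRegister",
              "org.apache.beam.runners.core.metrics.MetricsContainerImpl");
    }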

Re: ElasticsearchIO bulk delete

2018-07-30 Thread Tim Robertson
> we decided to postpone the feature

That makes sense. I believe the ES6 branch is in part working (I've looked at the code but not used it), which you can see here [1], and the JIRA to watch or contribute to is [2]. It would be a useful addition to test independently and report any observations or im

RE: A windows/trigger Question with kafkaIO over Spark Runner

2018-07-30 Thread Nicolas Viard
Hello, I think Spark has a default windowing strategy and pulls data from Kafka every X ms. You can override it using SparkPipelineOptions.setBatchIntervalMillis(1000). Best regards, Nicolas From: linr...@itri.org.tw Sent: Monday, 30 July 2018 10:58:26 To:
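In code, that override looks roughly like this (a sketch; args would come from main):

    import org.apache.beam.runners.spark.SparkPipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Set the Spark runner's micro-batch interval before creating the pipeline.
    static SparkPipelineOptions sparkOptions(String[] args) {
      SparkPipelineOptions options =
          PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
      options.setBatchIntervalMillis(1000L); // pull from Kafka every second
      return options;
    }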

A windows/trigger Question with kafkaIO over Spark Runner

2018-07-30 Thread linrick
Dear all, I have a question about the use of windows/triggers. The following versions of the related tools are used in my running program: Beam 2.4.0 (Direct runner and Spark runner), Spark 2.3.1 (local mode), Kafka 2.11-0.10.1.1, Scala 2.11.8, Java 1.8
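For context, a minimal KafkaIO read matching those versions might look like this (broker and topic names are placeholders):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.kafka.common.serialization.StringDeserializer;

    // withoutMetadata() drops the Kafka record metadata, leaving plain
    // KV<String, String> elements to window and process downstream.
    static PCollection<KV<String, String>> readFromKafka(Pipeline p) {
      return p.apply(
          KafkaIO.<String, String>read()
              .withBootstrapServers("localhost:9092")
              .withTopic("my_topic")
              .withKeyDeserializer(StringDeserializer.class)
              .withValueDeserializer(StringDeserializer.class)
              .withoutMetadata());
    }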

Re: ElasticsearchIO bulk delete

2018-07-30 Thread Wout Scheepers
Hey Tim, Thanks for your proposal to mentor me through my first PR. As we’re definitely planning to upgrade to ES6 when Beam supports it, we decided to postpone the feature (we have a fix that works for us, for now). When Beam supports ES6, I’ll be happy to make a contribution to get bulk delete