Re: Batch loads in streaming pipeline - withNumFileShards

2017-11-15 Thread Lukasz Cwik
Filed https://issues.apache.org/jira/browse/BEAM-3198 for the IllegalArgumentException. Do you mind posting a little code snippet of how you build the BQ IO connector on BEAM-3198? On Wed, Nov 15, 2017 at 12:18 PM, Arpan Jain wrote: > Hi, > > I am trying to use

Batch loads in streaming pipeline - withNumFileShards

2017-11-15 Thread Arpan Jain
Hi, I am trying to use Method.FILE_LOADS for loading data into BQ in my streaming pipeline using the RC3 release of 2.2.2. It looks like withNumFileShards also needs to be set to use this. A couple of questions regarding this: * Any guidelines on what's a good value for this? FWIW my pipeline is

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Tim Robertson
Hi Chet, I'll be a user of this, so thank you. It seems reasonable although - did you consider letting folk name the document ID field explicitly? It would avoid an unnecessary transformation and might be simpler: // instruct the writer to use a provided document ID

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Chet Aldrich
Given that this seems like a change that should probably happen, and I’d like to help contribute if possible, a few questions and my current opinion: So I’m leaning towards approach B here, which is: > b. (a bit less user friendly) PCollection with K as an id. But forces the > user to do a

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Etienne Chauchot
Yes, exactly. Actually, it arose from a discussion we had with Romain about ESIO. On 15/11/2017 at 10:08, Jean-Baptiste Onofré wrote: I think it's also related to the discussion Romain raised on the dev mailing list (gap between batch size, checkpointing & bundles). Regards JB On

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Tim Robertson
Hi Chet, +1 for interest in this from me too. If it helps, I'd have expected a) to be the implementation (e.g. something like "_id" being used if present) and handling multiple delivery being a responsibility of the developer. Thanks, Tim On Wed, Nov 15, 2017 at 10:08 AM, Jean-Baptiste

Re: Does ElasticsearchIO in the latest RC support adding document IDs?

2017-11-15 Thread Jean-Baptiste Onofré
I think it's also related to the discussion Romain raised on the dev mailing list (gap between batch size, checkpointing & bundles). Regards JB On 11/15/2017 09:53 AM, Etienne Chauchot wrote: Hi Chet, What you say is totally true, docs written using ElasticSearchIO will always have an ES

Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-15 Thread Jean-Baptiste Onofré
Any additional feedback about that? I will update the thread with the two branches later today: the one with Spark 1.x & 2.x support, the one with Spark 2.x upgrade. Thanks Regards JB On 11/13/2017 09:32 AM, Jean-Baptiste Onofré wrote: Hi Beamers, I'm forwarding this discussion & vote