Re: A problem about additional outputs

2017-05-03 Thread Aviem Zur
By "cannot run normally" do you mean you get an exception? We recently had a bug on master in which streaming pipelines containing `ParDo` with multiple outputs ran into a `NullPointerException`. This was fixed here: https://issues.apache.org/jira/browse/BEAM-2029 Is this what you're facing? If so

Re: A problem about additional outputs

2017-05-03 Thread zhenglin.Tian
Hi, I'm having trouble with additional outputs on the SparkRunner. Here is my code: when I use the DirectRunner everything runs OK, but if I replace the DirectRunner with the SparkRunner, the code can't run normally. public class UnifiedDataExtraction {         private static TupleTag rawDataTag = new
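The multi-output `ParDo` being discussed routes each input element either to a main output or to an additional tagged output. Its semantics can be sketched in plain Python (this is a model of the behavior, not Beam SDK code; the tag names and the `route` function are made up for illustration):

```python
# Plain-Python sketch of Beam's multi-output ParDo semantics: each element
# is routed to the main tagged output or to an additional tagged output,
# the way TupleTag/TupleTagList direct elements in a Java DoFn.
# All names here (RAW_TAG, ERROR_TAG, route) are hypothetical.

RAW_TAG = "rawData"
ERROR_TAG = "errors"

def route(elements):
    """Split one input collection into per-tag output collections."""
    outputs = {RAW_TAG: [], ERROR_TAG: []}
    for e in elements:
        # A real DoFn would emit to a TupleTag; here we just append
        # to the list for the chosen tag.
        tag = ERROR_TAG if e is None or e == "" else RAW_TAG
        outputs[tag].append(e)
    return outputs

result = route(["a", "", "b", None])
# result[RAW_TAG] holds the valid elements, result[ERROR_TAG] the rest.
```

The same logical pipeline should behave identically on the DirectRunner and the SparkRunner; when it does not, the difference is usually a runner bug (such as the NPE referenced above) rather than the pipeline code.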

Trevor Grant has shared a document on Google Docs with you

2017-05-03 Thread trevor . d . grant
Trevor Grant has invited you to view the following document:

Re: BigQuery join in Apache beam

2017-05-03 Thread Prabeesh K.
Hi Dan, Thank you for your prompt reply. Regards, Prabeesh K. On 3 May 2017 at 19:23, Dan Halperin wrote: > Hi Prabeesh, > > The underlying Beam primitive you use for Join is CoGroupByKey – this > takes N different collections KV<K, V1>, KV<K, V2>, ... KV<K, Vn> and >

Re: BigQuery join in Apache beam

2017-05-03 Thread Dan Halperin
Hi Prabeesh, The underlying Beam primitive you use for Join is CoGroupByKey – this takes N different collections KV<K, V1>, KV<K, V2>, ... KV<K, Vn> and produces one collection KV<K, CoGbkResult>. This is a compressed representation of a Join result, in that you can
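The shape of the result CoGroupByKey produces can be sketched in plain Python (this models the output structure only, it is not Beam SDK code; the input names `emails` and `orders` are made-up examples):

```python
# Plain-Python sketch of CoGroupByKey semantics: given two keyed
# collections, emit one entry per key whose value collects the matching
# elements from each input, analogous to a per-key CoGbkResult.
from collections import defaultdict

def co_group_by_key(emails, orders):
    """Group two lists of (key, value) pairs by their common keys."""
    grouped = defaultdict(lambda: {"emails": [], "orders": []})
    for k, v in emails:
        grouped[k]["emails"].append(v)
    for k, v in orders:
        grouped[k]["orders"].append(v)
    return dict(grouped)

joined = co_group_by_key(
    [("alice", "a@x.com")],
    [("alice", "order-1"), ("bob", "order-2")],
)
```

Note that a key appearing in only one input still produces an entry, with an empty list for the other input; a full inner/outer join is then a matter of filtering or expanding these per-key groups.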

Re: BigQuery join in Apache beam

2017-05-03 Thread Prabeesh K.
Hi Dan, Sorry for the late response. I agree with you on the use cases you mentioned. Please advise me, and share any sample code for joining two data sets in Beam that share common keys. Regards, Prabeesh K. On 6 February 2017 at 10:38, Dan Halperin

BigQuery table backed by Google Sheet

2017-05-03 Thread Prabeesh K.
Hi, How can we read a BigQuery table that is backed by a Google Sheet? I am getting the following error: "error": { "errors": [ { "domain": "global", "reason": "accessDenied", "message": "Access Denied: BigQuery BigQuery: Permission denied while globbing file pattern.",

Re: Reprocessing historic data with streaming jobs

2017-05-03 Thread Lars BK
Thanks for your input, and sorry for the late reply. Lukasz, you may be right that running the reprocessing as a batch job would be better and faster. I'm still experimenting with approach 3, where I publish all messages and then start the job to let the watermark progress through the data. It seems