Re: Writing bytes to BigQuery with beam

2019-05-16 Thread Valentyn Tymofieiev
Also, I filed https://issues.apache.org/jira/browse/BEAM-7346 to add more tests to Go SDK and verify the consistency of BQ IO behavior w.r.t. handling BYTES. On Thu, May 16, 2019 at 4:42 PM Valentyn Tymofieiev wrote: > > On Thu, May 16, 2019 at 1:12 PM Chamikara Jayalath > wrote: > >> >> >> On

Re: Writing bytes to BigQuery with beam

2019-05-16 Thread Valentyn Tymofieiev
On Thu, May 16, 2019 at 1:12 PM Chamikara Jayalath wrote: > > > On Wed, May 15, 2019 at 12:26 PM Valentyn Tymofieiev > wrote: > >> I took a closer look at BigQuery IO implementation in Beam SDK and >> Dataflow runner while reviewing a few PRs to address BEAM-6769, and I think >> we have to

Re: Writing bytes to BigQuery with beam

2019-05-16 Thread Chamikara Jayalath
On Wed, May 15, 2019 at 12:26 PM Valentyn Tymofieiev wrote: > I took a closer look at BigQuery IO implementation in Beam SDK and > Dataflow runner while reviewing a few PRs to address BEAM-6769, and I think > we have to revise the course of action here. > > It turns out, that when we first added

Re: Writing bytes to BigQuery with beam

2019-05-15 Thread Robert Burke
For the Go SDK: BigQueryIO exists, but other than maybe one PR that added batching of writes (to avoid the size limit when communicating with BigQuery), the reads are probably going to be re-written. I don't believe there's any

Re: Writing bytes to BigQuery with beam

2019-05-15 Thread Valentyn Tymofieiev
By the way, does anyone know what is the status of BigQuery connector in Beam Go and Beam SQL? Perhaps some folks working on these SDKs can chime in here. I am curious whether these SDKs also make / will make it a responsibility of the user to base64-encode bytes. As I mentioned above, it is

Re: Writing bytes to BigQuery with beam

2019-05-15 Thread Valentyn Tymofieiev
I took a closer look at BigQuery IO implementation in Beam SDK and Dataflow runner while reviewing a few PRs to address BEAM-6769, and I think we have to revise the course of action here. It turns out that when we first added support for BYTES in Java BigQuery IO, we designed the API with an

Re: Writing bytes to BigQuery with beam

2019-03-26 Thread Pablo Estrada
Sure, we can make users explicitly ask for schema autodetection, instead of it being the default when no schema is provided. I think that's reasonable. On Mon, Mar 25, 2019, 7:19 PM Valentyn Tymofieiev wrote: > Thanks everyone for input on this thread. I think there is a confusion > between

Re: Writing bytes to BigQuery with beam

2019-03-25 Thread Valentyn Tymofieiev
Thanks everyone for input on this thread. I think there is a confusion between not specifying the schema and asking BigQuery to do schema autodetection. These are not the same thing; however, in recent changes to BQ IO that happened after the 2.11 release, we are forcing schema autodetection when

Re: Writing bytes to BigQuery with beam

2019-03-25 Thread Chamikara Jayalath
On Mon, Mar 25, 2019 at 2:16 PM Pablo Estrada wrote: > +Chamikara Jayalath with the new BigQuery sink, > schema autodetection is supported (it's a very simple thing to have). Do > you think we should not have it? > Best > -P. > Ah good to know. But IMO users should be able to write to existing

Re: Writing bytes to BigQuery with beam

2019-03-25 Thread Pablo Estrada
+Chamikara Jayalath with the new BigQuery sink, schema autodetection is supported (it's a very simple thing to have). Do you think we should not have it? Best -P. On Mon, Mar 25, 2019 at 11:01 AM Chamikara Jayalath wrote: > > > On Mon, Mar 25, 2019 at 2:03 AM Juta Staes wrote: > >> >> On Mon,

Re: Writing bytes to BigQuery with beam

2019-03-25 Thread Chamikara Jayalath
On Mon, Mar 25, 2019 at 2:03 AM Juta Staes wrote: > > On Mon, 25 Mar 2019 at 06:15, Valentyn Tymofieiev > wrote: > >> We received feedback on https://issuetracker.google.com/issues/129006689 - >> BQ developers say that schema identification is done and they discourage to >> use schema

Re: Writing bytes to BigQuery with beam

2019-03-25 Thread Juta Staes
On Mon, 25 Mar 2019 at 06:15, Valentyn Tymofieiev wrote: > We received feedback on https://issuetracker.google.com/issues/129006689 - > BQ developers say that schema identification is done and they discourage > using schema autodetection in tables using BYTES. In light of this, I think > it may be

Re: Writing bytes to BigQuery with beam

2019-03-24 Thread Valentyn Tymofieiev
We received feedback on https://issuetracker.google.com/issues/129006689 - BQ developers say that schema identification is done and they discourage using schema autodetection for tables containing BYTES. In light of this, I think it may be fair to recommend that Beam users specify BQ schemas as well when
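The recommendation above can be sketched as follows. This is a minimal illustration of an explicit table schema with a BYTES column, written as the plain data structure used by the BigQuery REST API; the field names are made up for the example, and the comment about how it would be passed to the Python SDK is an assumption about usage, not a definitive recipe.

```python
# Explicit schema (BigQuery REST API JSON form) declaring a BYTES column,
# so no autodetection is needed. Field names here are illustrative only.
table_schema = {
    "fields": [
        {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
        {"name": "payload", "type": "BYTES", "mode": "NULLABLE"},
    ]
}

# In the Beam Python SDK, a schema like this (or its compact string form,
# e.g. "id:INTEGER,payload:BYTES") could be supplied via the schema
# parameter of beam.io.WriteToBigQuery; it is shown here only as data.
```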

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Chamikara Jayalath
On Wed, Mar 20, 2019 at 7:37 PM Valentyn Tymofieiev wrote: > Pablo, according to Juta's analysis (1.c in the document) and also > https://issuetracker.google.com/issues/129006689, I think BQ confuses > BYTES and STRING when schema is not specified... This seems to me like a BQ > bug, so for Beam

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Valentyn Tymofieiev
Pablo, according to Juta's analysis (1.c in the document) and also https://issuetracker.google.com/issues/129006689, I think BQ confuses BYTES and STRING when schema is not specified... This seems to me like a BQ bug, so for Beam this means that we either have to wait until BQ fixes it, or work

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Chamikara Jayalath
On Wed, Mar 20, 2019 at 6:30 PM Pablo Estrada wrote: > That sounds reasonable to me, Valentyn. > > Regarding (3), when the table already exists, it's not necessary to get > the schema. BQ is smart enough to load everything in appropriately. (as > long as bytes fields are base64 encoded) > > The

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Pablo Estrada
That sounds reasonable to me, Valentyn. Regarding (3), when the table already exists, it's not necessary to get the schema. BQ is smart enough to load everything in appropriately (as long as bytes fields are base64-encoded). The problem is when the table does not exist and the user does not
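The "as long as bytes fields are base64-encoded" requirement mentioned above could look like the following sketch. The helper function, its name, and the row shape are all hypothetical, invented for illustration; nothing here is part of the Beam API.

```python
import base64


def encode_bytes_fields(row, bytes_fields):
    """Return a copy of row with the named BYTES fields base64-encoded.

    Hypothetical helper: the function name, the row-as-dict shape, and
    the explicit field-name list are assumptions for this sketch.
    """
    out = dict(row)
    for name in bytes_fields:
        value = out.get(name)
        if isinstance(value, bytes):
            # BigQuery expects BYTES values as base64 text in JSON payloads.
            out[name] = base64.b64encode(value).decode("ascii")
    return out


row = encode_bytes_fields({"id": 1, "blob": b"abc"}, ["blob"])
# row["blob"] is now "YWJj", which BigQuery decodes back to b"abc" on load.
```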

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Valentyn Tymofieiev
Thanks Juta for the detailed analysis. I reached out to the BigQuery team to improve documentation around treatment of Bytes and reported the issue that schema autodetection does not work for BYTES in the GCP issue tracker

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Reuven Lax
The Java SDK relies on Jackson to do the encoding. On Wed, Mar 20, 2019 at 11:33 AM Chamikara Jayalath wrote: > > > On Wed, Mar 20, 2019 at 5:46 AM Juta Staes wrote: > >> Hi all, >> >> >> I am working on porting beam to python 3 and discovered the following: >> >> >> Current handling of bytes

Re: Writing bytes to BigQuery with beam

2019-03-20 Thread Chamikara Jayalath
On Wed, Mar 20, 2019 at 5:46 AM Juta Staes wrote: > Hi all, > > > I am working on porting beam to python 3 and discovered the following: > > > Current handling of bytes in bigquery IO: > > When writing bytes to BQ, beam uses > https://cloud.google.com/bigquery/docs/reference/rest/v2/. This API
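The underlying constraint the thread keeps returning to can be shown with the standard library alone: because the BigQuery REST API carries rows as JSON, and Python's json module refuses raw bytes, BYTES values have to be base64-encoded strings before serialization. A minimal sketch (the row contents are made up for the example):

```python
import base64
import json

# Hypothetical row destined for a BYTES column. The BQ REST API payload
# is JSON, and Python's json module cannot serialize raw bytes directly:
row = {"data": b"\x00\xff"}
try:
    json.dumps(row)
except TypeError:
    pass  # bytes objects are not JSON-serializable

# Base64-encoding the value (the textual form the API expects for BYTES)
# makes the row serializable:
encoded_row = {"data": base64.b64encode(row["data"]).decode("ascii")}
payload = json.dumps(encoded_row)
```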