Re: [EXT] Re: using avro instead of json for BigQueryIO.Write

2020-02-26 Thread Pablo Estrada
This is great. I'll take a look today.

On Wed, Feb 26, 2020 at 9:42 AM Chuck Yang wrote:
> Hi Devs,
>
> I was able to get around to working on Avro file loads to BigQuery in Python SDK and now have a PR available at https://github.com/apache/beam/pull/10979 . Comments appreciated :)

Re: [EXT] Re: using avro instead of json for BigQueryIO.Write

2020-02-26 Thread Chuck Yang
Hi Devs,

I was able to get around to working on Avro file loads to BigQuery in Python SDK and now have a PR available at https://github.com/apache/beam/pull/10979 . Comments appreciated :)

Thanks,
Chuck

On Wed, Nov 27, 2019 at 10:10 AM Chuck Yang wrote:
> I would love to fix this, but not

Re: [EXT] Re: using avro instead of json for BigQueryIO.Write

2019-11-27 Thread Chuck Yang
I would love to fix this, but not sure if I have the bandwidth at the moment. Anyway, created the Jira here: https://jira.apache.org/jira/browse/BEAM-8841

Thanks!
Chuck

Re: [EXT] Re: using avro instead of json for BigQueryIO.Write

2019-11-26 Thread Chamikara Jayalath
I don't believe so, please create one (we can dedup if we happen to find another issue). Even better if you can contribute a fix :)

Thanks,
Cham

On Tue, Nov 26, 2019 at 7:07 PM Chuck Yang wrote:
> Has anyone looked into implementing this for the Python SDK? It would be nice to have it

Re: using avro instead of json for BigQueryIO.Write

2019-10-17 Thread Reuven Lax
I'll take a look as well. Thanks for doing this!

On Fri, Oct 4, 2019 at 9:16 PM Pablo Estrada wrote:
> Thanks Steve!
> I'll take a look next week. Sorry about the delay so far.
> Best
> -P.
>
> On Fri, Sep 27, 2019 at 10:37 AM Steve Niemitz wrote:
>> I put up a semi-WIP pull request

Re: using avro instead of json for BigQueryIO.Write

2019-10-04 Thread Pablo Estrada
Thanks Steve! I'll take a look next week. Sorry about the delay so far.
Best
-P.

On Fri, Sep 27, 2019 at 10:37 AM Steve Niemitz wrote:
> I put up a semi-WIP pull request https://github.com/apache/beam/pull/9665 for this. The initial results look good. I'll spend some time soon adding unit

Re: using avro instead of json for BigQueryIO.Write

2019-09-27 Thread Steve Niemitz
I put up a semi-WIP pull request https://github.com/apache/beam/pull/9665 for this. The initial results look good. I'll spend some time soon adding unit tests and documentation, but I'd appreciate it if someone could take a first pass over it.

On Wed, Sep 18, 2019 at 6:14 PM Pablo Estrada
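For context, a rough sketch of what the Avro-based write path might look like from the caller's side, assuming an entry point along the lines of the withAvroFormatFunction hook that current Beam releases expose (the exact API surface in the PR may differ); the table name and MyEvent element type are hypothetical:

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;

public class AvroFileLoadsSketch {
  static void write(PCollection<MyEvent> events) {
    TableSchema schema =
        new TableSchema()
            .setFields(
                Arrays.asList(
                    new TableFieldSchema().setName("user").setType("STRING"),
                    new TableFieldSchema().setName("count").setType("INT64")));

    events.apply(
        BigQueryIO.<MyEvent>write()
            .to("my-project:my_dataset.events") // hypothetical table
            .withSchema(schema)
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            // Instead of a TableRow format function (JSON temp files), supply a
            // function that builds an Avro GenericRecord against the schema Beam
            // derives from the table schema; the load temp files are then Avro.
            .withAvroFormatFunction(
                request -> {
                  GenericRecord record = new GenericData.Record(request.getSchema());
                  record.put("user", request.getElement().user);
                  record.put("count", request.getElement().count);
                  return record;
                }));
  }

  // Hypothetical element type used only for this sketch.
  static class MyEvent {
    String user;
    long count;
  }
}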

Re: using avro instead of json for BigQueryIO.Write

2019-09-18 Thread Pablo Estrada
Thanks for offering to work on this! It would be awesome to have it. I can say that we don't have that for Python ATM.

On Mon, Sep 16, 2019 at 10:56 AM Steve Niemitz wrote:
> Our experience has actually been that avro is more efficient than even parquet, but that might also be skewed from our

Re: using avro instead of json for BigQueryIO.Write

2019-09-16 Thread Steve Niemitz
Our experience has actually been that avro is more efficient than even parquet, but that might also be skewed by our datasets. I might try to take a crack at this; I found https://issues.apache.org/jira/browse/BEAM-2879 tracking it (which coincidentally references my thread from a couple years

Re: using avro instead of json for BigQueryIO.Write

2019-09-16 Thread Reuven Lax
It's been talked about, but nobody's done anything. There are some difficulties related to type conversion (JSON and Avro don't support the same types), but if those are overcome then an Avro version would be much more efficient. I believe Parquet files would be even more efficient if you wanted to
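To make the type-conversion point concrete, a small illustrative sketch (not necessarily the mapping an eventual implementation would use): in the JSON path a BigQuery TIMESTAMP travels as a formatted string inside a TableRow, while in an Avro file the natural encoding is a long carrying the timestamp-micros logical type, so each BigQuery type needs an explicit Avro mapping.

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class TypeMappingSketch {
  public static void main(String[] args) {
    // Avro-side encoding of a BigQuery TIMESTAMP: long + timestamp-micros logical type.
    Schema tsMicros =
        LogicalTypes.timestampMicros().addToSchema(Schema.create(Schema.Type.LONG));

    Schema rowSchema =
        SchemaBuilder.record("Row")
            .fields()
            .requiredString("user")
            .name("created_at").type(tsMicros).noDefault()
            .endRecord();

    GenericRecord record = new GenericData.Record(rowSchema);
    record.put("user", "alice");
    record.put("created_at", 1568649600000000L); // microseconds since epoch

    // The JSON load path would instead carry the same value as a string such as
    // "2019-09-16 16:00:00 UTC" inside a TableRow.
    System.out.println(record);
  }
}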

using avro instead of json for BigQueryIO.Write

2019-09-16 Thread Steve Niemitz
Has anyone investigated using avro rather than json to load data into BigQuery using BigQueryIO (+ FILE_LOADS)? I'd be interested in enhancing it to support this, but I'm curious if there's any prior work here.
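For reference, a minimal sketch of the existing JSON-based FILE_LOADS path the question refers to, with a hypothetical table, schema, and element type: each element is turned into a TableRow, serialized to newline-delimited JSON temp files, and loaded into BigQuery via a load job.

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Arrays;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;

public class JsonFileLoadsToday {
  static void write(PCollection<MyEvent> events) {
    TableSchema schema =
        new TableSchema()
            .setFields(
                Arrays.asList(
                    new TableFieldSchema().setName("user").setType("STRING"),
                    new TableFieldSchema().setName("count").setType("INT64")));

    events.apply(
        BigQueryIO.<MyEvent>write()
            .to("my-project:my_dataset.events") // hypothetical table
            .withSchema(schema)
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            // Each element becomes a TableRow, which is serialized as JSON into the
            // temp files handed to the BigQuery load job.
            .withFormatFunction(
                e -> new TableRow().set("user", e.user).set("count", e.count)));
  }

  // Hypothetical element type used only for this sketch.
  static class MyEvent {
    String user;
    long count;
  }
}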