Re: Arrow and R benchmark

2019-02-28 Thread Jonathan Chiang
Hi All, The thread about Apache Arrow and Google Cloud support did get some traction! Thanks to Micah for his suggestions. *If everyone can STAR this link, we could get more visibility.* I'm guessing if Wes responds to the thread it would be a huge win.

Re: Arrow and R benchmark

2019-02-26 Thread Wes McKinney
Thanks Micah for the update. The continued investment in Apache Avro is interesting given the low-activity state of that community. I'm optimistic that BQ will offer native Arrow export at some point in the future, perhaps after we reach a "1.0.0" release - Wes On Sat, Feb 23, 2019 at 12:17 PM

Re: Arrow and R benchmark

2019-02-23 Thread Jonathan Chiang
Hi Micah, Yes, I filed the feature request on your advice. I will look more into Avro for my own BigQuery use cases. Thanks for following up. Best, Jonathan > On Feb 22, 2019, at 8:35 PM, Micah Kornfield wrote: > > Just to follow up on this thread, a new high throughput API [1] for

Re: Arrow and R benchmark

2019-02-22 Thread Micah Kornfield
Just to follow up on this thread, a new high-throughput API [1] for reading data out of BigQuery was released to public beta today. The format it streams is Avro, so it should be higher performance than parsing JSON (and reads can be parallelized). Implementing Avro reading was something I was

Re: Arrow and R benchmark

2019-02-13 Thread Wes McKinney
Would someone like to make some feature requests to Google or engage with them in another way? I have interacted with GCP in the past; I think it would be helpful for them to hear from other Arrow users or community members since I have been quite public as a carrier of the Arrow banner. On Tue,

Re: Arrow and R benchmark

2019-02-04 Thread Micah Kornfield
Disclaimer: I work for Google (not on BQ). Everything I'm going to write reflects my own opinions, not those of my company. Jonathan and Wes, One way of trying to get support for this is filing a feature request at [1] and getting broader customer support for it. Another possible way of

Re: Arrow and R benchmark

2019-02-04 Thread Wes McKinney
Arrow support would be an obvious win for BigQuery. I've spoken with people at Google Cloud about this on several occasions. With the gRPC / Flight work coming along it might be a good opportunity to rekindle the discussion. If anyone from GCP is reading or if you know anyone at GCP who might be

Re: Arrow and R benchmark

2019-02-04 Thread Jonathan Chiang
Hi Wes, I am currently working a lot with Google BigQuery in R and Python. Hadley Wickham listed this as a big bottleneck for his library bigrquery. *The bottleneck for loading BigQuery data is now parsing BigQuery’s JSON format, which is difficult to optimise further because I’m already using

Re: Arrow and R benchmark

2018-12-03 Thread Wes McKinney
Hi Jonathan, On Sat, Nov 24, 2018 at 6:19 PM Jonathan Chiang wrote: > > Hi Wes and Romain, > > I wrote a preliminary benchmark for reading and writing different file types > from R into Arrow, borrowing some code from Hadley. I would like some feedback > to improve it and then possibly push a

Re: Arrow and R benchmark

2018-11-24 Thread Jonathan Chiang
Hi Wes and Romain, I wrote a preliminary benchmark for reading and writing different file types from R into Arrow, borrowing some code from Hadley. I would like some feedback to improve it and then possibly push an R/benchmarks folder. I am willing to dedicate most of next week to this project, as

Re: Arrow and R benchmark

2018-11-15 Thread Jonathan Chiang
I'll go through that python repo and see what I can do. Thanks, Jonathan On Thu, Nov 15, 2018 at 1:55 PM Wes McKinney wrote: > I would suggest starting an r/benchmarks directory like we have in > Python (https://github.com/apache/arrow/tree/master/python/benchmarks) > and documenting the

Re: Arrow and R benchmark

2018-11-15 Thread Wes McKinney
I would suggest starting an r/benchmarks directory like we have in Python (https://github.com/apache/arrow/tree/master/python/benchmarks) and documenting the process for running all the benchmarks. On Thu, Nov 15, 2018 at 4:52 PM Romain François wrote: > > Right now, most of the code examples are

Re: Arrow and R benchmark

2018-11-15 Thread Romain François
Right now, most of the code examples are in the unit tests, but these do not measure or stress performance. Perhaps you can start from there? Romain > On 15 Nov 2018 at 22:16, Wes McKinney wrote: > > Adding dev@arrow.apache.org >> On Thu, Nov 15, 2018 at 4:13 PM Jonathan Chiang

Re: Arrow and R benchmark

2018-11-15 Thread Wes McKinney
Adding dev@arrow.apache.org On Thu, Nov 15, 2018 at 4:13 PM Jonathan Chiang wrote: > > Hi, > > I would like to contribute to developing benchmark suites for R and Arrow. > What would be the best way to start? > > Thanks, > Jonathan