Re: Continuous benchmarking setup

2018-05-11 Thread Wes McKinney
Thanks Tom and Antoine! Since these benchmarks are literally running on a machine in my closet at home, there may be some downtime in the future. At some point we should document a process of setting up a new machine from scratch to be the nightly bare metal benchmark slave. - Wes On Fri, May

Re: Question about streaming to memorymapped files

2018-05-11 Thread Wes McKinney
hi Robert, Thank you for this analysis. Having a memory map interface that supports growing the memory map sounds useful, so we would welcome this contribution to the project. best Wes On Fri, May 11, 2018 at 10:23 AM, Ambalu, Robert wrote: > Antoine, fair point. I

Re: [CI] Code coverage reports

2018-05-11 Thread Antoine Pitrou
Hi Wes, On 11/05/2018 at 05:32, Wes McKinney wrote: > > I also prefer codecov.io, but unfortunately Apache Infra does not > support it I believe due to some app hook permissions issue (there are > some similar problems preventing CircleCI from being made available to > Apache projects). I

Re: Question about streaming to memorymapped files

2018-05-11 Thread Antoine Pitrou
If you write your own auto-growing memory mapped file implementation, I'd be curious about performance measurements vs. FileOutputStream (and possibly BufferedOutputStream). mremap() and truncate() calls are not free. Also, at some point you'll want to unmap data already written to prevent the
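
For readers following this thread, here is a minimal sketch of what an auto-growing memory-mapped writer might look like. It is not the implementation discussed in the thread and does not use any Arrow API. The class name GrowingMmapWriter, its parameters, and the doubling growth policy are all assumptions made for illustration. Instead of mremap(), it closes the map, extends the file with truncate(), and maps it again, which is slower but portable.

```python
import mmap
import os
import tempfile


class GrowingMmapWriter:
    """Hypothetical append-only writer backed by a memory map that grows on demand."""

    def __init__(self, path, initial_size=4096):
        # initial_size must be > 0: an empty file cannot be memory mapped.
        self._file = open(path, "w+b")
        self._capacity = initial_size
        self._file.truncate(self._capacity)
        self._map = mmap.mmap(self._file.fileno(), self._capacity)
        self._pos = 0

    def _grow(self, needed):
        # Assumed policy: double the capacity until the write fits.
        new_capacity = self._capacity
        while new_capacity < needed:
            new_capacity *= 2
        # Portable stand-in for mremap(): unmap, extend the file, map again.
        self._map.flush()
        self._map.close()
        self._file.truncate(new_capacity)
        self._map = mmap.mmap(self._file.fileno(), new_capacity)
        self._capacity = new_capacity

    def write(self, data):
        end = self._pos + len(data)
        if end > self._capacity:
            self._grow(end)
        self._map[self._pos:end] = data
        self._pos = end

    def close(self):
        # Flush, unmap, then trim the file to the bytes actually written.
        self._map.flush()
        self._map.close()
        self._file.truncate(self._pos)
        self._file.close()


# Usage: write 1000 bytes starting from a 16-byte map, forcing several grows.
path = os.path.join(tempfile.mkdtemp(), "out.bin")
writer = GrowingMmapWriter(path, initial_size=16)
for _ in range(100):
    writer.write(b"abcdefghij")
writer.close()
```

Each grow step costs an unmap, a truncate, and a new map, which is one reason the calls mentioned above are "not free"; a larger initial size or growth factor would trade disk space for fewer remaps.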

Re: Continuous benchmarking setup

2018-05-11 Thread Antoine Pitrou
Hi again, Tom has configured the benchmarking machine to run and publish Arrow's ASV-based benchmarks. The latest results can now be seen at: https://pandas.pydata.org/speed/arrow/ I expect these are regenerated on a regular (daily?) basis. Thanks Tom :-) Regards Antoine. On Wed, 11 Apr

RE: Question about streaming to memorymapped files

2018-05-11 Thread Ambalu, Robert
Antoine, fair point. I just ran some perf stats using FileOutputStream vs my growing mmap impl. It seems in most cases you are correct, their runtimes are basically equivalent. The only time mmap beats it significantly is if there are many Flush calls. I have a parameter to control how many

[jira] [Created] (ARROW-2574) [CI] Collect and publish Python coverage

2018-05-11 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2574: - Summary: [CI] Collect and publish Python coverage Key: ARROW-2574 URL: https://issues.apache.org/jira/browse/ARROW-2574 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-2573) Field metadata is lost on serialization round-trip

2018-05-11 Thread Thomas Buhrmann (JIRA)
Thomas Buhrmann created ARROW-2573: -- Summary: Field metadata is lost on serialization round-trip Key: ARROW-2573 URL: https://issues.apache.org/jira/browse/ARROW-2573 Project: Apache Arrow

[VOTE] Accept donation of Arrow Ruby bindings

2018-05-11 Thread Wes McKinney
Dear all, Arrow PMC member Kouhei Sutou has developed Ruby bindings to the GLib C interface for Apache Arrow * https://github.com/red-data-tools/red-arrow * https://github.com/red-data-tools/red-arrow-gpu He is proposing to pull these projects into Apache Arrow to develop them all in the same

Import Ruby bindings

2018-05-11 Thread Kouhei Sutou
Hi, I want to import the Ruby bindings that I wrote, which are at the following locations: * https://github.com/red-data-tools/red-arrow * https://github.com/red-data-tools/red-arrow-gpu https://github.com/apache/arrow/pull/1990a We need to go through the IP Clearance process to import the Ruby bindings but I think that I