Re: Preferred way to cite Apache Arrow?

2019-01-28 Thread Micah Kornfield
I believe web-pages are updated each release, so you can get a rough "version number" by looking in git history (e.g. for the 0.12.0 release Layout can be found at https://github.com/apache/arrow/blob/apache-arrow-0.12.0/docs/source/format/Layout.rst ) It would probably be nice to have this on

[jira] [Created] (ARROW-4412) Add explicit version numbers to the arrow specification documents.

2019-01-28 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4412: Summary: Add explicit version numbers to the arrow specification documents. Key: ARROW-4412 URL: https://issues.apache.org/jira/browse/ARROW-4412 Project: Apache

RE: Preferred way to cite Apache Arrow?

2019-01-28 Thread Mike French
I also notice there is no standalone specification document, and the 3 spec web pages do not have a date or a version number. Mike

[jira] [Created] (ARROW-4410) [C++] Fix InvertKernel edge cases

2019-01-28 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4410: Summary: [C++] Fix InvertKernel edge cases Key: ARROW-4410 URL: https://issues.apache.org/jira/browse/ARROW-4410 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-4409) [C++] Enable arrow::ipc internal JSON reader to read from a file path

2019-01-28 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4409: Summary: [C++] Enable arrow::ipc internal JSON reader to read from a file path Key: ARROW-4409 URL: https://issues.apache.org/jira/browse/ARROW-4409 Project: Apache

Re: [RESULT] [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-28 Thread Wes McKinney
hi Andy -- yes you can definitely back out the patch if it becomes an issue. As soon as the CLAs are sorted I can run the IP clearance vote on general@incubator. Thanks for your patience on this.

Re: [RESULT] [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-28 Thread Andy Grove
I'm making good progress with the audit and so far I have found ~50 lines of code from one contributor that is currently part of the proposed donation. I have reached out via github issues to see if I can make contact. This code could also be removed from the donation with no consequence. I'm

[jira] [Created] (ARROW-4408) [CPP/Doc] Remove outdated Parquet documentation

2019-01-28 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4408: Summary: [CPP/Doc] Remove outdated Parquet documentation Key: ARROW-4408 URL: https://issues.apache.org/jira/browse/ARROW-4408 Project: Apache Arrow

Re: Support for schema evolution of Parquet files?

2019-01-28 Thread Wes McKinney
Schema evolution is not accounted for at all yet in the library. One JIRA corresponding to this for the C++ project is https://issues.apache.org/jira/browse/PARQUET-810 Currently in Python the dataset handling for multiple Parquet files is implemented in Python. I am interested in porting this

Support for schema evolution of Parquet files?

2019-01-28 Thread Dave Birdsall
Hi, In Hive, it is possible to evolve one's schema using ALTER TABLE ADD COLUMNS and/or ALTER TABLE REPLACE COLUMNS. These commands change the metadata for the Hive table as a whole but do not rewrite existing files that are part of the table. So, for example, if I create a Parquet table,

[jira] [Created] (ARROW-4407) [CMake] ExternalProject_Add does not capture CC/CXX correctly

2019-01-28 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-4407: Summary: [CMake] ExternalProject_Add does not capture CC/CXX correctly Key: ARROW-4407 URL: https://issues.apache.org/jira/browse/ARROW-4407

[jira] [Created] (ARROW-4406) Ignore "*_$folder$" files on S3

2019-01-28 Thread George Sakkis (JIRA)
George Sakkis created ARROW-4406: Summary: Ignore "*_$folder$" files on S3 Key: ARROW-4406 URL: https://issues.apache.org/jira/browse/ARROW-4406 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-4405) [Docs] Docker documentation builds fails since the source directory is mounted as readonly

2019-01-28 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4405: Summary: [Docs] Docker documentation builds fails since the source directory is mounted as readonly Key: ARROW-4405 URL: https://issues.apache.org/jira/browse/ARROW-4405

[jira] [Created] (ARROW-4404) [CI] AppVeyor toolchain build does not build anything

2019-01-28 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-4404: Summary: [CI] AppVeyor toolchain build does not build anything Key: ARROW-4404 URL: https://issues.apache.org/jira/browse/ARROW-4404 Project: Apache Arrow

Re: Preferred way to cite Apache Arrow?

2019-01-28 Thread Wes McKinney
That's fine, with "Apache Arrow Development Team".

Re: Preferred way to cite Apache Arrow?

2019-01-28 Thread Sebastien Binet
What about sending something to the Journal of Open Source Software? https://joss.theoj.org/ Cheers, -s

Re: Preferred way to cite Apache Arrow?

2019-01-28 Thread Jim Pivarski
Okay, thanks! I'm using Arrow Development Team, ``Apache Arrow'' [website], retrieved October 2018. \url{https://arrow.apache.org} -- Jim

Re: Preferred way to cite Apache Arrow?

2019-01-28 Thread Wes McKinney
hi Jim, We don't have a canonical citation yet. I'd like to write an academic paper about the project this year or next, so hopefully this will change, but I think you can cite the website in a publication in the meantime. - Wes

Preferred way to cite Apache Arrow?

2019-01-28 Thread Jim Pivarski
Is there a preferred reference (paper, proceedings, Zenodo link) to use when citing Apache Arrow? I couldn't find any on the Arrow website. Thanks, -- Jim

Re: [RESULT] [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-28 Thread Wes McKinney
On Mon, Jan 28, 2019 at 8:42 AM Andy Grove wrote: > > Thanks. I'm excited to be able to donate this code. > > I've just completed a quick first pass audit of the contributions and I > don't think any of them apply to the code being donated but I will do a > thorough audit to confirm over the next

Re: [RESULT] [VOTE] Accept donation of Rust DataFusion library for Apache Arrow

2019-01-28 Thread Andy Grove
Thanks. I'm excited to be able to donate this code. I've just completed a quick first pass audit of the contributions and I don't think any of them apply to the code being donated but I will do a thorough audit to confirm over the next couple of days. The commits generally fall into these

Re: [Format] Passing selection masks with Arrow record batches

2019-01-28 Thread Francois Saint-Jacques
On Mon, Jan 28, 2019 at 12:53 AM Wes McKinney wrote: > I was having a discussion recently about Arrow and the topic of > server-side filtering vs. client-side filtering came up. > > The basic problem is this: > > If you have a RecordBatch that you wish to filter out some of the > "rows", one way
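The preview above is cut off before the actual proposal, so the following is only a hypothetical illustration of the general idea named in the subject line: shipping an unfiltered batch together with a boolean selection mask and letting the receiver apply it. It is plain Python, not Arrow code; `MaskedBatch` and `materialize` are invented names and do not correspond to any Arrow API or to what was proposed in the thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MaskedBatch:
    """Hypothetical container: columns plus an optional selection mask.

    Not an Arrow API. It illustrates sending the unfiltered rows along with
    a boolean mask instead of materializing the filtered rows up front.
    """

    columns: Dict[str, List[object]]
    mask: Optional[List[bool]] = None  # None means "all rows selected"

    def num_rows(self) -> int:
        # Assumes every column has the same length; an empty batch has 0 rows.
        return len(next(iter(self.columns.values()), []))

    def materialize(self) -> Dict[str, List[object]]:
        """Apply the mask on the receiving side, copying only selected rows."""
        if self.mask is None:
            return {name: list(col) for name, col in self.columns.items()}
        if len(self.mask) != self.num_rows():
            raise ValueError("mask length must equal the number of rows")
        return {
            name: [value for value, keep in zip(col, self.mask) if keep]
            for name, col in self.columns.items()
        }


batch = MaskedBatch(
    {"id": [1, 2, 3, 4], "x": [0.5, 1.5, 2.5, 3.5]},
    mask=[True, False, True, False],
)
print(batch.materialize())  # {'id': [1, 3], 'x': [0.5, 2.5]}
```

The trade-off this sketch shows is where the copy happens: the sender does no filtering work, while the receiver pays for the unselected rows in transfer size and then filters them itself.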

[jira] [Created] (ARROW-4403) [Rust] CI fails due to formatting errors

2019-01-28 Thread Pindikura Ravindra (JIRA)
Pindikura Ravindra created ARROW-4403: Summary: [Rust] CI fails due to formatting errors Key: ARROW-4403 URL: https://issues.apache.org/jira/browse/ARROW-4403 Project: Apache Arrow

[jira] [Created] (ARROW-4402) [Rust] Workaround for writing Cargo.lock to read-only mounted source directory in docker-compose

2019-01-28 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4402: Summary: [Rust] Workaround for writing Cargo.lock to read-only mounted source directory in docker-compose Key: ARROW-4402 URL:

[jira] [Created] (ARROW-4401) [Python] Alpine dockerfile fails to build because pandas requires numpy as build dependency

2019-01-28 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-4401: Summary: [Python] Alpine dockerfile fails to build because pandas requires numpy as build dependency Key: ARROW-4401 URL: https://issues.apache.org/jira/browse/ARROW-4401

Re: [Format] [Rust] ChunkedArray, Column and Table

2019-01-28 Thread Sebastien Binet
On Sun, Jan 27, 2019 at 1:08 PM Neville Dipale wrote: > Hi Antoine, > > I've given your response some thought. > > I'm thinking more looking at the computational aspect of Arrow. I agree > that for representing and sharing data, RecordBatches achieve the purpose. > > I came across ChunkedArray,