[jira] [Created] (ARROW-2057) [Python] Configure size of data pages in pyarrow.parquet.write_table

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2057: Summary: [Python] Configure size of data pages in pyarrow.parquet.write_table Key: ARROW-2057 URL: https://issues.apache.org/jira/browse/ARROW-2057 Project: Apache

Re: Moving Arrow Java to JDK 8

2018-01-30 Thread Li Jin
I created https://issues.apache.org/jira/browse/ARROW-2055 to track. Also created the javadoc issue as subtask. On Tue, Jan 30, 2018 at 11:44 AM, Dwight Gunning wrote: > Thanks Li, > > There is no JIRA as yet except for ARROW-2015 for the JODA time migration > to Java 8

[jira] [Created] (ARROW-2056) Fix javadoc generation for Java 8

2018-01-30 Thread Li Jin (JIRA)
Li Jin created ARROW-2056: Summary: Fix javadoc generation for Java 8 Key: ARROW-2056 URL: https://issues.apache.org/jira/browse/ARROW-2056 Project: Apache Arrow Issue Type: Sub-task

Re: Moving Arrow Java to JDK 8

2018-01-30 Thread Li Jin
Thanks Dwight, I think it would be good to track the required items for moving to Java 8 support. As far as I know, Arrow works with Java 8 already, so this shouldn't be too hard. Dependency-wise, the downstream project Spark 2.3 already drops Java 7 support; I am not sure about Dremio. Is there

[jira] [Created] (ARROW-2054) Compilation warnings

2018-01-30 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2054: Summary: Compilation warnings Key: ARROW-2054 URL: https://issues.apache.org/jira/browse/ARROW-2054 Project: Apache Arrow Issue Type: Task

Re: [Python] Disk size performance of Snappy vs Brotli vs Blosc

2018-01-30 Thread simba nyatsanga
Hi Everyone, Just an update on the above questions. I've updated the numbers in the Google sheet using data with less entropy here: https://docs.google.com/spreadsheets/d/1by1vCaO2p24PLq_NAA5Ckh1n3i-SoFYrRcfi1siYKFQ/edit#gid=0 I've also got the benchmarking code. Although some of the data examples

[jira] [Created] (ARROW-2061) [C++] Run ASAN builds in Travis CI

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2061: Summary: [C++] Run ASAN builds in Travis CI Key: ARROW-2061 URL: https://issues.apache.org/jira/browse/ARROW-2061 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-2058) Add wheels for Alpine Linux

2018-01-30 Thread Omer Katz (JIRA)
Omer Katz created ARROW-2058: Summary: Add wheels for Alpine Linux Key: ARROW-2058 URL: https://issues.apache.org/jira/browse/ARROW-2058 Project: Apache Arrow Issue Type: Task

[jira] [Created] (ARROW-2059) [Python] Possible performance regression in Feather read/write path

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2059: Summary: [Python] Possible performance regression in Feather read/write path Key: ARROW-2059 URL: https://issues.apache.org/jira/browse/ARROW-2059 Project: Apache

[jira] [Created] (ARROW-2060) [Python] Documentation for creating StructArray using from_arrays or a sequence of dicts

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2060: Summary: [Python] Documentation for creating StructArray using from_arrays or a sequence of dicts Key: ARROW-2060 URL: https://issues.apache.org/jira/browse/ARROW-2060

Re: Duplicate Columns

2018-01-30 Thread Wes McKinney
In a sense, field names in Arrow schemas are "just data". Whether or not the data is invalid in the context of a particular use case may vary a great deal -- for example pandas supports duplicate column names (to its own hardship, admittedly) while most SQL systems do not. Sadly, sometimes

[jira] [Created] (ARROW-2062) [C++] Stalled builds in test_serialization.py in Travis CI

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2062: Summary: [C++] Stalled builds in test_serialization.py in Travis CI Key: ARROW-2062 URL: https://issues.apache.org/jira/browse/ARROW-2062 Project: Apache Arrow

[jira] [Created] (ARROW-2063) [C++] Implement variant of FixedSizeBufferWriter that also supports reading (like MemoryMappedFile)

2018-01-30 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2063: Summary: [C++] Implement variant of FixedSizeBufferWriter that also supports reading (like MemoryMappedFile) Key: ARROW-2063 URL: https://issues.apache.org/jira/browse/ARROW-2063

[jira] [Created] (ARROW-2064) [GLib] Add common build problems link to the install section

2018-01-30 Thread yosuke shiro (JIRA)
yosuke shiro created ARROW-2064: Summary: [GLib] Add common build problems link to the install section Key: ARROW-2064 URL: https://issues.apache.org/jira/browse/ARROW-2064 Project: Apache Arrow

[jira] [Created] (ARROW-2065) Fix bug in SerializationContext.clone().

2018-01-30 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-2065: Summary: Fix bug in SerializationContext.clone(). Key: ARROW-2065 URL: https://issues.apache.org/jira/browse/ARROW-2065 Project: Apache Arrow Issue

[jira] [Created] (ARROW-2053) [C++] Build instruction is incomplete

2018-01-30 Thread yosuke shiro (JIRA)
yosuke shiro created ARROW-2053: Summary: [C++] Build instruction is incomplete Key: ARROW-2053 URL: https://issues.apache.org/jira/browse/ARROW-2053 Project: Apache Arrow Issue Type: