[jira] [Commented] (ARROW-780) PYTHON_EXECUTABLE Required to be set during build
[ https://issues.apache.org/jira/browse/ARROW-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968344#comment-15968344 ]

Phillip Cloud commented on ARROW-780:
-------------------------------------

Yes. [~wesmckinn] Can you show the output of your cmake run when building arrow-cpp? I'd like to see what it should be doing for reference.

> PYTHON_EXECUTABLE Required to be set during build
> -------------------------------------------------
>
>                 Key: ARROW-780
>                 URL: https://issues.apache.org/jira/browse/ARROW-780
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Phillip Cloud
>            Assignee: Phillip Cloud
>              Labels: build
>
> I had to set PYTHON_EXECUTABLE to my conda environment's Python interpreter.
> [~wesm_impala_7e40] says he doesn't have to. We should clarify whether this
> is necessary, if it in fact is.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
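For readers hitting the same problem, the usual workaround is to pass the interpreter explicitly on the cmake command line. A sketch only: the directory layout and the use of the active environment's `python` are assumptions, not part of the issue report.

```shell
# Hypothetical workaround sketch: point the Arrow C++ build at a specific
# Python (e.g. the active conda environment's interpreter).
cd arrow/cpp
mkdir -p build && cd build
cmake -DPYTHON_EXECUTABLE="$(which python)" ..
```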
[jira] [Commented] (ARROW-780) PYTHON_EXECUTABLE Required to be set during build
[ https://issues.apache.org/jira/browse/ARROW-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968330#comment-15968330 ]

Wes McKinney commented on ARROW-780:
------------------------------------

Is this still an issue?
[jira] [Commented] (ARROW-809) C++: Writing sliced record batch to IPC writes the entire array
[ https://issues.apache.org/jira/browse/ARROW-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968317#comment-15968317 ]

Itai Incze commented on ARROW-809:
----------------------------------

I wrote the comment before seeing your latest one, so it was not meant to cast doubt on the solution. I've seen that code... though I'm certain you're much better acquainted with it than I am :)

> C++: Writing sliced record batch to IPC writes the entire array
> ---------------------------------------------------------------
>
>                 Key: ARROW-809
>                 URL: https://issues.apache.org/jira/browse/ARROW-809
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Itai Incze
>            Assignee: Wes McKinney
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> The bug can be triggered through python:
> {code}
> import pyarrow.parquet
> array = pyarrow.array.from_pylist([1] * 100)
> rb = pyarrow.RecordBatch.from_arrays([array], ['a'])
> rb2 = rb.slice(0, 2)
> with open('/tmp/t.arrow', 'wb') as f:
>     w = pyarrow.ipc.FileWriter(f, rb.schema)
>     w.write_batch(rb2)
>     w.close()
> {code}
> which will result in a big file:
> {code}
> $ ll /tmp/t.arrow
> -rw-rw-r-- 1 itai itai 800618 Apr 12 13:22 /tmp/t.arrow
> {code}
[jira] [Commented] (ARROW-820) [C++] Build dependencies for Parquet library without arrow support
[ https://issues.apache.org/jira/browse/ARROW-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968313#comment-15968313 ]

Wes McKinney commented on ARROW-820:
------------------------------------

Is leaving off the {{arrow/ipc}} directory and its thirdparty dependencies (flatbuffers, rapidjson) sufficient?

> [C++] Build dependencies for Parquet library without arrow support
> ------------------------------------------------------------------
>
>                 Key: ARROW-820
>                 URL: https://issues.apache.org/jira/browse/ARROW-820
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Deepak Majeti
>
> Parquet C++ library without Arrow depends only on a subset of Arrow
> components (buffers, io). The scope of this JIRA is to build libarrow with
> minimal dependencies for users of Parquet C++ library without Arrow support.
[jira] [Commented] (ARROW-809) C++: Writing sliced record batch to IPC writes the entire array
[ https://issues.apache.org/jira/browse/ARROW-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968310#comment-15968310 ]

Wes McKinney commented on ARROW-809:
------------------------------------

There is some buffer slicing happening on the IPC write path already: https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.cc#L207. It needs to be made consistent (and well tested), though.
[jira] [Commented] (ARROW-820) [C++] Build dependencies for Parquet library without arrow support
[ https://issues.apache.org/jira/browse/ARROW-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968304#comment-15968304 ]

Deepak Majeti commented on ARROW-820:
-------------------------------------

Will post a PR shortly.
[jira] [Created] (ARROW-820) [C++] Build dependencies for Parquet library without arrow support
Deepak Majeti created ARROW-820:
-----------------------------------

             Summary: [C++] Build dependencies for Parquet library without arrow support
                 Key: ARROW-820
                 URL: https://issues.apache.org/jira/browse/ARROW-820
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Deepak Majeti


Parquet C++ library without Arrow depends only on a subset of Arrow components (buffers, io). The scope of this JIRA is to build libarrow with minimal dependencies for users of Parquet C++ library without Arrow support.
[jira] [Commented] (ARROW-809) C++: Writing sliced record batch to IPC writes the entire array
[ https://issues.apache.org/jira/browse/ARROW-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968295#comment-15968295 ]

Itai Incze commented on ARROW-809:
----------------------------------

I've fiddled with it a bit. Without altering the array class, I found there's a problem finding the exact number of items with a boolean array (where it doesn't matter) and in a union array. There may be other instances as well that I'm not aware of. It seems to me that adding a private boolean {{IsSliced}} to the array is the cleanest way.
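The boolean-array wrinkle mentioned above comes from bit packing: a slice need not start on a byte boundary, so the byte extent of a sliced buffer must be derived from the bit offset. A minimal plain-Python illustration (a hypothetical helper, not Arrow's actual code):

```python
def bit_slice_byte_range(offset, length):
    """Byte range covering bits [offset, offset + length) of a
    bit-packed buffer (Arrow packs 8 boolean/validity values per byte)."""
    start = offset // 8                # first byte touched by the slice
    stop = (offset + length + 7) // 8  # round up to cover the last bit
    return start, stop

# A slice of 2 values starting at bit 7 straddles two bytes:
print(bit_slice_byte_range(7, 2))  # (0, 2)
```

This is why a simple "number of items times item size" computation breaks down for boolean and validity buffers.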
[jira] [Issue Comment Deleted] (ARROW-809) C++: Writing sliced record batch to IPC writes the entire array
[ https://issues.apache.org/jira/browse/ARROW-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Itai Incze updated ARROW-809:
-----------------------------
    Comment: was deleted

(was: Agreed - it's a small and easy bug. All that's needed is to agree on the approach.
I've fiddled with it a bit - without altering the array class, I found there's a problem finding the exact number of items with a boolean array (where it doesn't matter) and in a union array. There may be other instances as well that I'm not aware of.
It seems to me that adding a private boolean {{IsSliced}} to the array is the cleanest way.)
[jira] [Commented] (ARROW-809) C++: Writing sliced record batch to IPC writes the entire array
[ https://issues.apache.org/jira/browse/ARROW-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968285#comment-15968285 ]

Wes McKinney commented on ARROW-809:
------------------------------------

I'm going to truncate the data buffers to a 64-byte padding offset; patch coming tomorrow, probably.
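The truncation described above amounts to writing only the bytes a slice needs, rounded up to Arrow's 64-byte buffer padding, rather than the full allocation. A back-of-the-envelope sketch (a hypothetical helper, not the actual writer code):

```python
def truncated_nbytes(length, item_size, alignment=64):
    """Bytes to write for a slice of `length` fixed-width items,
    padded up to the 64-byte alignment Arrow buffers use."""
    nbytes = length * item_size
    return ((nbytes + alignment - 1) // alignment) * alignment

# A 2-element int64 slice needs a single 64-byte block, not the
# ~800 KB file seen in the reproduction above:
print(truncated_nbytes(2, 8))  # 64
```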
[jira] [Created] (ARROW-819) [Python] Define and document public Cython API
Wes McKinney created ARROW-819:
----------------------------------

             Summary: [Python] Define and document public Cython API
                 Key: ARROW-819
                 URL: https://issues.apache.org/jira/browse/ARROW-819
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
            Reporter: Wes McKinney


We have a handful of {{cdef api}} declarations, but it might be useful to have a proper {{pyarrow/api.pxd}} file and a prescribed implementation pattern for other Cython users to link to and use the Arrow types in other Python extensions.
[jira] [Created] (ARROW-818) [Python] Review public pyarrow.* API completeness and update docs
Wes McKinney created ARROW-818:
----------------------------------

             Summary: [Python] Review public pyarrow.* API completeness and update docs
                 Key: ARROW-818
                 URL: https://issues.apache.org/jira/browse/ARROW-818
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Wes McKinney
             Fix For: 0.3.0


There are still many names missing from pyarrow.* and ARROW-797. We should do a final review and update before 0.3.
[jira] [Resolved] (ARROW-816) [C++] Use conda packages for RapidJSON, Flatbuffers to speed up builds
[ https://issues.apache.org/jira/browse/ARROW-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-816.
--------------------------------
    Resolution: Fixed

Issue resolved by pull request 537
[https://github.com/apache/arrow/pull/537]

> [C++] Use conda packages for RapidJSON, Flatbuffers to speed up builds
> ----------------------------------------------------------------------
>
>                 Key: ARROW-816
>                 URL: https://issues.apache.org/jira/browse/ARROW-816
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>             Fix For: 0.3.0
>
> These libraries are being downloaded and built (in the case of Flatbuffers)
> multiple times in the course of a normal build. It would be better to build
> them once and set the *_HOME environment variables so that they can be used
> throughout the CI run.
[jira] [Resolved] (ARROW-817) [C++] Fix incorrect code comment from ARROW-722
[ https://issues.apache.org/jira/browse/ARROW-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-817.
--------------------------------
    Resolution: Fixed

Issue resolved by pull request 536
[https://github.com/apache/arrow/pull/536]

> [C++] Fix incorrect code comment from ARROW-722
> -----------------------------------------------
>
>                 Key: ARROW-817
>                 URL: https://issues.apache.org/jira/browse/ARROW-817
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>             Fix For: 0.3.0
[jira] [Assigned] (ARROW-816) [C++] Use conda packages for RapidJSON, Flatbuffers to speed up builds
[ https://issues.apache.org/jira/browse/ARROW-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-816:
----------------------------------
    Assignee: Wes McKinney
[jira] [Commented] (ARROW-785) possible issue on writing parquet via pyarrow, subsequently read in Hive
[ https://issues.apache.org/jira/browse/ARROW-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967700#comment-15967700 ]

Phillip Cloud commented on ARROW-785:
-------------------------------------

I'm not sure if there's still a possible issue here. When using Drill, if I cast the {{WORD}} column to {{varchar}} then the data look fine. When left as {{binary}} the values are unintelligible:

{code}
0: jdbc:drill:zk=local> select `YEAR`, cast(`WORD` as varchar) as `WORD` from dfs.`/home/phillip/code/cpp/arrow/python/arrow_parquet.parquet`;
+-------+---------+
| YEAR  | WORD    |
+-------+---------+
| 2017  | Word 1  |
| 2018  | Word 2  |
+-------+---------+
{code}

> possible issue on writing parquet via pyarrow, subsequently read in Hive
> ------------------------------------------------------------------------
>
>                 Key: ARROW-785
>                 URL: https://issues.apache.org/jira/browse/ARROW-785
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Jeff Reback
>            Priority: Minor
>             Fix For: 0.3.0
>
> details here:
> http://stackoverflow.com/questions/43268872/parquet-creation-conversion-from-pandas-dataframe-to-pyarrow-table-not-working-f
> This round trips in pandas->parquet->pandas just fine on released pandas
> (0.19.2) and pyarrow (0.2).
> OP states that it is not readable in Hive, however.
[jira] [Commented] (ARROW-817) [C++] Fix incorrect code comment from ARROW-722
[ https://issues.apache.org/jira/browse/ARROW-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967684#comment-15967684 ]

Wes McKinney commented on ARROW-817:
------------------------------------

PR: https://github.com/apache/arrow/pull/536
[jira] [Created] (ARROW-817) [C++] Fix incorrect code comment from ARROW-722
Wes McKinney created ARROW-817:
----------------------------------

             Summary: [C++] Fix incorrect code comment from ARROW-722
                 Key: ARROW-817
                 URL: https://issues.apache.org/jira/browse/ARROW-817
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Wes McKinney
            Assignee: Wes McKinney
             Fix For: 0.3.0
[jira] [Commented] (ARROW-785) possible issue on writing parquet via pyarrow, subsequently read in Hive
[ https://issues.apache.org/jira/browse/ARROW-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967675#comment-15967675 ]

Phillip Cloud commented on ARROW-785:
-------------------------------------

I'm able to run this using {{beeline}}, declaring the {{word}} column as either {{binary}} or {{string}} type in Hive:

{code}
ubuntu@impala:~$ beeline --silent=true --showHeader=false -u jdbc:hive2://localhost:1/default -n ubuntu
0: jdbc:hive2://localhost:1/default> create external table t (year bigint, word string) stored as parquet location '/user/hive/warehouse/arrow';
0: jdbc:hive2://localhost:1/default> select * from t;
+-------+---------+--+
| 2017  | Word 1  |
| 2018  | Word 2  |
+-------+---------+--+
0: jdbc:hive2://localhost:1/default> create external table t2 (year bigint, word binary) stored as parquet location '/user/hive/warehouse/arrow';
0: jdbc:hive2://localhost:1/default> select * from t2;
+-------+---------+--+
| 2017  | Word 1  |
| 2018  | Word 2  |
+-------+---------+--+
{code}
[jira] [Created] (ARROW-816) [C++] Use conda packages for RapidJSON, Flatbuffers to speed up builds
Wes McKinney created ARROW-816:
----------------------------------

             Summary: [C++] Use conda packages for RapidJSON, Flatbuffers to speed up builds
                 Key: ARROW-816
                 URL: https://issues.apache.org/jira/browse/ARROW-816
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Wes McKinney
             Fix For: 0.3.0


These libraries are being downloaded and built (in the case of Flatbuffers) multiple times in the course of a normal build. It would be better to build them once and set the *_HOME environment variables so that they can be used throughout the CI run.
[jira] [Commented] (ARROW-300) [Format] Add buffer compression option to IPC file format
[ https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967616#comment-15967616 ]

Wes McKinney commented on ARROW-300:
------------------------------------

[~kiszk] I agree that having in-memory compression schemes like in Spark is a good idea, in addition to simpler snappy/lz4/zlib buffer compression. Would you like to make a proposal for improvements to the Arrow metadata to support these compression schemes? We should indicate that Arrow implementations are not required to implement these in general, so for now they can be marked as experimental and optional for implementations (e.g. we wouldn't necessarily integration test them).

For scan-based in-memory columnar workloads, these encodings can yield better scan throughput because of better cache efficiency, and many column-oriented databases rely on this to achieve high performance, so having it natively in the Arrow libraries seems useful.

> [Format] Add buffer compression option to IPC file format
> ---------------------------------------------------------
>
>                 Key: ARROW-300
>                 URL: https://issues.apache.org/jira/browse/ARROW-300
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Format
>            Reporter: Wes McKinney
>
> It may be useful if data is to be sent over the wire to compress the data
> buffers themselves as they're being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer
> compression setting in the file Footer. Probably the only two compressors worth
> supporting out of the box would be zlib (higher compression ratios) and lz4
> (better performance).
> What does everyone think?
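As a quick illustration of why optional buffer compression pays off, stdlib zlib on a highly repetitive buffer (like columnar data often is) shrinks it by orders of magnitude. This is illustrative only; the format-level design is what the issue itself tracks.

```python
import zlib

# 100000 little-endian int64 values of 1: a repetitive ~800 KB buffer.
raw = (1).to_bytes(8, "little") * 100000
compressed = zlib.compress(raw)

print(len(raw))                           # 800000
print(len(compressed) < len(raw) // 10)   # True: far smaller after zlib
```

lz4 would trade some of that ratio for substantially faster decompression, which is the distinction drawn in the issue description.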
[jira] [Commented] (ARROW-798) [Docs] Publish Format Markdown documents somehow on arrow.apache.org
[ https://issues.apache.org/jira/browse/ARROW-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967604#comment-15967604 ]

Wes McKinney commented on ARROW-798:
------------------------------------

It seems like we shouldn't need something more complicated than Pelican (since a lot of us are Python users) or Jekyll (analogous to Pelican, but Ruby-based) for the main site. I figure we can have a central documentation landing point, with links into the language-specific documentation: Doxygen for C++, Sphinx for Python, etc.

> [Docs] Publish Format Markdown documents somehow on arrow.apache.org
> --------------------------------------------------------------------
>
>                 Key: ARROW-798
>                 URL: https://issues.apache.org/jira/browse/ARROW-798
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: Format
>            Reporter: Wes McKinney
>             Fix For: 0.3.0
[jira] [Commented] (ARROW-798) [Docs] Publish Format Markdown documents somehow on arrow.apache.org
[ https://issues.apache.org/jira/browse/ARROW-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967576#comment-15967576 ]

Uwe L. Korn commented on ARROW-798:
-----------------------------------

Currently we have a single static HTML page as the website for Arrow. I would like to move it to the same infrastructure as other projects like Calcite use (they have the website in the main source and only push the rendered version to the website Git/SVN). With that we should be able to easily get Markdown documents and API docs rendered and uploaded to arrow.apache.org. I already had a go at moving to https://github.com/apache/apache-website-template but that stalled as the template is quite heavy.
[jira] [Commented] (ARROW-300) [Format] Add buffer compression option to IPC file format
[ https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967452#comment-15967452 ]

Uwe L. Korn commented on ARROW-300:
-----------------------------------

Adding methods like RLE or delta encoding brings us very much into the space of Parquet. Given that some of these methods are really fast, it might make sense to support them for IPC. But then I fear that we will end up in a region where there is no clear distinction between Arrow and Parquet anymore.
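For context, the run-length idea mentioned above is simple enough to sketch in a few lines. This is a plain-Python illustration of the general technique, not Parquet's actual RLE/bit-packed hybrid encoding:

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back to the original sequence."""
    return [v for v, n in runs for _ in range(n)]

data = [7, 7, 7, 0, 0, 9]
print(rle_encode(data))  # [(7, 3), (0, 2), (9, 1)]
assert rle_decode(rle_encode(data)) == data
```

The speed appeal is that decoding is a cheap linear expansion, while scans over low-cardinality columns can sometimes operate on the runs directly.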
[jira] [Resolved] (ARROW-751) [Python] Rename all Cython extensions to "private" status with leading underscore
[ https://issues.apache.org/jira/browse/ARROW-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn resolved ARROW-751.
-------------------------------
    Resolution: Fixed

Issue resolved by pull request 533
[https://github.com/apache/arrow/pull/533]

> [Python] Rename all Cython extensions to "private" status with leading
> underscore
> ----------------------------------------------------------------------
>
>                 Key: ARROW-751
>                 URL: https://issues.apache.org/jira/browse/ARROW-751
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>             Fix For: 0.3.0
>
> We can do this after the dust settles with the in-flight patches, but it
> would be good to have {{pyarrow._array}} instead of {{pyarrow.array}}. If we
> need to expose a module "publicly" to the user, it would be better to do it
> in pure Python (as we've done already with {{pyarrow._parquet}}).
[jira] [Resolved] (ARROW-797) [Python] Add updated pyarrow.* public API listing in Sphinx docs
[ https://issues.apache.org/jira/browse/ARROW-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe L. Korn resolved ARROW-797.
-------------------------------
    Resolution: Fixed

Issue resolved by pull request 535
[https://github.com/apache/arrow/pull/535]

> [Python] Add updated pyarrow.* public API listing in Sphinx docs
> ----------------------------------------------------------------
>
>                 Key: ARROW-797
>                 URL: https://issues.apache.org/jira/browse/ARROW-797
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Wes McKinney
>             Fix For: 0.3.0
>
> like https://github.com/pandas-dev/pandas/blob/master/doc/source/api.rst