[jira] [Commented] (ARROW-3382) [C++] Run Gandiva tests in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636481#comment-16636481 ] Pindikura Ravindra commented on ARROW-3382: --- sure [~wesmckinn]. but, the java build is dependent on the cpp build for gandiva. so, we want to get the cpp build and tests running first. > [C++] Run Gandiva tests in Travis CI > > > Key: ARROW-3382 > URL: https://issues.apache.org/jira/browse/ARROW-3382 > Project: Apache Arrow > Issue Type: Task > Components: C++ >Reporter: Praveen Kumar Desabandu >Priority: Major > > Integrate and test Gandiva-Cpp in travis. This would unblock new PRs to > gandiva. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3173) [Rust] dynamic_types example does not run
[ https://issues.apache.org/jira/browse/ARROW-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636305#comment-16636305 ] Kouhei Sutou commented on ARROW-3173: - Thanks! :-) > [Rust] dynamic_types example does not run > - > > Key: ARROW-3173 > URL: https://issues.apache.org/jira/browse/ARROW-3173 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Minor > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3137) [Python] pyarrow 0.10 requires newer version of numpy than specified in requirements
[ https://issues.apache.org/jira/browse/ARROW-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3137: Fix Version/s: 0.11.0 > [Python] pyarrow 0.10 requires newer version of numpy than specified in > requirements > > > Key: ARROW-3137 > URL: https://issues.apache.org/jira/browse/ARROW-3137 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Affects Versions: 0.10.0 >Reporter: James Campbell >Priority: Minor > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 10m > Remaining Estimate: 0h > > pyarrow 0.10 appears to have a binary incompatibility with numpy versions > prior to the 1.14.x series, but its requirements file claims support for > numpy>=1.10.0 > If an older version of numpy is used, the following RuntimeError results: > {{RuntimeError: module compiled against API version 0xc but this version of > numpy is 0xb}} > The following tox.ini file demonstrates the issue: > {{[tox] envlist=py27-numpy\{10,11,13,14,15}-pyarrow\{9,10} [testenv] deps = > numpy10: numpy>=1.10.0,<1.11 numpy11: numpy>=1.11.0,<1.12 numpy13: > numpy>=1.13.0,<1.14 numpy14: numpy>=1.14.0,<1.15 numpy15: numpy>=1.15.0,<1.16 > pyarrow9: pyarrow==0.9.0 pyarrow10: pyarrow==0.10.0 pytest commands = pytest > }} > Using a simple test function like the following: > {{def test_import_pyarrow(): import pyarrow }} > pyarrow 0.9 doesn't appear to have this issue. Was there a change in the > setup process for pyarrow 0.10 that no longer uses Cython to build? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3093) [C++] Linking errors with ORC enabled
[ https://issues.apache.org/jira/browse/ARROW-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3093: Component/s: C++ > [C++] Linking errors with ORC enabled > - > > Key: ARROW-3093 > URL: https://issues.apache.org/jira/browse/ARROW-3093 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.10.0 >Reporter: Antoine Pitrou >Priority: Major > Fix For: 0.11.0 > > > In an attempt to work around ARROW-3091 and ARROW-3092, I've recreated my > conda environment, and now I get linking errors if ORC support is enabled: > {code} > debug/libarrow.so.11.0.0: error: undefined reference to > 'google::protobuf::MessageLite::ParseFromString(std::string const&)' > debug/libarrow.so.11.0.0: error: undefined reference to > 'google::protobuf::MessageLite::SerializeToString(std::string*) const' > debug/libarrow.so.11.0.0: error: undefined reference to > 'google::protobuf::internal::fixed_address_empty_string' > [etc.] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3093) [C++] Linking errors with ORC enabled
[ https://issues.apache.org/jira/browse/ARROW-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3093: Fix Version/s: 0.11.0 > [C++] Linking errors with ORC enabled > - > > Key: ARROW-3093 > URL: https://issues.apache.org/jira/browse/ARROW-3093 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.10.0 >Reporter: Antoine Pitrou >Priority: Major > Fix For: 0.11.0 > > > In an attempt to work around ARROW-3091 and ARROW-3092, I've recreated my > conda environment, and now I get linking errors if ORC support is enabled: > {code} > debug/libarrow.so.11.0.0: error: undefined reference to > 'google::protobuf::MessageLite::ParseFromString(std::string const&)' > debug/libarrow.so.11.0.0: error: undefined reference to > 'google::protobuf::MessageLite::SerializeToString(std::string*) const' > debug/libarrow.so.11.0.0: error: undefined reference to > 'google::protobuf::internal::fixed_address_empty_string' > [etc.] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3093) [C++] Linking errors with ORC enabled
[ https://issues.apache.org/jira/browse/ARROW-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3093: --- Assignee: Antoine Pitrou > [C++] Linking errors with ORC enabled > - > > Key: ARROW-3093 > URL: https://issues.apache.org/jira/browse/ARROW-3093 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.10.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.11.0 > > > In an attempt to work around ARROW-3091 and ARROW-3092, I've recreated my > conda environment, and now I get linking errors if ORC support is enabled: > {code} > debug/libarrow.so.11.0.0: error: undefined reference to > 'google::protobuf::MessageLite::ParseFromString(std::string const&)' > debug/libarrow.so.11.0.0: error: undefined reference to > 'google::protobuf::MessageLite::SerializeToString(std::string*) const' > debug/libarrow.so.11.0.0: error: undefined reference to > 'google::protobuf::internal::fixed_address_empty_string' > [etc.] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3342) Appveyor builds have stopped triggering on GitHub
[ https://issues.apache.org/jira/browse/ARROW-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3342: Fix Version/s: 0.11.0 > Appveyor builds have stopped triggering on GitHub > - > > Key: ARROW-3342 > URL: https://issues.apache.org/jira/browse/ARROW-3342 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.11.0 > > > Not sure what's going on, but this is in the last couple of days -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3342) Appveyor builds have stopped triggering on GitHub
[ https://issues.apache.org/jira/browse/ARROW-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3342: --- Assignee: Antoine Pitrou > Appveyor builds have stopped triggering on GitHub > - > > Key: ARROW-3342 > URL: https://issues.apache.org/jira/browse/ARROW-3342 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.11.0 > > > Not sure what's going on, but this is in the last couple of days -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3390) [C++] cmake file under windows msys2 system doesn't work
[ https://issues.apache.org/jira/browse/ARROW-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3390: Fix Version/s: 0.11.0 > [C++] cmake file under windows msys2 system doesn't work > > > Key: ARROW-3390 > URL: https://issues.apache.org/jira/browse/ARROW-3390 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.10.0 >Reporter: Dominic Sisneros >Priority: Major > Labels: windows > Fix For: 0.11.0 > > > I am trying to get this to build on a windows machine with msys2 installed. > I can generate a Makefile but nothing happens when I run make. I think it is > because for windows, the cmake file changes the shell to cmd.com. Under > msys2, it should run under the msys shell -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3377) [Gandiva][C++] Remove If statement from bit map set function
[ https://issues.apache.org/jira/browse/ARROW-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3377: --- Assignee: Praveen Kumar Desabandu > [Gandiva][C++] Remove If statement from bit map set function > > > Key: ARROW-3377 > URL: https://issues.apache.org/jira/browse/ARROW-3377 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Gandiva >Reporter: Praveen Krishna >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Hello, > For setting a bit in the bit map (which is used in gandiva) we have a branch > statement which can be replaced by bit operations like this > {code:java} > bmap[byteIdx] ^= (-value ^ bmap[byteIdx]) & (1UL << bitIdx); > {code} > which performs the same operation while avoiding the branching. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3377) [Gandiva][C++] Remove If statement from bit map set function
[ https://issues.apache.org/jira/browse/ARROW-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3377. - Resolution: Fixed Fix Version/s: (was: 0.12.0) 0.11.0 Issue resolved by pull request 2672 [https://github.com/apache/arrow/pull/2672] > [Gandiva][C++] Remove If statement from bit map set function > > > Key: ARROW-3377 > URL: https://issues.apache.org/jira/browse/ARROW-3377 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Gandiva >Reporter: Praveen Krishna >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Hello, > For setting a bit in the bit map (which is used in gandiva) we have a branch > statement which can be replaced by bit operations like this > {code:java} > bmap[byteIdx] ^= (-value ^ bmap[byteIdx]) & (1UL << bitIdx); > {code} > which performs the same operation while avoiding the branching. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
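The branchless update quoted in ARROW-3377 can be exercised as a standalone snippet; the `set_bit` name and signature here are illustrative, not Gandiva's actual function:

```cpp
#include <cstdint>

// Illustrative helper (name and signature are ours, not Gandiva's):
// force bit `pos` of the bitmap to `value` (0 or 1) without branching.
// When value == 1, -value is all ones, so the XOR-blend sets the target
// bit; when value == 0, the blend clears it. Only the masked bit changes.
inline void set_bit(uint8_t* bmap, int64_t pos, int value) {
  const int64_t byteIdx = pos / 8;
  const int bitIdx = static_cast<int>(pos % 8);
  bmap[byteIdx] ^= static_cast<uint8_t>((-value ^ bmap[byteIdx]) & (1 << bitIdx));
}
```

Replacing an `if (value) set-bit else clear-bit` branch with this blend keeps the inner loop free of data-dependent branches, which is what the issue is after.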
[jira] [Created] (ARROW-3410) [C++] Streaming CSV reader interface for memory-constrained environments
Wes McKinney created ARROW-3410: --- Summary: [C++] Streaming CSV reader interface for memory-constrained environments Key: ARROW-3410 URL: https://issues.apache.org/jira/browse/ARROW-3410 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Fix For: 0.12.0 CSV reads are currently all-or-nothing. If the results of parsing a CSV file do not fit into memory, this can be a problem. I propose to define a streaming {{RecordBatchReader}} interface so that the record batches produced by reading can be written out immediately to a stream on disk, to be memory-mapped later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3404) [C++] Make CSV chunker faster
[ https://issues.apache.org/jira/browse/ARROW-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3404. - Resolution: Fixed Fix Version/s: 0.11.0 Issue resolved by pull request 2684 [https://github.com/apache/arrow/pull/2684] > [C++] Make CSV chunker faster > - > > Key: ARROW-3404 > URL: https://issues.apache.org/jira/browse/ARROW-3404 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.11.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently the CSV chunker can be the bottleneck in multi-threaded reads > (starting from 6 threads, according to my experiments). One way to make it > faster is to consider by default that CSV values cannot contain newline > characters (overridable via a setting), and then simply search for the last > newline character in each block of data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
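The chunking strategy described in ARROW-3404 — assume values contain no newline characters and cut each block at its last newline — reduces to a single reverse scan. A minimal sketch (our own illustration, not Arrow's chunker code):

```cpp
#include <cstddef>
#include <string_view>

// Return the number of bytes up to and including the last newline in
// `block`, i.e. the largest prefix made of whole CSV rows. Returns 0 if
// the block holds no newline (caller would then need a larger block).
// Assumes values cannot contain embedded newlines, as the issue proposes.
inline size_t WholeRowsPrefix(std::string_view block) {
  const size_t pos = block.rfind('\n');
  return pos == std::string_view::npos ? 0 : pos + 1;
}
```

Because only one `rfind` runs per block, the chunker does constant work per block boundary instead of scanning every byte for quoting state, which is why it stops being the bottleneck for many-threaded reads.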
[jira] [Created] (ARROW-3409) [C++] Add streaming compression interfaces
Antoine Pitrou created ARROW-3409: - Summary: [C++] Add streaming compression interfaces Key: ARROW-3409 URL: https://issues.apache.org/jira/browse/ARROW-3409 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 0.11.0 Reporter: Antoine Pitrou Assignee: Antoine Pitrou Currently the compression and decompression methods offered in {{arrow/util/compression.h}} are one-shot. We also need to expose streaming compressor and decompressor interfaces. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3408) [C++] Add option to CSV reader to dictionary encode individual columns or all string / binary columns
Wes McKinney created ARROW-3408: --- Summary: [C++] Add option to CSV reader to dictionary encode individual columns or all string / binary columns Key: ARROW-3408 URL: https://issues.apache.org/jira/browse/ARROW-3408 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Fix For: 0.12.0 For many datasets, dictionary encoding everything can result in drastically lower memory usage and subsequently better performance in doing analytics One difficulty of dictionary encoding in multithreaded conversions is that ideally you end up with one dictionary at the end. So you have two options: * Implement a concurrent hashing scheme -- for low cardinality dictionaries, the overhead associated with mutex contention will not be meaningful, for high cardinality it can be more of a problem * Hash each chunk separately, then normalize at the end My guess is that a crude concurrent hash table with a mutex to protect mutations and resizes is going to outperform the latter -- This message was sent by Atlassian JIRA (v7.6.3#76005)
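The first option listed in ARROW-3408 — a crude concurrent hash table with a mutex protecting mutations — could look roughly like this sketch (class and method names are invented for illustration, not Arrow API):

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Shared dictionary: maps each distinct string to a stable int32 code.
// All conversion threads call GetOrInsert; a single mutex guards lookup,
// insertion, and rehashing. For low-cardinality dictionaries most calls
// are hits on a small table, so contention stays cheap, as the issue notes.
class SharedDictionary {
 public:
  int32_t GetOrInsert(const std::string& value) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = codes_.find(value);
    if (it != codes_.end()) return it->second;
    const int32_t code = static_cast<int32_t>(codes_.size());
    codes_.emplace(value, code);  // new entry gets the next code
    return code;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::string, int32_t> codes_;
};
```

The alternative (per-chunk dictionaries normalized at the end) avoids the lock but pays for a merge pass and code remapping afterwards.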
[jira] [Assigned] (ARROW-1019) [C++] Implement input stream and output stream with Gzip codec
[ https://issues.apache.org/jira/browse/ARROW-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-1019: - Assignee: Antoine Pitrou > [C++] Implement input stream and output stream with Gzip codec > -- > > Key: ARROW-1019 > URL: https://issues.apache.org/jira/browse/ARROW-1019 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: csv > Fix For: 0.12.0 > > > After incorporating the compression code and toolchain from parquet-cpp, we > should be able to add a codec layer for on-the-fly compression and > decompression -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3407) [C++] Add UTF8 conversion modes in CSV reader conversion options
Wes McKinney created ARROW-3407: --- Summary: [C++] Add UTF8 conversion modes in CSV reader conversion options Key: ARROW-3407 URL: https://issues.apache.org/jira/browse/ARROW-3407 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Fix For: 0.12.0 There should be a few options: * Assume UTF8, but do not verify ("no seatbelts mode", for users that have reasonable security about UTF8 and want the maximum performance) * Full UTF8 verification * Maybe ASCII-only verification (because ASCII verification is very fast) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
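The ASCII-only mode mentioned last in ARROW-3407 is indeed very cheap: input is valid ASCII iff no byte has its high bit set, which needs no decoding at all. A standalone sketch (illustrative, not the reader's implementation):

```cpp
#include <cstdint>
#include <string_view>

// ASCII-only validation: OR all bytes together and test the high bit
// once at the end. One branch-free pass over the data, which is why it
// is much cheaper than full UTF-8 validation (and trivially SIMD-able).
inline bool IsAllAscii(std::string_view data) {
  uint8_t acc = 0;
  for (unsigned char c : data) acc |= c;
  return (acc & 0x80) == 0;
}
```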
[jira] [Commented] (ARROW-3406) [C++] Create a caching memory pool implementation
[ https://issues.apache.org/jira/browse/ARROW-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635865#comment-16635865 ] Wes McKinney commented on ARROW-3406: - Not really related, but on the subject of other kinds of allocators, I wanted to make you aware of the chunked allocator that's used (I think) in the Parquet encoding routines https://github.com/apache/parquet-cpp/blob/master/src/parquet/util/memory.h#L100 > [C++] Create a caching memory pool implementation > - > > Key: ARROW-3406 > URL: https://issues.apache.org/jira/browse/ARROW-3406 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.11.0 >Reporter: Antoine Pitrou >Priority: Major > > A caching memory pool implementation would be able to recycle freed memory > blocks instead of returning them to the system immediately. Two different > policies may be chosen: > * either an unbounded cache > * or a size-limited cache, perhaps with some kind of LRU mechanism > Such a feature might help e.g. for CSV parsing, when reading and parsing data > into temporary memory buffers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3173) [Rust] dynamic_types example does not run
[ https://issues.apache.org/jira/browse/ARROW-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635775#comment-16635775 ] Paddy Horan commented on ARROW-3173: [~kou] sorry for leaving the "components" off the issues, I'll make sure to add it in the future. > [Rust] dynamic_types example does not run > - > > Key: ARROW-3173 > URL: https://issues.apache.org/jira/browse/ARROW-3173 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Minor > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3406) [C++] Create a caching memory pool implementation
Antoine Pitrou created ARROW-3406: - Summary: [C++] Create a caching memory pool implementation Key: ARROW-3406 URL: https://issues.apache.org/jira/browse/ARROW-3406 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 0.11.0 Reporter: Antoine Pitrou A caching memory pool implementation would be able to recycle freed memory blocks instead of returning them to the system immediately. Two different policies may be chosen: * either an unbounded cache * or a size-limited cache, perhaps with some kind of LRU mechanism Such a feature might help e.g. for CSV parsing, when reading and parsing data into temporary memory buffers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
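A toy sketch of the recycling idea from ARROW-3406 (our own illustration, not an Arrow API): freed blocks are parked in a per-size cache and handed back on the next same-size allocation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Toy caching allocator: Free() parks blocks in a per-size free list
// instead of releasing them; Allocate() recycles a cached block of the
// same size when one exists. A real pool would also bound the total
// cached bytes (e.g. with LRU eviction) and honor alignment.
class CachingPool {
 public:
  ~CachingPool() {
    for (auto& entry : cache_)
      for (void* p : entry.second) std::free(p);
  }
  void* Allocate(size_t size) {
    auto it = cache_.find(size);
    if (it != cache_.end() && !it->second.empty()) {
      void* p = it->second.back();
      it->second.pop_back();
      return p;  // recycled block, no system allocation
    }
    return std::malloc(size);
  }
  void Free(void* ptr, size_t size) { cache_[size].push_back(ptr); }
  size_t cached_blocks(size_t size) const {
    auto it = cache_.find(size);
    return it == cache_.end() ? 0 : it->second.size();
  }

 private:
  std::map<size_t, std::vector<void*>> cache_;
};
```

This matches the CSV use case in the issue: temporary parse buffers tend to be the same size block after block, so the hit rate on the free list is high.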
[jira] [Created] (ARROW-3405) [Python] Document CSV reader
Antoine Pitrou created ARROW-3405: - Summary: [Python] Document CSV reader Key: ARROW-3405 URL: https://issues.apache.org/jira/browse/ARROW-3405 Project: Apache Arrow Issue Type: Bug Components: Documentation, Python Affects Versions: 0.11.0 Reporter: Antoine Pitrou We should document the Python CSV reader, or at least auto-document the various classes and functions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3405) [Python] Document CSV reader
[ https://issues.apache.org/jira/browse/ARROW-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3405: -- Description: We should document the Python CSV reader, or at least auto-document the various classes and functions. Perhaps we should first wait for the API to stabilize. was:We should document the Python CSV reader, or at least auto-document the various classes and functions. > [Python] Document CSV reader > > > Key: ARROW-3405 > URL: https://issues.apache.org/jira/browse/ARROW-3405 > Project: Apache Arrow > Issue Type: Bug > Components: Documentation, Python >Affects Versions: 0.11.0 >Reporter: Antoine Pitrou >Priority: Major > > We should document the Python CSV reader, or at least auto-document the > various classes and functions. > Perhaps we should first wait for the API to stabilize. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3404) [C++] Make CSV chunker faster
[ https://issues.apache.org/jira/browse/ARROW-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3404: -- Labels: pull-request-available (was: ) > [C++] Make CSV chunker faster > - > > Key: ARROW-3404 > URL: https://issues.apache.org/jira/browse/ARROW-3404 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.11.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > > Currently the CSV chunker can be the bottleneck in multi-threaded reads > (starting from 6 threads, according to my experiments). One way to make it > faster is to consider by default that CSV values cannot contain newline > characters (overridable via a setting), and then simply search for the last > newline character in each block of data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3404) [C++] Make CSV chunker faster
Antoine Pitrou created ARROW-3404: - Summary: [C++] Make CSV chunker faster Key: ARROW-3404 URL: https://issues.apache.org/jira/browse/ARROW-3404 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 0.11.0 Reporter: Antoine Pitrou Assignee: Antoine Pitrou Currently the CSV chunker can be the bottleneck in multi-threaded reads (starting from 6 threads, according to my experiments). One way to make it faster is to consider by default that CSV values cannot contain newline characters (overridable via a setting), and then simply search for the last newline character in each block of data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3400) [Packaging] Add support Parquet GLib related Linux packages
[ https://issues.apache.org/jira/browse/ARROW-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou resolved ARROW-3400. - Resolution: Fixed Fix Version/s: 0.11.0 Issue resolved by pull request 2682 [https://github.com/apache/arrow/pull/2682] > [Packaging] Add support Parquet GLib related Linux packages > --- > > Key: ARROW-3400 > URL: https://issues.apache.org/jira/browse/ARROW-3400 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2782) [Python] Ongoing Travis CI failures in Plasma unit tests
[ https://issues.apache.org/jira/browse/ARROW-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-2782: --- Assignee: Philipp Moritz > [Python] Ongoing Travis CI failures in Plasma unit tests > > > Key: ARROW-2782 > URL: https://issues.apache.org/jira/browse/ARROW-2782 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 1h > Remaining Estimate: 0h > > e.g. > {code} > _ test_use_huge_pages > __ > @pytest.mark.skipif(not os.path.exists("/mnt/hugepages"), > reason="requires hugepage support") > def test_use_huge_pages(): > import pyarrow.plasma as plasma > with plasma.start_plasma_store( > plasma_store_memory=2*10**9, > plasma_directory="/mnt/hugepages", > use_hugepages=True) as (plasma_store_name, p): > plasma_client = plasma.connect(plasma_store_name, "", 64) > > create_object(plasma_client, 10**8) > pyarrow/tests/test_plasma.py:773: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > pyarrow/tests/test_plasma.py:79: in create_object > seal=seal) > pyarrow/tests/test_plasma.py:68: in create_object_with_id > memory_buffer = client.create(object_id, data_size, metadata) > pyarrow/_plasma.pyx:300: in pyarrow._plasma.PlasmaClient.create > check_status(self.client.get().Create(object_id.data, data_size, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > > raise PlasmaStoreFull(message) > E PlasmaStoreFull: > /home/travis/build/apache/arrow/cpp/src/plasma/client.cc:375 code: > ReadCreateReply(buffer.data(), buffer.size(), , , _fd, > _size) > E object does not fit in the plasma store > pyarrow/error.pxi:99: PlasmaStoreFull > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3011) [CI] Remove Slack notification
[ https://issues.apache.org/jira/browse/ARROW-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3011: --- Assignee: Krisztian Szucs > [CI] Remove Slack notification > -- > > Key: ARROW-3011 > URL: https://issues.apache.org/jira/browse/ARROW-3011 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Remove code from ARROW-2682 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3015) [Python] Fix documentation typo for pa.uint8
[ https://issues.apache.org/jira/browse/ARROW-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3015: --- Assignee: Antoine Pitrou > [Python] Fix documentation typo for pa.uint8 > > > Key: ARROW-3015 > URL: https://issues.apache.org/jira/browse/ARROW-3015 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > See > http://arrow.apache.org/docs/python/generated/pyarrow.uint8.html#pyarrow.uint8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3018) [Plasma] Improve random ObjectID generation
[ https://issues.apache.org/jira/browse/ARROW-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3018: --- Assignee: Philipp Moritz > [Plasma] Improve random ObjectID generation > --- > > Key: ARROW-3018 > URL: https://issues.apache.org/jira/browse/ARROW-3018 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Affects Versions: 0.10.0 >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > As pointed out by [~pitrou], the mersenne twister in Plasma is currently not > seeded appropriately (I just saw the comment recently): > https://github.com/apache/arrow/pull/2039 > I can submit a patch for Plasma but I'm also wondering if we should have a > properly seeded random number in Arrow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3047) [C++] cmake downloads and builds ORC even though it's installed
[ https://issues.apache.org/jira/browse/ARROW-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3047: --- Assignee: Antoine Pitrou > [C++] cmake downloads and builds ORC even though it's installed > --- > > Key: ARROW-3047 > URL: https://issues.apache.org/jira/browse/ARROW-3047 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.10.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > I have installed orc 1.5.1 from conda-forge, but our cmake build chain still > tries to build protobuf and ORC from source (and fails). > {code:bash} > $ ls $CONDA_PREFIX/include/orc/ > ColumnPrinter.hh Common.hh Exceptions.hh Int128.hh MemoryPool.hh > orc-config.hh OrcFile.hh Reader.hh Statistics.hh Type.hh Vector.hh > Writer.hh > $ ls -l $CONDA_PREFIX/lib/liborc* > -rw-rw-r-- 2 antoine antoine 1952298 juin 20 17:32 > /home/antoine/miniconda3/envs/pyarrow/lib/liborc.a > {code} > [~jim.crist] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3127) [C++] Add Tutorial about Sending Tensor from C++ to Python
[ https://issues.apache.org/jira/browse/ARROW-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3127: --- Assignee: Simon Mo > [C++] Add Tutorial about Sending Tensor from C++ to Python > -- > > Key: ARROW-3127 > URL: https://issues.apache.org/jira/browse/ARROW-3127 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Simon Mo >Assignee: Simon Mo >Priority: Minor > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 50m > Remaining Estimate: 0h > > I can add a short tutorial showing how to > # Serialize a floating-point array in C++ into Tensor > # Save the Tensor to Plasma > # Access the Tensor in Python > c.f. [https://github.com/apache/arrow/pull/2481] > cc @[pcmoritz|https://github.com/pcmoritz] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3142) [C++] Fetch all libs from toolchain environment
[ https://issues.apache.org/jira/browse/ARROW-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3142: --- Assignee: Antoine Pitrou > [C++] Fetch all libs from toolchain environment > --- > > Key: ARROW-3142 > URL: https://issues.apache.org/jira/browse/ARROW-3142 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.10.0 >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When setting ARROW_BUILD_TOOLCHAIN, gtest and orc are currently not taken > from the toolchain environment. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3175) [Java] Upgrade to official FlatBuffers release (Flatbuffers incompatibility)
[ https://issues.apache.org/jira/browse/ARROW-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3175: --- Assignee: Li Jin > [Java] Upgrade to official FlatBuffers release (Flatbuffers incompatibility) > > > Key: ARROW-3175 > URL: https://issues.apache.org/jira/browse/ARROW-3175 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Affects Versions: 0.10.0 >Reporter: Alex Black >Assignee: Li Jin >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > Arrow Java currently uses an unofficial flatbuffers dependency - > com.vlkan:flatbuffers: > [https://github.com/apache/arrow/blob/master/java/pom.xml#L481-L485] > The likely motivation here is that previously, no Java flatbuffers > implementation was available on maven central. > [https://github.com/vy/flatbuffers] > > Unfortunately, FlatBuffers project does not publish any artifacts to the > Maven Central Repository > However, this is no longer the case: > > [https://search.maven.org/search?q=g:com.google.flatbuffers%20AND%20a:flatbuffers-java=gav] > The flatbuffers version used in Arrow java is a nearly 3-year-old snapshot, > not even a version of an official release: > [https://github.com/vy/flatbuffers#usage] > The main problem is that this version of flatbuffers is not compatible with > the official releases of flatbuffers. 
> For example, we use the official flatbuffers releases in ND4J and > Deeplearning4j: [https://github.com/deeplearning4j/deeplearning4j] > Running Arrow with an official flatbuffers library on the classpath results > in issues such as: > {noformat} > java.lang.NoSuchMethodError: > com.google.flatbuffers.FlatBufferBuilder.createString(Ljava/lang/String;)I > at org.apache.arrow.vector.types.pojo.Field.getField(Field.java:154) > at org.apache.arrow.vector.types.pojo.Schema.getSchema(Schema.java:145) > at > org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:124) > at > org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:136) > at org.apache.arrow.vector.ipc.ArrowWriter.start(ArrowWriter.java:97) > at FlatBuffersDependencyIssue.test(FlatBuffersDependencyIssue.java:56) > {noformat} > > Simply excluding the com.vlkan:flatbuffers dependency in lieu of an official > flatbuffers release is not a solution (same exception as above) and we aren't > prepared to downgrade all of our projects to use the flatbuffers version that > Arrow currently requires. > Consequently, this is a major issue that prevents us using Arrow in our > libraries. > I have prepared a simple repository to reproduce this issue, if required: > [https://github.com/AlexDBlack/arrowflatbufferstest] > Is there a reason for using this particular version of flatbuffers, and if > not, can Arrow java use an official release of flatbuffers instead? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3281) [Java] Make sure that WritableByteChannel in WriteChannel writes out complete bytes
[ https://issues.apache.org/jira/browse/ARROW-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou reassigned ARROW-3281: --- Assignee: Animesh Trivedi > [Java] Make sure that WritableByteChannel in WriteChannel writes out complete > bytes > --- > > Key: ARROW-3281 > URL: https://issues.apache.org/jira/browse/ARROW-3281 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Animesh Trivedi >Assignee: Animesh Trivedi >Priority: Trivial > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > In the current WriteChannel class, the write function just calls to push the > ByteBuffer into the WritableByteChannel. However, there is no guarantee if > the whole buffer has been consumed by the channel in one go. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
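The fix the ticket describes amounts to looping until the channel has consumed every byte, since a single `WritableByteChannel.write` call is not guaranteed to drain the buffer. A minimal Python sketch of the same pattern (the `PartialChannel` class is a toy stand-in, not Arrow code):

```python
class PartialChannel:
    """Toy channel that, like a non-blocking WritableByteChannel,
    may consume only part of the data per write call."""

    def __init__(self):
        self.received = bytearray()

    def write(self, data: bytes) -> int:
        # Consume at most 3 bytes per call to simulate partial writes.
        n = min(3, len(data))
        self.received.extend(data[:n])
        return n


def write_fully(channel, data: bytes) -> int:
    """Keep calling write() until the whole buffer has been consumed."""
    written = 0
    while written < len(data):
        written += channel.write(data[written:])
    return written
```

Without the loop, a single `write` against this channel would silently drop all but the first few bytes; the loop is what the ticket asks `WriteChannel` to guarantee.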
[jira] [Updated] (ARROW-3196) Enable merge_arrow_py.py script to merge Parquet patches and set fix versions
[ https://issues.apache.org/jira/browse/ARROW-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3196: Component/s: Developer Tools > Enable merge_arrow_py.py script to merge Parquet patches and set fix versions > - > > Key: ARROW-3196 > URL: https://issues.apache.org/jira/browse/ARROW-3196 > Project: Apache Arrow > Issue Type: New Feature > Components: Developer Tools >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Follow up to ARROW-3075 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2865) [C++/Python] Reduce some duplicated code in python/builtin_convert.cc
[ https://issues.apache.org/jira/browse/ARROW-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-2865: Component/s: Python C++ > [C++/Python] Reduce some duplicated code in python/builtin_convert.cc > - > > Key: ARROW-2865 > URL: https://issues.apache.org/jira/browse/ARROW-2865 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.11.0 > > > See discussion in https://github.com/apache/arrow/pull/2270 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2960) [Packaging] Fix verify-release-candidate for binary packages and fix release cutting script for lib64 cmake issue
[ https://issues.apache.org/jira/browse/ARROW-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-2960: Component/s: Packaging > [Packaging] Fix verify-release-candidate for binary packages and fix release > cutting script for lib64 cmake issue > - > > Key: ARROW-2960 > URL: https://issues.apache.org/jira/browse/ARROW-2960 > Project: Apache Arrow > Issue Type: Task > Components: Packaging >Affects Versions: 0.9.0 >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > The binary package verification function isn't correct, as it is not > downloading packages and associated checksums and signatures at all. > We also need to set {{CMAKE_INSTALL_LIBDIR}} in the source release creation > script because cmake uses {{lib64}} instead of {{lib}} (and the script > assumes {{lib}}) on platforms whose install libdir it doesn't know about a > priori. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3069) [Release] Stop using SHA1 checksums per ASF policy
[ https://issues.apache.org/jira/browse/ARROW-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3069: Component/s: Packaging > [Release] Stop using SHA1 checksums per ASF policy > -- > > Key: ARROW-3069 > URL: https://issues.apache.org/jira/browse/ARROW-3069 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > > https://www.apache.org/dev/release-distribution#sigs-and-sums -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3094) [Python] Allow lighter construction of pa.Schema / pa.StructType
[ https://issues.apache.org/jira/browse/ARROW-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3094: Component/s: Python > [Python] Allow lighter construction of pa.Schema / pa.StructType > > > Key: ARROW-3094 > URL: https://issues.apache.org/jira/browse/ARROW-3094 > Project: Apache Arrow > Issue Type: Wish > Components: Python >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > One shouldn't have to call {{pa.field}} explicitly. See this example: > https://github.com/apache/arrow/pull/2449/files#diff-a01a3e7cbe0d7dd0ec300a725ac0c0c6R148 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3174) [Rust] run examples as part of CI
[ https://issues.apache.org/jira/browse/ARROW-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3174: Component/s: Rust > [Rust] run examples as part of CI > - > > Key: ARROW-3174 > URL: https://issues.apache.org/jira/browse/ARROW-3174 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Trivial > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3173) [Rust] dynamic_types example does not run
[ https://issues.apache.org/jira/browse/ARROW-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3173: Component/s: Rust > [Rust] dynamic_types example does not run > - > > Key: ARROW-3173 > URL: https://issues.apache.org/jira/browse/ARROW-3173 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Minor > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3177) [Rust] Update expected error messages for tests that 'should panic'
[ https://issues.apache.org/jira/browse/ARROW-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3177: Component/s: Rust > [Rust] Update expected error messages for tests that 'should panic' > --- > > Key: ARROW-3177 > URL: https://issues.apache.org/jira/browse/ARROW-3177 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Trivial > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3375) [Rust] Remove memory_pool.rs
[ https://issues.apache.org/jira/browse/ARROW-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kouhei Sutou updated ARROW-3375: Component/s: Rust > [Rust] Remove memory_pool.rs > > > Key: ARROW-3375 > URL: https://issues.apache.org/jira/browse/ARROW-3375 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.10.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Trivial > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 40m > Remaining Estimate: 0h > > A while back we approved a PR to add a custom memory pool but it isn't > actually used. Rust has other mechanisms now for specifying custom memory > allocators so I think we should remove this unused code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3403) [Website] Source tarball link missing from install page
[ https://issues.apache.org/jira/browse/ARROW-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3403. - Resolution: Fixed Issue resolved by pull request 2683 [https://github.com/apache/arrow/pull/2683] > [Website] Source tarball link missing from install page > --- > > Key: ARROW-3403 > URL: https://issues.apache.org/jira/browse/ARROW-3403 > Project: Apache Arrow > Issue Type: Bug > Components: Website >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 10m > Remaining Estimate: 0h > > This can be seen on http://arrow.apache.org/install/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3403) [Website] Source tarball link missing from install page
[ https://issues.apache.org/jira/browse/ARROW-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3403: -- Labels: pull-request-available (was: ) > [Website] Source tarball link missing from install page > --- > > Key: ARROW-3403 > URL: https://issues.apache.org/jira/browse/ARROW-3403 > Project: Apache Arrow > Issue Type: Bug > Components: Website >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > > This can be seen on http://arrow.apache.org/install/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3403) [Website] Source tarball link missing from install page
[ https://issues.apache.org/jira/browse/ARROW-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-3403: -- Assignee: Krisztian Szucs > [Website] Source tarball link missing from install page > --- > > Key: ARROW-3403 > URL: https://issues.apache.org/jira/browse/ARROW-3403 > Project: Apache Arrow > Issue Type: Bug > Components: Website >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Fix For: 0.11.0 > > > This can be seen on http://arrow.apache.org/install/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3397) [C++] Use relative CMake path for modules
[ https://issues.apache.org/jira/browse/ARROW-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3397: --- Assignee: Ivan Zhukov > [C++] Use relative CMake path for modules > - > > Key: ARROW-3397 > URL: https://issues.apache.org/jira/browse/ARROW-3397 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Ivan Zhukov >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3403) [Website] Source tarball link missing from install page
Wes McKinney created ARROW-3403: --- Summary: [Website] Source tarball link missing from install page Key: ARROW-3403 URL: https://issues.apache.org/jira/browse/ARROW-3403 Project: Apache Arrow Issue Type: Bug Components: Website Reporter: Wes McKinney Fix For: 0.11.0 This can be seen on http://arrow.apache.org/install/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3392) [Python] Support filters in disjunctive normal form in ParquetDataset
[ https://issues.apache.org/jira/browse/ARROW-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3392. - Resolution: Fixed Fix Version/s: (was: 0.12.0) 0.11.0 Issue resolved by pull request 2677 [https://github.com/apache/arrow/pull/2677] > [Python] Support filters in disjunctive normal form in ParquetDataset > - > > Key: ARROW-3392 > URL: https://issues.apache.org/jira/browse/ARROW-3392 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > This allows us to represent any boolean predicate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
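Disjunctive normal form here means a list of OR'd clauses, each clause being a list of AND'd column predicates, which is enough to express any boolean predicate. A self-contained sketch of evaluating such a filter against one row of partition values (the `matches` helper is illustrative, not the pyarrow API):

```python
import operator

# Map predicate operator strings to Python comparison functions.
OPS = {'=': operator.eq, '!=': operator.ne, '<': operator.lt,
       '>': operator.gt, '<=': operator.le, '>=': operator.ge}


def matches(row, filters):
    """True if any OR-clause has all of its AND-predicates satisfied."""
    return any(
        all(OPS[op](row[col], value) for col, op, value in clause)
        for clause in filters
    )


# (year == 2018 AND month < 6) OR (year == 2017)
dnf = [[('year', '=', 2018), ('month', '<', 6)],
       [('year', '=', 2017)]]
```

For example, `matches({'year': 2018, 'month': 3}, dnf)` is true via the first clause, while `{'year': 2018, 'month': 7}` satisfies neither clause.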
[jira] [Created] (ARROW-3402) [Gandiva][C++] Utilize common bitmap operation implementations in precompiled IR routines
Wes McKinney created ARROW-3402: --- Summary: [Gandiva][C++] Utilize common bitmap operation implementations in precompiled IR routines Key: ARROW-3402 URL: https://issues.apache.org/jira/browse/ARROW-3402 Project: Apache Arrow Issue Type: Improvement Components: C++, Gandiva Reporter: Wes McKinney Fix For: 0.12.0 It should be possible to use common inline/header-only implementations of bitmap operations in Gandiva functions which are being precompiled to LLVM IR -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3386) [Gandiva] [Java] Build platform independent JAR package
[ https://issues.apache.org/jira/browse/ARROW-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3386: Component/s: Java Gandiva > [Gandiva] [Java] Build platform independent JAR package > --- > > Key: ARROW-3386 > URL: https://issues.apache.org/jira/browse/ARROW-3386 > Project: Apache Arrow > Issue Type: Task > Components: Gandiva, Java >Reporter: Praveen Kumar Desabandu >Priority: Major > > Currently we only package .so for the gandiva jar, we would need a packaged > lib for windows and mac. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3384) [Gandiva] Sync remaining commits from gandiva repo
[ https://issues.apache.org/jira/browse/ARROW-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3384: Summary: [Gandiva] Sync remaining commits from gandiva repo (was: Sync remaining commits from gandiva repo) > [Gandiva] Sync remaining commits from gandiva repo > -- > > Key: ARROW-3384 > URL: https://issues.apache.org/jira/browse/ARROW-3384 > Project: Apache Arrow > Issue Type: Task > Components: C++, Gandiva >Reporter: Praveen Kumar Desabandu >Priority: Major > > After initial merge some new commits were done in gandiva, we need to port > them to the arrow repo. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3382) [C++] Run Gandiva tests in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635366#comment-16635366 ] Wes McKinney commented on ARROW-3382: - Might make sense to have a single CI entry that runs the Gandiva tests both for C++ and Java > [C++] Run Gandiva tests in Travis CI > > > Key: ARROW-3382 > URL: https://issues.apache.org/jira/browse/ARROW-3382 > Project: Apache Arrow > Issue Type: Task > Components: C++ >Reporter: Praveen Kumar Desabandu >Priority: Major > > Integrate and test Gandiva-Cpp in travis. This would unblock new PRs to > gandiva. -- This message was sent by Atlassian JIRA (v7.6.3#76005)

[jira] [Updated] (ARROW-3386) [Gandiva] [Java] Build platform independent JAR package
[ https://issues.apache.org/jira/browse/ARROW-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3386: Summary: [Gandiva] [Java] Build platform independent JAR package (was: Platform independent gandiva jar) > [Gandiva] [Java] Build platform independent JAR package > --- > > Key: ARROW-3386 > URL: https://issues.apache.org/jira/browse/ARROW-3386 > Project: Apache Arrow > Issue Type: Task >Reporter: Praveen Kumar Desabandu >Priority: Major > > Currently we only package .so for the gandiva jar, we would need a packaged > lib for windows and mac. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3383) [Java] Run Gandiva tests in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3383: Summary: [Java] Run Gandiva tests in Travis CI (was: Gandiva Java in travis ci) > [Java] Run Gandiva tests in Travis CI > - > > Key: ARROW-3383 > URL: https://issues.apache.org/jira/browse/ARROW-3383 > Project: Apache Arrow > Issue Type: Task > Components: Gandiva, Java >Reporter: Praveen Kumar Desabandu >Priority: Major > Fix For: 0.12.0 > > > Enable and test for gandiva java in travis ci. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3401) [C++] Pluggable statistics collector API for unconvertible CSV values
Wes McKinney created ARROW-3401: --- Summary: [C++] Pluggable statistics collector API for unconvertible CSV values Key: ARROW-3401 URL: https://issues.apache.org/jira/browse/ARROW-3401 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Fix For: 0.12.0 It would be useful to be able to collect statistics (e.g. distinct value counts) about values in a column of a CSV file that cannot be converted to a desired data type. When conversion fails, the converters can call into an abstract API like {code} statistics_->CannotConvert(token, size); {code} or something similar -- This message was sent by Atlassian JIRA (v7.6.3#76005)
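The ticket sketches an abstract `statistics_->CannotConvert(token, size)` hook in C++. A Python sketch of the kind of pluggable collector this implies (class and method names here are hypothetical, chosen only to mirror the proposed hook):

```python
from collections import Counter


class ConversionStatistics:
    """Collects distinct-value counts for tokens that failed conversion,
    analogous to the proposed CannotConvert(token, size) callback."""

    def __init__(self):
        self.failures = Counter()

    def cannot_convert(self, token: str) -> None:
        self.failures[token] += 1


def convert_column(tokens, target=int, stats=None):
    """Convert tokens to the target type, reporting failures to the
    collector and emitting None for unconvertible values."""
    out = []
    for tok in tokens:
        try:
            out.append(target(tok))
        except ValueError:
            if stats is not None:
                stats.cannot_convert(tok)
            out.append(None)
    return out
```

After converting a column, `stats.failures` holds the distinct unconvertible tokens and how often each appeared, which is exactly the kind of statistic the ticket wants to make collectible.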
[jira] [Resolved] (ARROW-3395) [C++/Python] Add docker container for linting
[ https://issues.apache.org/jira/browse/ARROW-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn resolved ARROW-3395. Resolution: Fixed Fix Version/s: 0.11.0 Issue resolved by pull request 2680 [https://github.com/apache/arrow/pull/2680] > [C++/Python] Add docker container for linting > - > > Key: ARROW-3395 > URL: https://issues.apache.org/jira/browse/ARROW-3395 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Add a docker container that runs clang-format and flake8 checks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3400) [Packaging] Add support Parquet GLib related Linux packages
[ https://issues.apache.org/jira/browse/ARROW-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3400: -- Labels: pull-request-available (was: ) > [Packaging] Add support Parquet GLib related Linux packages > --- > > Key: ARROW-3400 > URL: https://issues.apache.org/jira/browse/ARROW-3400 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3399) Cannot serialize numpy matrix object
[ https://issues.apache.org/jira/browse/ARROW-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635077#comment-16635077 ] Mitar commented on ARROW-3399: -- Oh, the difference is not between Arrow 0.9.0 and 0.10.0 but between numpy 1.14.3 and 1.15.2. Upgrading numpy to latest version throws the error above, while it works on an older version. > Cannot serialize numpy matrix object > > > Key: ARROW-3399 > URL: https://issues.apache.org/jira/browse/ARROW-3399 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Mitar >Priority: Major > > This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on > Linux. > {code:java} > from pyarrow import plasma > import numpy > import time > import subprocess > import os > import signal > m = numpy.matrix(numpy.array([[1, 2], [3, 4]])) > process = subprocess.Popen(['plasma_store', '-m', '100', '-s', > '/tmp/plasma', '-d', '/dev/shm'], stdout=subprocess.DEVNULL, > stderr=subprocess.DEVNULL, encoding='utf8', preexec_fn=os.setpgrp) > time.sleep(5) > client = plasma.connect('/tmp/plasma', '', 0) > try: > client.put(m) > finally: > client.disconnect() > os.killpg(os.getpgid(process.pid), signal.SIGTERM) > {code} > Error: > {noformat} > File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put > File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize > File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum > recursion depth. It may contain itself recursively.{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3399) Cannot serialize numpy matrix object
Mitar created ARROW-3399: Summary: Cannot serialize numpy matrix object Key: ARROW-3399 URL: https://issues.apache.org/jira/browse/ARROW-3399 Project: Apache Arrow Issue Type: Bug Affects Versions: 0.10.0 Reporter: Mitar This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on Linux. {code:java} from pyarrow import plasma import numpy import time import subprocess import os import signal m = numpy.matrix(numpy.array([[1, 2], [3, 4]])) process = subprocess.Popen(['plasma_store', '-m', '100', '-s', '/tmp/plasma', '-d', '/dev/shm'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, encoding='utf8', preexec_fn=os.setpgrp) time.sleep(5) client = plasma.connect('/tmp/plasma', '', 0) try: client.put(m) finally: client.disconnect() os.killpg(os.getpgid(process.pid), signal.SIGTERM) {code} Error: {noformat} File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum recursion depth. It may contain itself recursively.{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)