[jira] [Commented] (ARROW-3382) [C++] Run Gandiva tests in Travis CI

2018-10-02 Thread Pindikura Ravindra (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636481#comment-16636481
 ] 

Pindikura Ravindra commented on ARROW-3382:
---

Sure [~wesmckinn]. But the Java build depends on the C++ build for Gandiva, 
so we want to get the C++ build and tests running first.

> [C++] Run Gandiva tests in Travis CI
> 
>
> Key: ARROW-3382
> URL: https://issues.apache.org/jira/browse/ARROW-3382
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Praveen Kumar Desabandu
>Priority: Major
>
> Integrate and test Gandiva-Cpp in travis. This would unblock new PRs to 
> gandiva.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3173) [Rust] dynamic_types example does not run

2018-10-02 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636305#comment-16636305
 ] 

Kouhei Sutou commented on ARROW-3173:
-

Thanks! :-)

> [Rust] dynamic_types example does not run
> -
>
> Key: ARROW-3173
> URL: https://issues.apache.org/jira/browse/ARROW-3173
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-3137) [Python] pyarrow 0.10 requires newer version of numpy than specified in requirements

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3137:

Fix Version/s: 0.11.0

> [Python] pyarrow 0.10 requires newer version of numpy than specified in 
> requirements
> 
>
> Key: ARROW-3137
> URL: https://issues.apache.org/jira/browse/ARROW-3137
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.10.0
>Reporter: James Campbell
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> pyarrow 0.10 appears to have a binary incompatibility with numpy versions 
> prior to the 1.14.x series, but its requirements file claims support for 
> numpy>=1.10.0
> If an older version of numpy is used, the following RuntimeError results: 
> {{RuntimeError: module compiled against API version 0xc but this version of 
> numpy is 0xb}}
> The following tox.ini file demonstrates the issue:
> {{[tox]
> envlist=py27-numpy\{10,11,13,14,15}-pyarrow\{9,10}
> [testenv]
> deps =
>     numpy10: numpy>=1.10.0,<1.11
>     numpy11: numpy>=1.11.0,<1.12
>     numpy13: numpy>=1.13.0,<1.14
>     numpy14: numpy>=1.14.0,<1.15
>     numpy15: numpy>=1.15.0,<1.16
>     pyarrow9: pyarrow==0.9.0
>     pyarrow10: pyarrow==0.10.0
>     pytest
> commands = pytest}}
> Using a simple test function like the following:
> {{def test_import_pyarrow():
>     import pyarrow}}
> pyarrow 0.9 doesn't appear to have this issue. Was there a change in the 
> setup process for pyarrow 0.10 that no longer uses Cython to build?





[jira] [Updated] (ARROW-3093) [C++] Linking errors with ORC enabled

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3093:

Component/s: C++

> [C++] Linking errors with ORC enabled
> -
>
> Key: ARROW-3093
> URL: https://issues.apache.org/jira/browse/ARROW-3093
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.11.0
>
>
> In an attempt to work around ARROW-3091 and ARROW-3092, I've recreated my 
> conda environment, and now I get linking errors if ORC support is enabled:
> {code}
> debug/libarrow.so.11.0.0: error: undefined reference to 
> 'google::protobuf::MessageLite::ParseFromString(std::string const&)'
> debug/libarrow.so.11.0.0: error: undefined reference to 
> 'google::protobuf::MessageLite::SerializeToString(std::string*) const'
> debug/libarrow.so.11.0.0: error: undefined reference to 
> 'google::protobuf::internal::fixed_address_empty_string'
> [etc.]
> {code}





[jira] [Updated] (ARROW-3093) [C++] Linking errors with ORC enabled

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3093:

Fix Version/s: 0.11.0

> [C++] Linking errors with ORC enabled
> -
>
> Key: ARROW-3093
> URL: https://issues.apache.org/jira/browse/ARROW-3093
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.11.0
>
>
> In an attempt to work around ARROW-3091 and ARROW-3092, I've recreated my 
> conda environment, and now I get linking errors if ORC support is enabled:
> {code}
> debug/libarrow.so.11.0.0: error: undefined reference to 
> 'google::protobuf::MessageLite::ParseFromString(std::string const&)'
> debug/libarrow.so.11.0.0: error: undefined reference to 
> 'google::protobuf::MessageLite::SerializeToString(std::string*) const'
> debug/libarrow.so.11.0.0: error: undefined reference to 
> 'google::protobuf::internal::fixed_address_empty_string'
> [etc.]
> {code}





[jira] [Assigned] (ARROW-3093) [C++] Linking errors with ORC enabled

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3093:
---

Assignee: Antoine Pitrou

> [C++] Linking errors with ORC enabled
> -
>
> Key: ARROW-3093
> URL: https://issues.apache.org/jira/browse/ARROW-3093
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.11.0
>
>
> In an attempt to work around ARROW-3091 and ARROW-3092, I've recreated my 
> conda environment, and now I get linking errors if ORC support is enabled:
> {code}
> debug/libarrow.so.11.0.0: error: undefined reference to 
> 'google::protobuf::MessageLite::ParseFromString(std::string const&)'
> debug/libarrow.so.11.0.0: error: undefined reference to 
> 'google::protobuf::MessageLite::SerializeToString(std::string*) const'
> debug/libarrow.so.11.0.0: error: undefined reference to 
> 'google::protobuf::internal::fixed_address_empty_string'
> [etc.]
> {code}





[jira] [Updated] (ARROW-3342) Appveyor builds have stopped triggering on GitHub

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3342:

Fix Version/s: 0.11.0

> Appveyor builds have stopped triggering on GitHub
> -
>
> Key: ARROW-3342
> URL: https://issues.apache.org/jira/browse/ARROW-3342
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.11.0
>
>
> Not sure what's going on, but this is in the last couple of days





[jira] [Assigned] (ARROW-3342) Appveyor builds have stopped triggering on GitHub

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3342:
---

Assignee: Antoine Pitrou

> Appveyor builds have stopped triggering on GitHub
> -
>
> Key: ARROW-3342
> URL: https://issues.apache.org/jira/browse/ARROW-3342
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.11.0
>
>
> Not sure what's going on, but this is in the last couple of days





[jira] [Updated] (ARROW-3390) [C++] cmake file under windows msys2 system doesn't work

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3390:

Fix Version/s: 0.11.0

> [C++] cmake file under windows msys2 system doesn't work
> 
>
> Key: ARROW-3390
> URL: https://issues.apache.org/jira/browse/ARROW-3390
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Dominic Sisneros
>Priority: Major
>  Labels: windows
> Fix For: 0.11.0
>
>
> I am trying to get this to build on a windows machine with msys2 installed.  
> I can generate a Makefile but nothing happens when I run make.  I think it is 
> because for windows, the cmake file changes the shell to cmd.com.  Under 
> msys2, it should run under the msys shell





[jira] [Assigned] (ARROW-3377) [Gandiva][C++] Remove If statement from bit map set function

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3377:
---

Assignee: Praveen Kumar Desabandu

> [Gandiva][C++] Remove If statement from bit map set function
> 
>
> Key: ARROW-3377
> URL: https://issues.apache.org/jira/browse/ARROW-3377
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Gandiva
>Reporter: Praveen Krishna
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Hello,
> For setting a bit in bit map (which is used in gandiva) we have a branch 
> statement which can be replaced by bit operations like this
> {code:java}
> bmap[byteIdx] ^= (-value ^ bmap[byteIdx]) & (1UL << bitIdx);
> {code}
>  which performs the same operation and avoids the branching.
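
As a quick check of the expression quoted above, here is a minimal Python sketch comparing the branching and branch-free forms on a `bytearray`. It is not Gandiva's code: the function names are hypothetical, `value` is assumed to be 0 or 1, and the sketch relies on Python treating negative integers as two's complement in bitwise operations.

```python
def set_bit_branchy(bmap, pos, value):
    """Reference version with an explicit branch."""
    byte_idx, bit_idx = divmod(pos, 8)
    if value:
        bmap[byte_idx] |= 1 << bit_idx
    else:
        bmap[byte_idx] &= ~(1 << bit_idx) & 0xFF


def set_bit_branchless(bmap, pos, value):
    """Branch-free version; value must be 0 or 1."""
    byte_idx, bit_idx = divmod(pos, 8)
    # -value is 0 (no bits set) or -1 (all bits set), so the XOR/AND pair
    # flips the target bit only when it differs from value.
    bmap[byte_idx] ^= (-value ^ bmap[byte_idx]) & (1 << bit_idx)
```

Both functions leave every other bit of the byte untouched; the branch-free form only works when `value` is exactly 0 or 1.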



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3377) [Gandiva][C++] Remove If statement from bit map set function

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3377.
-
   Resolution: Fixed
Fix Version/s: (was: 0.12.0)
   0.11.0

Issue resolved by pull request 2672
[https://github.com/apache/arrow/pull/2672]

> [Gandiva][C++] Remove If statement from bit map set function
> 
>
> Key: ARROW-3377
> URL: https://issues.apache.org/jira/browse/ARROW-3377
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Gandiva
>Reporter: Praveen Krishna
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Hello,
> For setting a bit in bit map (which is used in gandiva) we have a branch 
> statement which can be replaced by bit operations like this
> {code:java}
> bmap[byteIdx] ^= (-value ^ bmap[byteIdx]) & (1UL << bitIdx);
> {code}
>  which performs the same operation and avoids the branching.





[jira] [Created] (ARROW-3410) [C++] Streaming CSV reader interface for memory-constrained environments

2018-10-02 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3410:
---

 Summary: [C++] Streaming CSV reader interface for 
memory-constrained environments
 Key: ARROW-3410
 URL: https://issues.apache.org/jira/browse/ARROW-3410
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.12.0


CSV reads are currently all-or-nothing. If the results of parsing a CSV file do 
not fit into memory, this can be a problem. I propose to define a streaming 
{{RecordBatchReader}} interface so that the record batches produced by reading 
can be written out immediately to a stream on disk, to be memory mapped later
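
To illustrate the idea only, here is a rough Python sketch using the standard `csv` module. It is not the proposed Arrow `RecordBatchReader` interface; `iter_batches` and `batch_size` are hypothetical names. Rows are produced in bounded batches so that each batch can be written out before the next one is read.

```python
import csv
import io
from itertools import islice


def iter_batches(text_stream, batch_size):
    """Yield (header, rows) pairs holding at most batch_size rows each."""
    reader = csv.reader(text_stream)
    header = next(reader, None)
    if header is None:
        return
    while True:
        rows = list(islice(reader, batch_size))
        if not rows:
            return
        yield header, rows


# Example: three data rows read two at a time.
demo = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
batch_sizes = [len(rows) for _, rows in iter_batches(demo, 2)]  # [2, 1]
```

Only one batch is held in memory at a time, which is the property the proposal is after.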





[jira] [Resolved] (ARROW-3404) [C++] Make CSV chunker faster

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3404.
-
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2684
[https://github.com/apache/arrow/pull/2684]

> [C++] Make CSV chunker faster
> -
>
> Key: ARROW-3404
> URL: https://issues.apache.org/jira/browse/ARROW-3404
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the CSV chunker can be the bottleneck in multi-threaded reads 
> (starting from 6 threads, according to my experiments). One way to make it 
> faster is to consider by default that CSV values cannot contain newline 
> characters (overridable via a setting), and then simply search for the last 
> newline character in each block of data.





[jira] [Created] (ARROW-3409) [C++] Add streaming compression interfaces

2018-10-02 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3409:
-

 Summary: [C++] Add streaming compression interfaces
 Key: ARROW-3409
 URL: https://issues.apache.org/jira/browse/ARROW-3409
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.11.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Currently the compression and decompression methods offered in 
{{arrow/util/compression.h}} are one-shot. We also need to expose streaming 
compressor and decompressor interfaces.
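
For comparison, this is roughly what the one-shot versus streaming distinction looks like with Python's standard `zlib` module. It is a sketch of the concept, not of the `arrow/util/compression.h` API; `compress_stream` and `decompress_stream` are hypothetical helper names.

```python
import zlib


def compress_stream(chunks, level=6):
    """Yield compressed bytes for an iterable of input chunks."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:
            yield out
    # flush() emits whatever the compressor was still buffering.
    yield compressor.flush()


def decompress_stream(chunks):
    """Yield decompressed bytes for an iterable of compressed chunks."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail
```

The streaming objects keep state between calls, so neither the input nor the output ever has to exist as one contiguous buffer, unlike a one-shot `zlib.compress(data)` call.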





[jira] [Created] (ARROW-3408) [C++] Add option to CSV reader to dictionary encode individual columns or all string / binary columns

2018-10-02 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3408:
---

 Summary: [C++] Add option to CSV reader to dictionary encode 
individual columns or all string / binary columns
 Key: ARROW-3408
 URL: https://issues.apache.org/jira/browse/ARROW-3408
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.12.0


For many datasets, dictionary encoding everything can result in drastically 
lower memory usage and subsequently better performance in doing analytics

One difficulty of dictionary encoding in multithreaded conversions is that 
ideally you end up with one dictionary at the end. So you have two options:

* Implement a concurrent hashing scheme -- for low-cardinality dictionaries, 
the overhead associated with mutex contention will not be meaningful; for 
high-cardinality dictionaries it can be more of a problem

* Hash each chunk separately, then normalize at the end

My guess is that a crude concurrent hash table with a mutex to protect 
mutations and resizes is going to outperform the latter
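
The two options can be sketched in Python as follows. This is only an illustration of the bookkeeping, not a performance comparison and not Arrow code; `SharedDictionary` and `encode_then_normalize` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedDictionary:
    """Option 1: one value -> code table guarded by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._codes = {}
        self.values = []

    def encode(self, chunk):
        out = []
        for value in chunk:
            with self._lock:
                code = self._codes.get(value)
                if code is None:
                    code = len(self.values)
                    self._codes[value] = code
                    self.values.append(value)
            out.append(code)
        return out


def encode_then_normalize(chunks):
    """Option 2: encode each chunk on its own, then remap to one dictionary."""
    def encode_chunk(chunk):
        local = {}
        codes = [local.setdefault(v, len(local)) for v in chunk]
        return list(local), codes

    with ThreadPoolExecutor() as pool:
        encoded = list(pool.map(encode_chunk, chunks))

    merged = {}
    result = []
    for local_values, codes in encoded:
        # Map each chunk-local code to its code in the merged dictionary.
        remap = [merged.setdefault(v, len(merged)) for v in local_values]
        result.append([remap[c] for c in codes])
    return list(merged), result
```

Option 1 pays for a lock acquisition per value but ends with a single dictionary; option 2 needs no locking while encoding but requires the extra remapping pass at the end.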





[jira] [Assigned] (ARROW-1019) [C++] Implement input stream and output stream with Gzip codec

2018-10-02 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-1019:
-

Assignee: Antoine Pitrou

> [C++] Implement input stream and output stream with Gzip codec
> --
>
> Key: ARROW-1019
> URL: https://issues.apache.org/jira/browse/ARROW-1019
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: csv
> Fix For: 0.12.0
>
>
> After incorporating the compression code and toolchain from parquet-cpp, we 
> should be able to add a codec layer for on-the-fly compression and 
> decompression
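
As a point of reference for what such a codec layer looks like from the caller's side, here is a small Python sketch built on the standard `gzip` module wrapping an already-open binary stream. It does not reflect the eventual Arrow C++ classes; `write_gzip` and `read_gzip` are hypothetical helpers.

```python
import gzip
import io


def write_gzip(raw_stream, chunks):
    """Compress chunks on the fly into an already-open binary stream."""
    with gzip.GzipFile(fileobj=raw_stream, mode="wb") as out:
        for chunk in chunks:
            out.write(chunk)


def read_gzip(raw_stream, chunk_size=4096):
    """Yield decompressed chunks from an already-open binary stream."""
    with gzip.GzipFile(fileobj=raw_stream, mode="rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                return
            yield chunk


# Example: round-trip two CSV fragments through an in-memory stream.
buffer = io.BytesIO()
write_gzip(buffer, [b"a,b\n", b"1,2\n"])
buffer.seek(0)
restored = b"".join(read_gzip(buffer))
```

The wrapped stream stays open after the `GzipFile` is closed, so the caller keeps control of the underlying file or buffer.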





[jira] [Created] (ARROW-3407) [C++] Add UTF8 conversion modes in CSV reader conversion options

2018-10-02 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3407:
---

 Summary: [C++] Add UTF8 conversion modes in CSV reader conversion 
options
 Key: ARROW-3407
 URL: https://issues.apache.org/jira/browse/ARROW-3407
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.12.0


There should be a few options:

* Assume UTF8, but do not verify ("no seatbelts mode", for users who are 
reasonably confident their data is valid UTF8 and want the maximum performance)

* Full UTF8 verification

* Maybe ASCII-only verification (because ASCII verification is very fast)
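
A minimal Python sketch of the three modes listed above, for illustration only. The enum and function names are hypothetical and say nothing about how the C++ conversion options would actually be spelled.

```python
from enum import Enum


class Utf8Mode(Enum):
    ASSUME_VALID = "assume"   # no check at all ("no seatbelts")
    ASCII_ONLY = "ascii"      # accept only 7-bit data
    FULL = "full"             # full UTF-8 validation


def check_utf8(data, mode):
    """Return True if data (bytes) passes the check selected by mode."""
    if mode is Utf8Mode.ASSUME_VALID:
        return True
    if mode is Utf8Mode.ASCII_ONLY:
        return data.isascii()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that the ASCII-only mode is stricter than full verification: valid multi-byte UTF8 input is rejected by it.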





[jira] [Commented] (ARROW-3406) [C++] Create a caching memory pool implementation

2018-10-02 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635865#comment-16635865
 ] 

Wes McKinney commented on ARROW-3406:
-

Not really related, but on the subject of other kinds of allocators, I wanted 
to make you aware of the chunked allocator that's used (I think) in the 
Parquet encoding routines: 
https://github.com/apache/parquet-cpp/blob/master/src/parquet/util/memory.h#L100

> [C++] Create a caching memory pool implementation
> -
>
> Key: ARROW-3406
> URL: https://issues.apache.org/jira/browse/ARROW-3406
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Priority: Major
>
> A caching memory pool implementation would be able to recycle freed memory 
> blocks instead of returning them to the system immediately. Two different 
> policies may be chosen:
> * either an unbounded cache
> * or a size-limited cache, perhaps with some kind of LRU mechanism
> Such a feature might help e.g. for CSV parsing, when reading and parsing data 
> into temporary memory buffers.





[jira] [Commented] (ARROW-3173) [Rust] dynamic_types example does not run

2018-10-02 Thread Paddy Horan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635775#comment-16635775
 ] 

Paddy Horan commented on ARROW-3173:


[~kou] sorry for leaving the "components" off the issues, I'll make sure to add 
it in the future.

> [Rust] dynamic_types example does not run
> -
>
> Key: ARROW-3173
> URL: https://issues.apache.org/jira/browse/ARROW-3173
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-3406) [C++] Create a caching memory pool implementation

2018-10-02 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3406:
-

 Summary: [C++] Create a caching memory pool implementation
 Key: ARROW-3406
 URL: https://issues.apache.org/jira/browse/ARROW-3406
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.11.0
Reporter: Antoine Pitrou


A caching memory pool implementation would be able to recycle freed memory 
blocks instead of returning them to the system immediately. Two different 
policies may be chosen:
* either an unbounded cache
* or a size-limited cache, perhaps with some kind of LRU mechanism

Such a feature might help e.g. for CSV parsing, when reading and parsing data 
into temporary memory buffers.
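
To make the two policies concrete, here is a toy Python sketch. It is not Arrow's `MemoryPool` interface: `CachingPool` and `max_cached_bytes` are hypothetical names, buffers are plain `bytearray` objects, and eviction simply drops the least recently freed buffer.

```python
from collections import OrderedDict


class CachingPool:
    """Recycle freed buffers instead of dropping them right away.

    max_cached_bytes=None gives an unbounded cache; an integer caps the
    cache and evicts the least recently freed buffers first.
    """

    def __init__(self, max_cached_bytes=None):
        self.max_cached_bytes = max_cached_bytes
        self.cached_bytes = 0
        self._free = OrderedDict()  # id(buffer) -> buffer, oldest first

    def allocate(self, size):
        # Reuse the most recently freed buffer of exactly this size, if any.
        for key in reversed(self._free):
            if len(self._free[key]) == size:
                buf = self._free.pop(key)
                self.cached_bytes -= size
                return buf
        return bytearray(size)

    def free(self, buf):
        if id(buf) in self._free:
            return  # already cached; ignore a double free
        self._free[id(buf)] = buf
        self.cached_bytes += len(buf)
        if self.max_cached_bytes is not None:
            while self.cached_bytes > self.max_cached_bytes:
                _, evicted = self._free.popitem(last=False)
                self.cached_bytes -= len(evicted)
```

A recycled buffer keeps its old contents, so a real pool would have to decide whether callers may rely on zeroed memory.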





[jira] [Created] (ARROW-3405) [Python] Document CSV reader

2018-10-02 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3405:
-

 Summary: [Python] Document CSV reader
 Key: ARROW-3405
 URL: https://issues.apache.org/jira/browse/ARROW-3405
 Project: Apache Arrow
  Issue Type: Bug
  Components: Documentation, Python
Affects Versions: 0.11.0
Reporter: Antoine Pitrou


We should document the Python CSV reader, or at least auto-document the various 
classes and functions.





[jira] [Updated] (ARROW-3405) [Python] Document CSV reader

2018-10-02 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3405:
--
Description: 
We should document the Python CSV reader, or at least auto-document the various 
classes and functions.

Perhaps we should first wait for the API to stabilize.

  was:We should document the Python CSV reader, or at least auto-document the 
various classes and functions.


> [Python] Document CSV reader
> 
>
> Key: ARROW-3405
> URL: https://issues.apache.org/jira/browse/ARROW-3405
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation, Python
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Priority: Major
>
> We should document the Python CSV reader, or at least auto-document the 
> various classes and functions.
> Perhaps we should first wait for the API to stabilize.





[jira] [Updated] (ARROW-3404) [C++] Make CSV chunker faster

2018-10-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3404:
--
Labels: pull-request-available  (was: )

> [C++] Make CSV chunker faster
> -
>
> Key: ARROW-3404
> URL: https://issues.apache.org/jira/browse/ARROW-3404
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> Currently the CSV chunker can be the bottleneck in multi-threaded reads 
> (starting from 6 threads, according to my experiments). One way to make it 
> faster is to consider by default that CSV values cannot contain newline 
> characters (overridable via a setting), and then simply search for the last 
> newline character in each block of data.





[jira] [Created] (ARROW-3404) [C++] Make CSV chunker faster

2018-10-02 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-3404:
-

 Summary: [C++] Make CSV chunker faster
 Key: ARROW-3404
 URL: https://issues.apache.org/jira/browse/ARROW-3404
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.11.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


Currently the CSV chunker can be the bottleneck in multi-threaded reads 
(starting from 6 threads, according to my experiments). One way to make it 
faster is to consider by default that CSV values cannot contain newline 
characters (overridable via a setting), and then simply search for the last 
newline character in each block of data.
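
The fast path described above can be sketched in a few lines of Python. This is an illustration of the idea, not the C++ chunker; the function names are hypothetical, and the sketch assumes that no CSV value contains a newline.

```python
def split_at_last_newline(block):
    """Split a bytes block into (complete_rows, leftover).

    Assumes CSV values never contain newline characters, so the last
    newline byte in the block is always a row boundary.
    """
    cut = block.rfind(b"\n")
    if cut == -1:
        # No complete row in this block; carry everything over.
        return b"", block
    return block[:cut + 1], block[cut + 1:]


def chunk_stream(blocks):
    """Yield chunks made only of whole rows from an iterable of bytes blocks."""
    leftover = b""
    for block in blocks:
        complete, leftover = split_at_last_newline(leftover + block)
        if complete:
            yield complete
    if leftover:
        # Final row without a trailing newline.
        yield leftover
```

Each chunk can then be handed to a parser thread independently, because a single reverse search replaces the quote-aware scan of the whole block.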





[jira] [Resolved] (ARROW-3400) [Packaging] Add support Parquet GLib related Linux packages

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-3400.
-
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2682
[https://github.com/apache/arrow/pull/2682]

> [Packaging] Add support Parquet GLib related Linux packages
> ---
>
> Key: ARROW-3400
> URL: https://issues.apache.org/jira/browse/ARROW-3400
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (ARROW-2782) [Python] Ongoing Travis CI failures in Plasma unit tests

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-2782:
---

Assignee: Philipp Moritz

> [Python] Ongoing Travis CI failures in Plasma unit tests
> 
>
> Key: ARROW-2782
> URL: https://issues.apache.org/jira/browse/ARROW-2782
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> e.g.
> {code}
> _ test_use_huge_pages 
> __
>     @pytest.mark.skipif(not os.path.exists("/mnt/hugepages"),
>                         reason="requires hugepage support")
>     def test_use_huge_pages():
>         import pyarrow.plasma as plasma
>         with plasma.start_plasma_store(
>                 plasma_store_memory=2*10**9,
>                 plasma_directory="/mnt/hugepages",
>                 use_hugepages=True) as (plasma_store_name, p):
>             plasma_client = plasma.connect(plasma_store_name, "", 64)
> >           create_object(plasma_client, 10**8)
> pyarrow/tests/test_plasma.py:773: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> pyarrow/tests/test_plasma.py:79: in create_object
> seal=seal)
> pyarrow/tests/test_plasma.py:68: in create_object_with_id
> memory_buffer = client.create(object_id, data_size, metadata)
> pyarrow/_plasma.pyx:300: in pyarrow._plasma.PlasmaClient.create
> check_status(self.client.get().Create(object_id.data, data_size,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> >   raise PlasmaStoreFull(message)
> E   PlasmaStoreFull: 
> /home/travis/build/apache/arrow/cpp/src/plasma/client.cc:375 code: 
> ReadCreateReply(buffer.data(), buffer.size(), , , _fd, 
> _size)
> E   object does not fit in the plasma store
> pyarrow/error.pxi:99: PlasmaStoreFull
> {code}





[jira] [Assigned] (ARROW-3011) [CI] Remove Slack notification

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3011:
---

Assignee: Krisztian Szucs

> [CI] Remove Slack notification
> --
>
> Key: ARROW-3011
> URL: https://issues.apache.org/jira/browse/ARROW-3011
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Remove code from ARROW-2682





[jira] [Assigned] (ARROW-3015) [Python] Fix documentation typo for pa.uint8

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3015:
---

Assignee: Antoine Pitrou

> [Python] Fix documentation typo for pa.uint8
> 
>
> Key: ARROW-3015
> URL: https://issues.apache.org/jira/browse/ARROW-3015
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> See 
> http://arrow.apache.org/docs/python/generated/pyarrow.uint8.html#pyarrow.uint8





[jira] [Assigned] (ARROW-3018) [Plasma] Improve random ObjectID generation

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3018:
---

Assignee: Philipp Moritz

> [Plasma] Improve random ObjectID generation
> ---
>
> Key: ARROW-3018
> URL: https://issues.apache.org/jira/browse/ARROW-3018
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Affects Versions: 0.10.0
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> As pointed out by [~pitrou], the mersenne twister in Plasma is currently not 
> seeded appropriately (I just saw the comment recently): 
> https://github.com/apache/arrow/pull/2039
> I can submit a patch for Plasma but I'm also wondering if we should have a 
> properly seeded random number in Arrow.
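
For illustration, seeding a Mersenne Twister from operating-system entropy looks like this in Python, whose `random.Random` class is itself a Mersenne Twister. This is not the Plasma C++ code; the helper names are hypothetical and the 20-byte ID length is an assumption made for the example.

```python
import os
import random


def make_rng():
    """Return a Mersenne Twister generator seeded from OS entropy."""
    return random.Random(os.urandom(32))


def random_object_id(rng, size=20):
    """Return size random bytes; the default of 20 is an assumed ID length."""
    return rng.getrandbits(size * 8).to_bytes(size, "big")
```

Two generators built with the same fixed seed produce identical ID sequences, which is the collision risk an unseeded or constant-seeded generator carries across processes.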





[jira] [Assigned] (ARROW-3047) [C++] cmake downloads and builds ORC even though it's installed

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3047:
---

Assignee: Antoine Pitrou

> [C++] cmake downloads and builds ORC even though it's installed
> ---
>
> Key: ARROW-3047
> URL: https://issues.apache.org/jira/browse/ARROW-3047
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> I have installed orc 1.5.1 from conda-forge, but our cmake build chain still 
> tries to build protobuf and ORC from source (and fails).
> {code:bash}
> $ ls $CONDA_PREFIX/include/orc/
> ColumnPrinter.hh  Common.hh  Exceptions.hh  Int128.hh  MemoryPool.hh  
> orc-config.hh  OrcFile.hh  Reader.hh  Statistics.hh  Type.hh  Vector.hh  
> Writer.hh
> $ ls -l $CONDA_PREFIX/lib/liborc*
> -rw-rw-r-- 2 antoine antoine 1952298 juin  20 17:32 
> /home/antoine/miniconda3/envs/pyarrow/lib/liborc.a
> {code}
> [~jim.crist]





[jira] [Assigned] (ARROW-3127) [C++] Add Tutorial about Sending Tensor from C++ to Python

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3127:
---

Assignee: Simon Mo

> [C++] Add Tutorial about Sending Tensor from C++ to Python
> --
>
> Key: ARROW-3127
> URL: https://issues.apache.org/jira/browse/ARROW-3127
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Website
>Reporter: Simon Mo
>Assignee: Simon Mo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> I can add a short tutorial showing how to
>  # Serialize a floating-point array in C++ into Tensor
>  # Save the Tensor to Plasma
>  # Access the Tensor in Python
> c.f. [https://github.com/apache/arrow/pull/2481]
> cc @[pcmoritz|https://github.com/pcmoritz]





[jira] [Assigned] (ARROW-3142) [C++] Fetch all libs from toolchain environment

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3142:
---

Assignee: Antoine Pitrou

> [C++] Fetch all libs from toolchain environment
> ---
>
> Key: ARROW-3142
> URL: https://issues.apache.org/jira/browse/ARROW-3142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When setting ARROW_BUILD_TOOLCHAIN, gtest and orc are currently not taken 
> from the toolchain environment.





[jira] [Assigned] (ARROW-3175) [Java] Upgrade to official FlatBuffers release (Flatbuffers incompatibility)

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3175:
---

Assignee: Li Jin

> [Java] Upgrade to official FlatBuffers release (Flatbuffers incompatibility)
> 
>
> Key: ARROW-3175
> URL: https://issues.apache.org/jira/browse/ARROW-3175
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Affects Versions: 0.10.0
>Reporter: Alex Black
>Assignee: Li Jin
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Arrow Java currently uses an unofficial flatbuffers dependency - 
> com.vlkan:flatbuffers:
>  [https://github.com/apache/arrow/blob/master/java/pom.xml#L481-L485]
> The likely motivation here is that previously, no Java flatbuffers 
> implementation was available on maven central.
>  [https://github.com/vy/flatbuffers]
>  > Unfortunately, FlatBuffers project does not publish any artifacts to the 
> Maven Central Repository
> However, this is no longer the case:
>  
> [https://search.maven.org/search?q=g:com.google.flatbuffers%20AND%20a:flatbuffers-java=gav]
> The flatbuffers version used in Arrow java is a nearly 3-year-old snapshot, 
> not even a version of an official release: 
> [https://github.com/vy/flatbuffers#usage]
> The main problem is that this version of flatbuffers is not compatible with 
> the official releases of flatbuffers.
>  For example, we use the official flatbuffers releases in ND4J and 
> Deeplearning4j: [https://github.com/deeplearning4j/deeplearning4j]
> Running Arrow with an official flatbuffers library on the classpath results 
> in issues such as:
> {noformat}
> java.lang.NoSuchMethodError: 
> com.google.flatbuffers.FlatBufferBuilder.createString(Ljava/lang/String;)I
>  at org.apache.arrow.vector.types.pojo.Field.getField(Field.java:154)
>  at org.apache.arrow.vector.types.pojo.Schema.getSchema(Schema.java:145)
>  at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:124)
>  at 
> org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:136)
>  at org.apache.arrow.vector.ipc.ArrowWriter.start(ArrowWriter.java:97)
>  at FlatBuffersDependencyIssue.test(FlatBuffersDependencyIssue.java:56)
> {noformat}
>  
> Simply excluding the com.vlkan:flatbuffers dependency in favor of an official 
> flatbuffers release is not a solution (same exception as above) and we aren't 
> prepared to downgrade all of our projects to use the flatbuffers version that 
> Arrow currently requires.
>  Consequently, this is a major issue that prevents us from using Arrow in our 
> libraries.
> I have prepared a simple repository to reproduce this issue, if required: 
> [https://github.com/AlexDBlack/arrowflatbufferstest]
> Is there a reason for using this particular version of flatbuffers, and if 
> not, can Arrow java use an official release of flatbuffers instead?





[jira] [Assigned] (ARROW-3281) [Java] Make sure that WritableByteChannel in WriteChannel writes out complete bytes

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-3281:
---

Assignee: Animesh Trivedi

> [Java] Make sure that WritableByteChannel in WriteChannel writes out complete 
> bytes
> ---
>
> Key: ARROW-3281
> URL: https://issues.apache.org/jira/browse/ARROW-3281
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Animesh Trivedi
>Assignee: Animesh Trivedi
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In the current WriteChannel class, the write function simply pushes the 
> ByteBuffer into the WritableByteChannel. However, there is no guarantee that 
> the whole buffer is consumed by the channel in one go. 
>  
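
The concern above (a single write call may accept only part of the buffer) can be sketched in Python as a loop that keeps writing until nothing remains. This is only an illustrative analog, not the Arrow Java code: `PartialWriter` and `write_fully` are hypothetical names, and the sink is assumed to return the number of bytes it accepted on each call.

```python
class PartialWriter:
    """Hypothetical sink that accepts at most `limit` bytes per write() call."""

    def __init__(self, limit):
        self.limit = limit
        self.data = bytearray()
        self.calls = 0

    def write(self, b):
        # Accept only the first `limit` bytes and report how many were taken.
        chunk = bytes(b[: self.limit])
        self.data.extend(chunk)
        self.calls += 1
        return len(chunk)


def write_fully(channel, payload):
    """Call write() repeatedly until every byte of payload has been accepted."""
    view = memoryview(payload)
    total = 0
    while total < len(view):
        total += channel.write(view[total:])
    return total


sink = PartialWriter(limit=4)
sent = write_fully(sink, b"arrow record batch")
```

A single `sink.write(...)` call here would silently drop everything after the first four bytes, which is the failure mode the issue describes.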





[jira] [Updated] (ARROW-3196) Enable merge_arrow_py.py script to merge Parquet patches and set fix versions

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3196:

Component/s: Developer Tools

> Enable merge_arrow_py.py script to merge Parquet patches and set fix versions
> -
>
> Key: ARROW-3196
> URL: https://issues.apache.org/jira/browse/ARROW-3196
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Follow up to ARROW-3075





[jira] [Updated] (ARROW-2865) [C++/Python] Reduce some duplicated code in python/builtin_convert.cc

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-2865:

Component/s: Python
 C++

> [C++/Python] Reduce some duplicated code in python/builtin_convert.cc
> -
>
> Key: ARROW-2865
> URL: https://issues.apache.org/jira/browse/ARROW-2865
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> See discussion in https://github.com/apache/arrow/pull/2270





[jira] [Updated] (ARROW-2960) [Packaging] Fix verify-release-candidate for binary packages and fix release cutting script for lib64 cmake issue

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-2960:

Component/s: Packaging

> [Packaging] Fix verify-release-candidate for binary packages and fix release 
> cutting script for lib64 cmake issue
> -
>
> Key: ARROW-2960
> URL: https://issues.apache.org/jira/browse/ARROW-2960
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.9.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The binary package verification function isn't correct, as it is not 
> downloading packages and associated checksums and signatures at all.
> We also need to set {{CMAKE_INSTALL_LIBDIR}} in the source release creation 
> script because cmake uses {{lib64}} instead of {{lib}} (and the script 
> assumes {{lib}}) on platforms whose install libdir it doesn't know about a 
> priori.
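
As a rough illustration of the checksum half of the verification described above, the following sketch compares a downloaded artifact against its checksum file. It is an assumption-laden example, not the actual verify-release-candidate script: the file names, the `.sha256` layout (digest followed by the file name) and the helper names are hypothetical, and signature verification is not covered.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in chunks so large artifacts need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksum(artifact_path, checksum_path, algorithm="sha256"):
    """Compare the artifact digest with the first token of the checksum file."""
    with open(checksum_path) as f:
        expected = f.read().split()[0].lower()
    return file_digest(artifact_path, algorithm) == expected


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for a downloaded package and its published checksum.
    artifact = os.path.join(tmp, "package.tar.gz")
    with open(artifact, "wb") as f:
        f.write(b"example payload")
    checksum = artifact + ".sha256"
    with open(checksum, "w") as f:
        f.write(hashlib.sha256(b"example payload").hexdigest() + "  package.tar.gz\n")

    ok = verify_checksum(artifact, checksum)

    # Any change to the artifact must make verification fail.
    with open(artifact, "ab") as f:
        f.write(b"!")
    tampered_ok = verify_checksum(artifact, checksum)
```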





[jira] [Updated] (ARROW-3069) [Release] Stop using SHA1 checksums per ASF policy

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3069:

Component/s: Packaging

> [Release] Stop using SHA1 checksums per ASF policy
> --
>
> Key: ARROW-3069
> URL: https://issues.apache.org/jira/browse/ARROW-3069
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://www.apache.org/dev/release-distribution#sigs-and-sums





[jira] [Updated] (ARROW-3094) [Python] Allow lighter construction of pa.Schema / pa.StructType

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3094:

Component/s: Python

> [Python] Allow lighter construction of pa.Schema / pa.StructType
> 
>
> Key: ARROW-3094
> URL: https://issues.apache.org/jira/browse/ARROW-3094
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One shouldn't have to call {{pa.field}} explicitly. See this example:
> https://github.com/apache/arrow/pull/2449/files#diff-a01a3e7cbe0d7dd0ec300a725ac0c0c6R148





[jira] [Updated] (ARROW-3174) [Rust] run examples as part of CI

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3174:

Component/s: Rust

> [Rust] run examples as part of CI
> -
>
> Key: ARROW-3174
> URL: https://issues.apache.org/jira/browse/ARROW-3174
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-3173) [Rust] dynamic_types example does not run

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3173:

Component/s: Rust

> [Rust] dynamic_types example does not run
> -
>
> Key: ARROW-3173
> URL: https://issues.apache.org/jira/browse/ARROW-3173
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-3177) [Rust] Update expected error messages for tests that 'should panic'

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3177:

Component/s: Rust

> [Rust] Update expected error messages for tests that 'should panic'
> ---
>
> Key: ARROW-3177
> URL: https://issues.apache.org/jira/browse/ARROW-3177
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Paddy Horan
>Assignee: Paddy Horan
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-3375) [Rust] Remove memory_pool.rs

2018-10-02 Thread Kouhei Sutou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou updated ARROW-3375:

Component/s: Rust

> [Rust] Remove memory_pool.rs
> 
>
> Key: ARROW-3375
> URL: https://issues.apache.org/jira/browse/ARROW-3375
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.10.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> A while back we approved a PR to add a custom memory pool but it isn't 
> actually used. Rust has other mechanisms now for specifying custom memory 
> allocators so I think we should remove this unused code.





[jira] [Resolved] (ARROW-3403) [Website] Source tarball link missing from install page

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3403.
-
Resolution: Fixed

Issue resolved by pull request 2683
[https://github.com/apache/arrow/pull/2683]

> [Website] Source tarball link missing from install page
> ---
>
> Key: ARROW-3403
> URL: https://issues.apache.org/jira/browse/ARROW-3403
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This can be seen on http://arrow.apache.org/install/





[jira] [Updated] (ARROW-3403) [Website] Source tarball link missing from install page

2018-10-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3403:
--
Labels: pull-request-available  (was: )

> [Website] Source tarball link missing from install page
> ---
>
> Key: ARROW-3403
> URL: https://issues.apache.org/jira/browse/ARROW-3403
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> This can be seen on http://arrow.apache.org/install/





[jira] [Assigned] (ARROW-3403) [Website] Source tarball link missing from install page

2018-10-02 Thread Krisztian Szucs (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs reassigned ARROW-3403:
--

Assignee: Krisztian Szucs

> [Website] Source tarball link missing from install page
> ---
>
> Key: ARROW-3403
> URL: https://issues.apache.org/jira/browse/ARROW-3403
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Website
>Reporter: Wes McKinney
>Assignee: Krisztian Szucs
>Priority: Major
> Fix For: 0.11.0
>
>
> This can be seen on http://arrow.apache.org/install/





[jira] [Assigned] (ARROW-3397) [C++] Use relative CMake path for modules

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3397:
---

Assignee: Ivan Zhukov

> [C++] Use relative CMake path for modules
> -
>
> Key: ARROW-3397
> URL: https://issues.apache.org/jira/browse/ARROW-3397
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Ivan Zhukov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-3403) [Website] Source tarball link missing from install page

2018-10-02 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3403:
---

 Summary: [Website] Source tarball link missing from install page
 Key: ARROW-3403
 URL: https://issues.apache.org/jira/browse/ARROW-3403
 Project: Apache Arrow
  Issue Type: Bug
  Components: Website
Reporter: Wes McKinney
 Fix For: 0.11.0


This can be seen on http://arrow.apache.org/install/





[jira] [Resolved] (ARROW-3392) [Python] Support filters in disjunctive normal form in ParquetDataset

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3392.
-
   Resolution: Fixed
Fix Version/s: (was: 0.12.0)
   0.11.0

Issue resolved by pull request 2677
[https://github.com/apache/arrow/pull/2677]

> [Python] Support filters in disjunctive normal form in ParquetDataset
> -
>
> Key: ARROW-3392
> URL: https://issues.apache.org/jira/browse/ARROW-3392
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This allows us to represent any boolean predicate.
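
To illustrate why disjunctive normal form is enough for any boolean predicate, here is a minimal sketch that evaluates an OR of AND-groups against plain dictionaries. The list-of-lists-of-tuples layout, the operator table and the `matches` helper are assumptions made for this example; they are not taken from the message above and are not the ParquetDataset implementation.

```python
import operator

# Comparison operators supported by this sketch (an assumed, minimal set).
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(row, dnf):
    """True if at least one conjunction has all of its terms satisfied."""
    return any(
        all(OPS[op](row[column], value) for column, op, value in conjunction)
        for conjunction in dnf
    )


# (year = 2018 AND month >= 10) OR (year = 2019)
filters = [
    [("year", "=", 2018), ("month", ">=", 10)],
    [("year", "=", 2019)],
]

rows = [
    {"year": 2018, "month": 9},
    {"year": 2018, "month": 10},
    {"year": 2019, "month": 1},
    {"year": 2017, "month": 12},
]
selected = [row for row in rows if matches(row, filters)]
```

The outer list is the OR and each inner list is an AND, so a predicate with nested ANDs and ORs has to be expanded into this two-level shape first.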





[jira] [Created] (ARROW-3402) [Gandiva][C++] Utilize common bitmap operation implementations in precompiled IR routines

2018-10-02 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3402:
---

 Summary: [Gandiva][C++] Utilize common bitmap operation 
implementations in precompiled IR routines
 Key: ARROW-3402
 URL: https://issues.apache.org/jira/browse/ARROW-3402
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Gandiva
Reporter: Wes McKinney
 Fix For: 0.12.0


It should be possible to use common inline/header-only implementations of 
bitmap operations in Gandiva functions which are being precompiled to LLVM IR.





[jira] [Updated] (ARROW-3386) [Gandiva] [Java] Build platform independent JAR package

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3386:

Component/s: Java
 Gandiva

> [Gandiva] [Java] Build platform independent JAR package
> ---
>
> Key: ARROW-3386
> URL: https://issues.apache.org/jira/browse/ARROW-3386
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Gandiva, Java
>Reporter: Praveen Kumar Desabandu
>Priority: Major
>
> Currently we only package the .so in the Gandiva JAR; we also need packaged 
> libraries for Windows and macOS.





[jira] [Updated] (ARROW-3384) [Gandiva] Sync remaining commits from gandiva repo

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3384:

Summary: [Gandiva] Sync remaining commits from gandiva repo  (was: Sync 
remaining commits from gandiva repo)

> [Gandiva] Sync remaining commits from gandiva repo
> --
>
> Key: ARROW-3384
> URL: https://issues.apache.org/jira/browse/ARROW-3384
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Gandiva
>Reporter: Praveen Kumar Desabandu
>Priority: Major
>
> After the initial merge, some new commits were made in the gandiva repo; we 
> need to port them to the arrow repo.





[jira] [Commented] (ARROW-3382) [C++] Run Gandiva tests in Travis CI

2018-10-02 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635366#comment-16635366
 ] 

Wes McKinney commented on ARROW-3382:
-

It might make sense to have a single CI entry that runs the Gandiva tests for 
both C++ and Java.

> [C++] Run Gandiva tests in Travis CI
> 
>
> Key: ARROW-3382
> URL: https://issues.apache.org/jira/browse/ARROW-3382
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Praveen Kumar Desabandu
>Priority: Major
>
> Integrate and test Gandiva-Cpp in Travis CI. This would unblock new PRs to 
> gandiva.





[jira] [Updated] (ARROW-3386) [Gandiva] [Java] Build platform independent JAR package

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3386:

Summary: [Gandiva] [Java] Build platform independent JAR package  (was: 
Platform independent gandiva jar)

> [Gandiva] [Java] Build platform independent JAR package
> ---
>
> Key: ARROW-3386
> URL: https://issues.apache.org/jira/browse/ARROW-3386
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Praveen Kumar Desabandu
>Priority: Major
>
> Currently we only package the .so in the Gandiva JAR; we also need packaged 
> libraries for Windows and macOS.





[jira] [Updated] (ARROW-3383) [Java] Run Gandiva tests in Travis CI

2018-10-02 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3383:

Summary: [Java] Run Gandiva tests in Travis CI  (was: Gandiva Java in 
travis ci)

> [Java] Run Gandiva tests in Travis CI
> -
>
> Key: ARROW-3383
> URL: https://issues.apache.org/jira/browse/ARROW-3383
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Gandiva, Java
>Reporter: Praveen Kumar Desabandu
>Priority: Major
> Fix For: 0.12.0
>
>
> Enable and test Gandiva Java in Travis CI.





[jira] [Created] (ARROW-3401) [C++] Pluggable statistics collector API for unconvertible CSV values

2018-10-02 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3401:
---

 Summary: [C++] Pluggable statistics collector API for 
unconvertible CSV values
 Key: ARROW-3401
 URL: https://issues.apache.org/jira/browse/ARROW-3401
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.12.0


It would be useful to be able to collect statistics (e.g. distinct value 
counts) about values in a column of a CSV file that cannot be converted to a 
desired data type. 

When conversion fails, the converters can call into an abstract API like

{code}
statistics_->CannotConvert(token, size);
{code}

or something similar
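
The proposed hook could look roughly like the following sketch, written in Python rather than C++ purely for illustration. Every name here (`ConversionStatistics`, `DistinctValueCounter`, `convert_column`) is hypothetical, and recording a null for a failed conversion is an assumption of this example rather than something the proposal specifies.

```python
from abc import ABC, abstractmethod
from collections import Counter


class ConversionStatistics(ABC):
    """Abstract hook that converters call when a token cannot be converted."""

    @abstractmethod
    def cannot_convert(self, token):
        ...


class DistinctValueCounter(ConversionStatistics):
    """One possible plug-in: count each distinct unconvertible token."""

    def __init__(self):
        self.counts = Counter()

    def cannot_convert(self, token):
        self.counts[token] += 1


def convert_column(tokens, convert, statistics):
    """Convert each token; report failures to the collector and emit None."""
    out = []
    for token in tokens:
        try:
            out.append(convert(token))
        except ValueError:
            statistics.cannot_convert(token)
            out.append(None)
    return out


stats = DistinctValueCounter()
values = convert_column(["1", "N/A", "3", "n/a", "N/A"], int, stats)
```

Because the converter only depends on the abstract interface, a different collector (for example one that samples tokens instead of counting them) could be swapped in without touching the conversion code.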





[jira] [Resolved] (ARROW-3395) [C++/Python] Add docker container for linting

2018-10-02 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-3395.

   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2680
[https://github.com/apache/arrow/pull/2680]

> [C++/Python] Add docker container for linting
> -
>
> Key: ARROW-3395
> URL: https://issues.apache.org/jira/browse/ARROW-3395
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Add a docker container that runs clang-format and flake8 checks.





[jira] [Updated] (ARROW-3400) [Packaging] Add support Parquet GLib related Linux packages

2018-10-02 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3400:
--
Labels: pull-request-available  (was: )

> [Packaging] Add support Parquet GLib related Linux packages
> ---
>
> Key: ARROW-3400
> URL: https://issues.apache.org/jira/browse/ARROW-3400
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Commented] (ARROW-3399) Cannot serialize numpy matrix object

2018-10-02 Thread Mitar (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635077#comment-16635077
 ] 

Mitar commented on ARROW-3399:
--

Oh, the difference is not between Arrow 0.9.0 and 0.10.0 but between numpy 
1.14.3 and 1.15.2. Upgrading numpy to the latest version triggers the error 
above, while the older version works.

> Cannot serialize numpy matrix object
> 
>
> Key: ARROW-3399
> URL: https://issues.apache.org/jira/browse/ARROW-3399
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Mitar
>Priority: Major
>
> This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on 
> Linux.
> {code:java}
> from pyarrow import plasma
> import numpy
> import time
> import subprocess
> import os
> import signal
> m = numpy.matrix(numpy.array([[1, 2], [3, 4]]))
> process = subprocess.Popen(['plasma_store', '-m', '100', '-s', 
> '/tmp/plasma', '-d', '/dev/shm'], stdout=subprocess.DEVNULL, 
> stderr=subprocess.DEVNULL, encoding='utf8', preexec_fn=os.setpgrp)
> time.sleep(5)
> client = plasma.connect('/tmp/plasma', '', 0)
> try:
>     client.put(m)
> finally:
>     client.disconnect()
>     os.killpg(os.getpgid(process.pid), signal.SIGTERM)
> {code}
> Error:
> {noformat}
>   File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put
>   File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize
>   File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum 
> recursion depth. It may contain itself recursively.{noformat}





[jira] [Created] (ARROW-3399) Cannot serialize numpy matrix object

2018-10-02 Thread Mitar (JIRA)
Mitar created ARROW-3399:


 Summary: Cannot serialize numpy matrix object
 Key: ARROW-3399
 URL: https://issues.apache.org/jira/browse/ARROW-3399
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mitar


This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on 
Linux.
{code:java}
from pyarrow import plasma
import numpy
import time
import subprocess
import os
import signal

m = numpy.matrix(numpy.array([[1, 2], [3, 4]]))

process = subprocess.Popen(['plasma_store', '-m', '100', '-s', 
'/tmp/plasma', '-d', '/dev/shm'], stdout=subprocess.DEVNULL, 
stderr=subprocess.DEVNULL, encoding='utf8', preexec_fn=os.setpgrp)
time.sleep(5)
client = plasma.connect('/tmp/plasma', '', 0)

try:
    client.put(m)
finally:
    client.disconnect()
    os.killpg(os.getpgid(process.pid), signal.SIGTERM)
{code}
Error:
{noformat}
  File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum recursion 
depth. It may contain itself recursively.{noformat}


