[jira] [Updated] (ARROW-3896) [MATLAB] Decouple MATLAB-Arrow conversion logic from Feather file specific logic
[ https://issues.apache.org/jira/browse/ARROW-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3896: Fix Version/s: (was: 0.14.0) > [MATLAB] Decouple MATLAB-Arrow conversion logic from Feather file specific > logic > > > Key: ARROW-3896 > URL: https://issues.apache.org/jira/browse/ARROW-3896 > Project: Apache Arrow > Issue Type: Improvement > Components: MATLAB >Reporter: Kevin Gurney >Assignee: Kevin Gurney >Priority: Major > Original Estimate: 72h > Remaining Estimate: 72h > > Currently, the logic for converting between a MATLAB mxArray and various > Arrow data structures (arrow::Table, arrow::Array, etc.) is tightly coupled > and fairly tangled up with the logic specific to handling Feather files. It > would be helpful to factor out these conversions into a more generic > "mlarrow" conversion layer component so that it can be reused in the future > for use cases other than Feather support. Furthermore, this would be helpful > to enforce a cleaner separation of concerns. > It would be nice to start off with this refactoring work up front before > adding support for more datatypes to the MATLAB featherread/featherwrite > functions, so that we can start off with a clean base upon which to expand > moving forward. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3919) [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize
[ https://issues.apache.org/jira/browse/ARROW-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3919: Fix Version/s: (was: 0.14.0) > [Python] Support 64 bit indices for pyarrow.serialize and pyarrow.deserialize > - > > Key: ARROW-3919 > URL: https://issues.apache.org/jira/browse/ARROW-3919 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > see https://github.com/modin-project/modin/issues/266 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3873) [C++] Build shared libraries consistently with -fvisibility=hidden
[ https://issues.apache.org/jira/browse/ARROW-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3873: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Build shared libraries consistently with -fvisibility=hidden > -- > > Key: ARROW-3873 > URL: https://issues.apache.org/jira/browse/ARROW-3873 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > See https://github.com/apache/arrow/pull/2437 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3901) [Python] Make Schema hashable
[ https://issues.apache.org/jira/browse/ARROW-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3901: Fix Version/s: (was: 0.14.0) > [Python] Make Schema hashable > - > > Key: ARROW-3901 > URL: https://issues.apache.org/jira/browse/ARROW-3901 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > Currently pa.Schema is not hashable, however all of its components are > hashable -- This message was sent by Atlassian JIRA (v7.6.3#76005)
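Until pa.Schema implements __hash__, one workaround is to key lookups off a stable serialized form of the schema. This is only an illustrative sketch, not the change proposed in ARROW-3901; it assumes that two identically constructed schemas serialize to identical bytes, which holds for simple schemas but is an assumption.

{code}
import pyarrow as pa

schema = pa.schema([('id', pa.int64()), ('name', pa.string())])

# Workaround sketch: use the serialized schema bytes as a hashable dict key
# until pa.Schema grows a proper __hash__ implementation.
key = schema.serialize().to_pybytes()
cache = {key: 'metadata for this schema'}

same_schema = pa.schema([('id', pa.int64()), ('name', pa.string())])
assert same_schema.serialize().to_pybytes() in cache
{code}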
[jira] [Updated] (ARROW-4022) [C++] RFC: promote Datum variant out of compute namespace
[ https://issues.apache.org/jira/browse/ARROW-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4022: Fix Version/s: (was: 0.14.0) > [C++] RFC: promote Datum variant out of compute namespace > - > > Key: ARROW-4022 > URL: https://issues.apache.org/jira/browse/ARROW-4022 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > In working on ARROW-3762, I've found it's useful to be able to have functions > return either {{Array}} or {{ChunkedArray}}. We might consider promoting the > {{arrow::compute::Datum}} variant out of {{arrow/compute/kernel.h}} so it can > be used in other places where it's helpful -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4001) [Python] Create Parquet Schema in python
[ https://issues.apache.org/jira/browse/ARROW-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4001: Fix Version/s: (was: 0.14.0) > [Python] Create Parquet Schema in python > > > Key: ARROW-4001 > URL: https://issues.apache.org/jira/browse/ARROW-4001 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Affects Versions: 0.9.0 >Reporter: David Stauffer >Priority: Major > Labels: parquet > > Enable the creation of a Parquet schema in python. For functions like > pyarrow.parquet.ParquetDataset, a schema must be a Parquet schema. See: > https://stackoverflow.com/questions/53725691/pyarrow-lib-schema-vs-pyarrow-parquet-schema -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4046) [Python/CI] Run nightly large memory tests
[ https://issues.apache.org/jira/browse/ARROW-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4046: Fix Version/s: (was: 0.14.0) > [Python/CI] Run nightly large memory tests > -- > > Key: ARROW-4046 > URL: https://issues.apache.org/jira/browse/ARROW-4046 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Priority: Major > Labels: nightly > > See comment https://github.com/apache/arrow/pull/3171#issuecomment-447156646 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4046) [Python/CI] Run nightly large memory tests
[ https://issues.apache.org/jira/browse/ARROW-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4046: Labels: nightly (was: ) > [Python/CI] Run nightly large memory tests > -- > > Key: ARROW-4046 > URL: https://issues.apache.org/jira/browse/ARROW-4046 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Priority: Major > Labels: nightly > Fix For: 0.14.0 > > > See comment https://github.com/apache/arrow/pull/3171#issuecomment-447156646 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5455) [Rust] Build broken by 2019-05-30 Rust nightly
Wes McKinney created ARROW-5455: --- Summary: [Rust] Build broken by 2019-05-30 Rust nightly Key: ARROW-5455 URL: https://issues.apache.org/jira/browse/ARROW-5455 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Wes McKinney Fix For: 0.14.0 See example failed build: https://travis-ci.org/apache/arrow/jobs/539477452 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5453) [C++] Just-released cmake-format 0.5.2 breaks the build
[ https://issues.apache.org/jira/browse/ARROW-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-5453. - Resolution: Fixed Issue resolved by pull request 4423 [https://github.com/apache/arrow/pull/4423] > [C++] Just-released cmake-format 0.5.2 breaks the build > --- > > Key: ARROW-5453 > URL: https://issues.apache.org/jira/browse/ARROW-5453 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Blocker > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > It seems we should always pin the cmake-format version until the developers > stop changing the formatting algorithm -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4631) [C++] Implement serial version of sort computational kernel
[ https://issues.apache.org/jira/browse/ARROW-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4631: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Implement serial version of sort computational kernel > --- > > Key: ARROW-4631 > URL: https://issues.apache.org/jira/browse/ARROW-4631 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Affects Versions: 0.13.0 >Reporter: Areg Melik-Adamyan >Assignee: Areg Melik-Adamyan >Priority: Major > Labels: analytics > Fix For: 0.15.0 > > > Implement serial version of sort computational kernel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4591) [Rust] Add explicit SIMD vectorization for aggregation ops in "array_ops"
[ https://issues.apache.org/jira/browse/ARROW-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4591: Fix Version/s: (was: 0.14.0) > [Rust] Add explicit SIMD vectorization for aggregation ops in "array_ops" > - > > Key: ARROW-4591 > URL: https://issues.apache.org/jira/browse/ARROW-4591 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4575) [Python] Add Python Flight implementation to integration testing
[ https://issues.apache.org/jira/browse/ARROW-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4575: Fix Version/s: (was: 0.14.0) > [Python] Add Python Flight implementation to integration testing > > > Key: ARROW-4575 > URL: https://issues.apache.org/jira/browse/ARROW-4575 > Project: Apache Arrow > Issue Type: Improvement > Components: FlightRPC, Integration, Python >Reporter: David Li >Assignee: David Li >Priority: Major > Labels: flight > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4567) [C++] Convert Scalar values to Array values with length 1
[ https://issues.apache.org/jira/browse/ARROW-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852594#comment-16852594 ] Wes McKinney commented on ARROW-4567: - cc [~fsaintjacques] > [C++] Convert Scalar values to Array values with length 1 > - > > Key: ARROW-4567 > URL: https://issues.apache.org/jira/browse/ARROW-4567 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > A common approach to performing operations on both scalar and array values is > to treat a Scalar as an array of length 1. For example, we cannot currently > use our Cast kernels to cast a Scalar. It would be senseless to create > separate kernel implementations specialized for a single value, and much > easier to promote a scalar to an Array, execute the kernel, then unbox the > result back into a Scalar -- This message was sent by Atlassian JIRA (v7.6.3#76005)
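The promote/execute/unbox pattern described above can already be mimicked from Python: wrap the scalar in a length-1 array, run the kernel (here a cast), and unbox the single result. This is only a sketch of the idea, not the C++ API the issue asks for.

{code}
import pyarrow as pa

def cast_scalar(value, from_type, to_type):
    # Promote the scalar to a length-1 Array so the existing cast kernel applies,
    # then unbox the single element back into a scalar.
    arr = pa.array([value], type=from_type)
    return arr.cast(to_type)[0]

print(cast_scalar(7, pa.int32(), pa.float64()))  # -> 7.0
{code}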
[jira] [Created] (ARROW-5457) [GLib][Plasma] Environment variable name for test is wrong
Kouhei Sutou created ARROW-5457: --- Summary: [GLib][Plasma] Environment variable name for test is wrong Key: ARROW-5457 URL: https://issues.apache.org/jira/browse/ARROW-5457 Project: Apache Arrow Issue Type: Bug Components: GLib Affects Versions: 0.13.0 Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5457) [GLib][Plasma] Environment variable name for test is wrong
[ https://issues.apache.org/jira/browse/ARROW-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5457: -- Labels: pull-request-available (was: ) > [GLib][Plasma] Environment variable name for test is wrong > -- > > Key: ARROW-5457 > URL: https://issues.apache.org/jira/browse/ARROW-5457 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Affects Versions: 0.13.0 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-750) [Format] Add LargeBinary and LargeString types
[ https://issues.apache.org/jira/browse/ARROW-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-750: --- Fix Version/s: (was: 0.14.0) 0.15.0 > [Format] Add LargeBinary and LargeString types > -- > > Key: ARROW-750 > URL: https://issues.apache.org/jira/browse/ARROW-750 > Project: Apache Arrow > Issue Type: New Feature > Components: Format >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > These are string and binary types that use 64-bit offsets. Java will not need > to implement these types for the time being, but they are needed when > representing very large datasets in C++ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3840) [C++] Run fuzzer tests with docker-compose
[ https://issues.apache.org/jira/browse/ARROW-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3840: Fix Version/s: (was: 0.14.0) > [C++] Run fuzzer tests with docker-compose > -- > > Key: ARROW-3840 > URL: https://issues.apache.org/jira/browse/ARROW-3840 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > These are not being run regularly right now -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3419) [C++] Run include-what-you-use checks as nightly build
[ https://issues.apache.org/jira/browse/ARROW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3419: Fix Version/s: (was: 0.14.0) > [C++] Run include-what-you-use checks as nightly build > -- > > Key: ARROW-3419 > URL: https://issues.apache.org/jira/browse/ARROW-3419 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > > As part of linting (and running linter checks in a separate Travis entry), we > should also run include-what-you-use on changed files so that we can force > include cleanliness -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3410) [C++] Streaming CSV reader interface for memory-constrained environments
[ https://issues.apache.org/jira/browse/ARROW-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3410: Fix Version/s: (was: 0.14.0) > [C++] Streaming CSV reader interface for memory-constrained environments > - > > Key: ARROW-3410 > URL: https://issues.apache.org/jira/browse/ARROW-3410 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > > CSV reads are currently all-or-nothing. If the results of parsing a CSV file > do not fit into memory, this can be a problem. I propose to define a > streaming {{RecordBatchReader}} interface so that the record batches produced > by reading can be written out immediately to a stream on disk, to be memory > mapped later -- This message was sent by Atlassian JIRA (v7.6.3#76005)
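For reference, later pyarrow releases did grow a streaming reader along these lines (pyarrow.csv.open_csv). The sketch below assumes such a release (it is not available in the 0.14 timeframe discussed here) and uses placeholder file names; it writes each incoming batch straight to an IPC file so the data never has to be fully materialized.

{code}
import pyarrow as pa
from pyarrow import csv, ipc

# Assumes a pyarrow version that ships csv.open_csv (added after this ticket).
reader = csv.open_csv('big_input.csv')   # streaming reader producing record batches

with ipc.new_file('big_input.arrow', reader.schema) as writer:
    for batch in reader:                 # batches arrive incrementally
        writer.write_batch(batch)        # spill each batch to disk immediately
{code}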
[jira] [Updated] (ARROW-3408) [C++] Add option to CSV reader to dictionary encode individual columns or all string / binary columns
[ https://issues.apache.org/jira/browse/ARROW-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3408: Labels: datasets (was: ) > [C++] Add option to CSV reader to dictionary encode individual columns or all > string / binary columns > - > > Key: ARROW-3408 > URL: https://issues.apache.org/jira/browse/ARROW-3408 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: datasets > Fix For: 0.14.0 > > > For many datasets, dictionary encoding everything can result in drastically > lower memory usage and subsequently better performance in doing analytics > One difficulty of dictionary encoding in multithreaded conversions is that > ideally you end up with one dictionary at the end. So you have two options: > * Implement a concurrent hashing scheme -- for low cardinality dictionaries, > the overhead associated with mutex contention will not be meaningful, for > high cardinality it can be more of a problem > * Hash each chunk separately, then normalize at the end > My guess is that a crude concurrent hash table with a mutex to protect > mutations and resizes is going to outperform the latter -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3379) [C++] Implement regex/multichar delimiter tokenizer
[ https://issues.apache.org/jira/browse/ARROW-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3379: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Implement regex/multichar delimiter tokenizer > --- > > Key: ARROW-3379 > URL: https://issues.apache.org/jira/browse/ARROW-3379 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: csv, datasets > Fix For: 0.15.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3424) [Python] Improved workflow for loading an arbitrary collection of Parquet files
[ https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3424: Labels: datasets parquet (was: parquet) > [Python] Improved workflow for loading an arbitrary collection of Parquet > files > --- > > Key: ARROW-3424 > URL: https://issues.apache.org/jira/browse/ARROW-3424 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: datasets, parquet > Fix For: 0.14.0 > > > See SO question for use case: > https://stackoverflow.com/questions/52613682/load-multiple-parquet-files-into-dataframe-for-analysis -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3408) [C++] Add option to CSV reader to dictionary encode individual columns or all string / binary columns
[ https://issues.apache.org/jira/browse/ARROW-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3408: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Add option to CSV reader to dictionary encode individual columns or all > string / binary columns > - > > Key: ARROW-3408 > URL: https://issues.apache.org/jira/browse/ARROW-3408 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: datasets > Fix For: 0.15.0 > > > For many datasets, dictionary encoding everything can result in drastically > lower memory usage and subsequently better performance in doing analytics > One difficulty of dictionary encoding in multithreaded conversions is that > ideally you end up with one dictionary at the end. So you have two options: > * Implement a concurrent hashing scheme -- for low cardinality dictionaries, > the overhead associated with mutex contention will not be meaningful, for > high cardinality it can be more of a problem > * Hash each chunk separately, then normalize at the end > My guess is that a crude concurrent hash table with a mutex to protect > mutations and resizes is going to outperform the latter -- This message was sent by Atlassian JIRA (v7.6.3#76005)
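As a rough illustration of what the requested option could look like from Python, the sketch below uses the auto_dict_encode convert option that pyarrow eventually exposed; relative to this ticket's timeframe it is an assumption, and 'data.csv' and 'city' are placeholders. Per-column control can be approximated by passing an explicit dictionary type in column_types.

{code}
import pyarrow as pa
from pyarrow import csv

convert_options = csv.ConvertOptions(
    # Dictionary-encode all string/binary columns automatically.
    auto_dict_encode=True,
    # Or request dictionary encoding for a single named column.
    column_types={'city': pa.dictionary(pa.int32(), pa.string())},
)
table = csv.read_csv('data.csv', convert_options=convert_options)
print(table.schema)
{code}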
[jira] [Updated] (ARROW-3401) [C++] Pluggable statistics collector API for unconvertible CSV values
[ https://issues.apache.org/jira/browse/ARROW-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3401: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Pluggable statistics collector API for unconvertible CSV values > - > > Key: ARROW-3401 > URL: https://issues.apache.org/jira/browse/ARROW-3401 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > It would be useful to be able to collect statistics (e.g. distinct value > counts) about values in a column of a CSV file that cannot be converted to a > desired data type. > When conversion fails, the converters can call into an abstract API like > {code} > statistics_->CannotConvert(token, size); > {code} > or something similar -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3406) [C++] Create a caching memory pool implementation
[ https://issues.apache.org/jira/browse/ARROW-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3406: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Create a caching memory pool implementation > - > > Key: ARROW-3406 > URL: https://issues.apache.org/jira/browse/ARROW-3406 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.11.0 >Reporter: Antoine Pitrou >Priority: Minor > Fix For: 0.15.0 > > > A caching memory pool implementation would be able to recycle freed memory > blocks instead of returning them to the system immediately. Two different > policies may be chosen: > * either an unbounded cache > * or a size-limited cache, perhaps with some kind of LRU mechanism > Such a feature might help e.g. for CSV parsing, when reading and parsing data > into temporary memory buffers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
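A minimal pure-Python sketch of the unbounded-cache policy described above (not Arrow's C++ MemoryPool API): freed blocks are parked in per-size free lists and handed back on the next allocation of the same size.

{code}
from collections import defaultdict

class CachingPool:
    """Toy model of a caching allocator: recycle freed blocks by size."""

    def __init__(self):
        self._free = defaultdict(list)   # size -> list of reusable buffers

    def allocate(self, size):
        cached = self._free[size]
        return cached.pop() if cached else bytearray(size)

    def free(self, buf):
        # Instead of returning memory to the system, keep it for reuse.
        self._free[len(buf)].append(buf)

pool = CachingPool()
a = pool.allocate(1 << 20)
pool.free(a)
b = pool.allocate(1 << 20)
assert a is b   # the second allocation reuses the cached block
{code}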
[jira] [Updated] (ARROW-4259) [Plasma] CI failure in test_plasma_tf_op
[ https://issues.apache.org/jira/browse/ARROW-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4259: Fix Version/s: (was: 0.14.0) > [Plasma] CI failure in test_plasma_tf_op > > > Key: ARROW-4259 > URL: https://issues.apache.org/jira/browse/ARROW-4259 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Plasma, Continuous Integration, Python >Reporter: Wes McKinney >Priority: Major > Labels: ci-failure > > Recently-appeared failure on master: > https://travis-ci.org/apache/arrow/jobs/479378188#L7108 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4286) [C++/R] Namespace vendored Boost
[ https://issues.apache.org/jira/browse/ARROW-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4286: Fix Version/s: (was: 0.14.0) > [C++/R] Namespace vendored Boost > > > Key: ARROW-4286 > URL: https://issues.apache.org/jira/browse/ARROW-4286 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Packaging, R >Reporter: Uwe L. Korn >Priority: Major > > For R, we vendor Boost and thus also include the symbols privately in our > modules. While they are private, some things like virtual destructors can > still interfere with other packages that vendor Boost. We should also > namespace the vendored Boost as we do in the manylinux1 packaging: > https://github.com/apache/arrow/blob/0f8bd747468dd28c909ef823bed77d8082a5b373/python/manylinux1/scripts/build_boost.sh#L28 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4217) [Plasma] Remove custom object metadata
[ https://issues.apache.org/jira/browse/ARROW-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4217: Fix Version/s: (was: 0.14.0) > [Plasma] Remove custom object metadata > -- > > Key: ARROW-4217 > URL: https://issues.apache.org/jira/browse/ARROW-4217 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Affects Versions: 0.11.1 >Reporter: Philipp Moritz >Assignee: Philipp Moritz >Priority: Minor > > Currently, Plasma supports custom metadata for objects. This doesn't seem to > be used at the moment, and removing it will simplify the interface and > implementation of plasma. Removing the custom metadata will also make > eviction to other blob stores easier (most other stores don't support custom > metadata). > My personal use case was to store arrow schemata in there, but they are now > stored as part of the object itself. > If nobody else is using this, I'd suggest removing it. If people really want > metadata, they could always store it as a separate object if desired. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4220) [Python] Add buffered input and output stream ASV benchmarks with simulated high latency IO
[ https://issues.apache.org/jira/browse/ARROW-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852570#comment-16852570 ] Wes McKinney commented on ARROW-4220: - cc [~jorisvandenbossche] > [Python] Add buffered input and output stream ASV benchmarks with simulated > high latency IO > --- > > Key: ARROW-4220 > URL: https://issues.apache.org/jira/browse/ARROW-4220 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > Follow up to ARROW-3126 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4283) [Python] Should RecordBatchStreamReader/Writer be AsyncIterable?
[ https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4283: Fix Version/s: (was: 0.14.0) > [Python] Should RecordBatchStreamReader/Writer be AsyncIterable? > > > Key: ARROW-4283 > URL: https://issues.apache.org/jira/browse/ARROW-4283 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Paul Taylor >Priority: Minor > > Filing this issue after a discussion today with [~xhochy] about how to > implement streaming pyarrow http services. I had attempted to use both Flask > and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s > streaming interfaces because they seemed familiar, but no dice. I have no > idea how hard this would be to add -- supporting all the asynciterable > primitives in JS was non-trivial. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4309) [Release] gen_apidocs docker-compose task is out of date
[ https://issues.apache.org/jira/browse/ARROW-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4309: Fix Version/s: (was: 0.14.0) > [Release] gen_apidocs docker-compose task is out of date > > > Key: ARROW-4309 > URL: https://issues.apache.org/jira/browse/ARROW-4309 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools, Documentation >Reporter: Wes McKinney >Priority: Major > Labels: docker > > This needs to be updated to build with CUDA support (which in turn will > require the host machine to have nvidia-docker), among other things -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4302) [C++] Add OpenSSL to C++ build toolchain
[ https://issues.apache.org/jira/browse/ARROW-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-4302. - Resolution: Fixed > [C++] Add OpenSSL to C++ build toolchain > > > Key: ARROW-4302 > URL: https://issues.apache.org/jira/browse/ARROW-4302 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Assignee: Deepak Majeti >Priority: Major > Labels: parquet, pull-request-available > Fix For: 0.14.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This is needed for encryption support for Parquet, among other things. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4301) [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva submodule
[ https://issues.apache.org/jira/browse/ARROW-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852571#comment-16852571 ] Wes McKinney commented on ARROW-4301: - [~pravindra] any ideas about this? This will get us again in 0.14 if it is not fixed > [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva > submodule > --- > > Key: ARROW-4301 > URL: https://issues.apache.org/jira/browse/ARROW-4301 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva, Java >Reporter: Wes McKinney >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h > Remaining Estimate: 0h > > See > https://github.com/apache/arrow/commit/a486db8c1476be1165981c4fe22996639da8e550. > This is breaking the build so I'm going to patch manually -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4465) [Rust] [DataFusion] Add support for ORDER BY
[ https://issues.apache.org/jira/browse/ARROW-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4465: Fix Version/s: (was: 0.14.0) > [Rust] [DataFusion] Add support for ORDER BY > > > Key: ARROW-4465 > URL: https://issues.apache.org/jira/browse/ARROW-4465 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > > As a user, I would like to be able to specify an ORDER BY clause on my query. > Work involved: > * Add OrderBy to LogicalPlan enum > * Write query planner code to translate SQL AST to OrderBy (SQL parser that > we use already supports parsing ORDER BY) > * Implement SortRelation > My high level thoughts on implementing the SortRelation: > * Create Arrow array of uint32 same size as batch and populate such that > each element contains its own index i.e. array will be 0, 1, 2, 3 > * Find a Rust crate for sorting that allows us to provide our own comparison > lambda > * Implement the comparison logic (probably can reuse existing execution code > - see filter.rs for how it implements comparison expressions) > * Use index array to store the result of the sort i.e. no need to rewrite > the whole batch, just the index > * Rewrite the batch after the sort has completed > It would also be good to see how Gandiva has implemented this > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
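The index-array approach outlined in the ticket is language-agnostic; the illustration below shows the same idea in Python with pyarrow (the ticket itself targets the Rust SortRelation): sort a 0..n-1 index by the column values, then gather rows through the sorted index instead of rewriting the whole batch. Array.take is assumed to be available.

{code}
import pyarrow as pa

column = pa.array([30, 10, 20, 5])

# Build the identity index 0..n-1 and sort it by the column's values,
# mirroring the proposed SortRelation: the batch itself is not rewritten.
indices = sorted(range(len(column)), key=lambda i: column[i].as_py())

# Gather rows through the sorted index.
print(column.take(pa.array(indices, type=pa.int32())))
{code}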
[jira] [Commented] (ARROW-4439) [C++] Improve FindBrotli.cmake
[ https://issues.apache.org/jira/browse/ARROW-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852586#comment-16852586 ] Wes McKinney commented on ARROW-4439: - [~rip@gmail.com] is this OK in master now? > [C++] Improve FindBrotli.cmake > -- > > Key: ARROW-4439 > URL: https://issues.apache.org/jira/browse/ARROW-4439 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Renat Valiullin >Assignee: Renat Valiullin >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4453) [Python] Create Cython wrappers for SparseTensor
[ https://issues.apache.org/jira/browse/ARROW-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4453: Fix Version/s: (was: 0.14.0) > [Python] Create Cython wrappers for SparseTensor > > > Key: ARROW-4453 > URL: https://issues.apache.org/jira/browse/ARROW-4453 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Philipp Moritz >Assignee: Rok Mihevc >Priority: Minor > > We should have cython wrappers for [https://github.com/apache/arrow/pull/2546] > This is related to support for > https://issues.apache.org/jira/browse/ARROW-4223 and > https://issues.apache.org/jira/browse/ARROW-4224 > I imagine the code would be similar to > https://github.com/apache/arrow/blob/5a502d281545402240e818d5fd97a9aaf36363f2/python/pyarrow/array.pxi#L748 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-4447) [C++] Investigate dynamic linking for libthrift
[ https://issues.apache.org/jira/browse/ARROW-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-4447. - Resolution: Fixed Assignee: Uwe L. Korn Thrift is now dynamically linked > [C++] Investigate dynamic linking for libthrift > -- > > Key: ARROW-4447 > URL: https://issues.apache.org/jira/browse/ARROW-4447 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Fix For: 0.14.0 > > We're currently only linking statically against {{libthrift}}. Distributions > would often prefer a dynamic linkage to libraries where possible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4470) [Python] Pyarrow using considerably more memory when reading partitioned Parquet file
[ https://issues.apache.org/jira/browse/ARROW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4470: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Pyarrow using considerable more memory when reading partitioned > Parquet file > - > > Key: ARROW-4470 > URL: https://issues.apache.org/jira/browse/ARROW-4470 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.12.0 >Reporter: Ivan SPM >Priority: Major > Labels: datasets, parquet > Fix For: 0.15.0 > > > Hi, > I have a partitioned Parquet table in Impala in HDFS, using Hive metastore, > with the following structure: > {{/data/myparquettable/year=2016}}{{/data/myparquettable/year=2016/myfile_1.prt}} > {{/data/myparquettable/year=2016/myfile_2.prt}} > {{/data/myparquettable/year=2016/myfile_3.prt}} > {{/data/myparquettable/year=2017}} > {{/data/myparquettable/year=2017/myfile_1.prt}} > {{/data/myparquettable/year=2017/myfile_2.prt}} > {{/data/myparquettable/year=2017/myfile_3.prt}} > and so on. I need to work with one partition, so I copied one partition to a > local filesystem: > {{hdfs fs -get /data/myparquettable/year=2017 /local/}} > so now I have some data on the local disk: > {{/local/year=2017/myfile_1.prt }}{{/local/year=2017/myfile_2.prt }} > etc.I tried to read it using Pyarrow: > {{import pyarrow.parquet as pq}}{{pq.read_parquet('/local/year=2017')}} > and it starts reading. The problem is that the local Parquet files are around > 15GB total, and I blew up my machine memory a couple of times because when > reading these files, Pyarrow is using more than 60GB of RAM, and I'm not sure > how much it will take because it never finishes. Is this expected? Is there a > workaround? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
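Not a fix for the reported blow-up, but a commonly suggested mitigation while it is investigated: read one of the local files a row group at a time instead of materializing the whole partition. The sketch below uses pyarrow's ParquetFile row-group API; the path follows the reporter's layout and is otherwise an assumption.

{code}
import pyarrow.parquet as pq

pf = pq.ParquetFile('/local/year=2017/myfile_1.prt')

# Process the file incrementally, one row group at a time,
# so peak memory stays close to a single row group's size.
for i in range(pf.num_row_groups):
    table = pf.read_row_group(i)
    # ... process `table`, then let it go out of scope ...
    del table
{code}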
[jira] [Commented] (ARROW-4479) [Plasma] Add S3 as external store for Plasma
[ https://issues.apache.org/jira/browse/ARROW-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852589#comment-16852589 ] Wes McKinney commented on ARROW-4479: - What is the status of this project? > [Plasma] Add S3 as external store for Plasma > > > Key: ARROW-4479 > URL: https://issues.apache.org/jira/browse/ARROW-4479 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Plasma >Affects Versions: 0.12.0 >Reporter: Anurag Khandelwal >Assignee: Anurag Khandelwal >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Adding S3 as an external store will allow objects to be evicted to S3 when > Plasma runs out of memory capacity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4482) [Website] Add blog archive page
[ https://issues.apache.org/jira/browse/ARROW-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4482: Fix Version/s: (was: 0.14.0) 0.15.0 > [Website] Add blog archive page > --- > > Key: ARROW-4482 > URL: https://issues.apache.org/jira/browse/ARROW-4482 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > There's no easy way to get a bulleted list of all blog posts on the Arrow > website. See example archive on my personal blog > http://wesmckinney.com/archives.html > It would be great to have such a generated archive on our website -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4473) [Website] Add instructions to do a test-deploy of Arrow website and fix bugs
[ https://issues.apache.org/jira/browse/ARROW-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4473: Fix Version/s: (was: 0.14.0) 0.15.0 > [Website] Add instructions to do a test-deploy of Arrow website and fix bugs > > > Key: ARROW-4473 > URL: https://issues.apache.org/jira/browse/ARROW-4473 > Project: Apache Arrow > Issue Type: Improvement > Components: Website >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > This will help with testing and proofing the website. > I have noticed that there are bugs in the website when the baseurl is not a > foo.bar.baz, e.g. if you deploy at root foo.bar.baz/test-site many images and > links are broken -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4470) [Python] Pyarrow using considerably more memory when reading partitioned Parquet file
[ https://issues.apache.org/jira/browse/ARROW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4470: Labels: datasets parquet (was: parquet) > [Python] Pyarrow using considerable more memory when reading partitioned > Parquet file > - > > Key: ARROW-4470 > URL: https://issues.apache.org/jira/browse/ARROW-4470 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.12.0 >Reporter: Ivan SPM >Priority: Major > Labels: datasets, parquet > Fix For: 0.14.0 > > > Hi, > I have a partitioned Parquet table in Impala in HDFS, using Hive metastore, > with the following structure: > {{/data/myparquettable/year=2016}}{{/data/myparquettable/year=2016/myfile_1.prt}} > {{/data/myparquettable/year=2016/myfile_2.prt}} > {{/data/myparquettable/year=2016/myfile_3.prt}} > {{/data/myparquettable/year=2017}} > {{/data/myparquettable/year=2017/myfile_1.prt}} > {{/data/myparquettable/year=2017/myfile_2.prt}} > {{/data/myparquettable/year=2017/myfile_3.prt}} > and so on. I need to work with one partition, so I copied one partition to a > local filesystem: > {{hdfs fs -get /data/myparquettable/year=2017 /local/}} > so now I have some data on the local disk: > {{/local/year=2017/myfile_1.prt }}{{/local/year=2017/myfile_2.prt }} > etc.I tried to read it using Pyarrow: > {{import pyarrow.parquet as pq}}{{pq.read_parquet('/local/year=2017')}} > and it starts reading. The problem is that the local Parquet files are around > 15GB total, and I blew up my machine memory a couple of times because when > reading these files, Pyarrow is using more than 60GB of RAM, and I'm not sure > how much it will take because it never finishes. Is this expected? Is there a > workaround? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5452) [R] Add documentation website (pkgdown)
[ https://issues.apache.org/jira/browse/ARROW-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852326#comment-16852326 ] Wes McKinney commented on ARROW-5452: - Yeah, for generated API docs that is fine, if we start writing prose documentation for R we should consider doing it in a common place > [R] Add documentation website (pkgdown) > --- > > Key: ARROW-5452 > URL: https://issues.apache.org/jira/browse/ARROW-5452 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 10m > Remaining Estimate: 0h > > pkgdown ([https://pkgdown.r-lib.org/]) is the standard for R package > documentation websites. Build this for arrow and deploy it at > https://arrow.apache.org/docs/r. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3054) [Packaging] Tooling to enable nightly conda packages to be updated to some anaconda.org channel
[ https://issues.apache.org/jira/browse/ARROW-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3054: Fix Version/s: (was: 0.14.0) > [Packaging] Tooling to enable nightly conda packages to be updated to some > anaconda.org channel > --- > > Key: ARROW-3054 > URL: https://issues.apache.org/jira/browse/ARROW-3054 > Project: Apache Arrow > Issue Type: Task > Components: Packaging >Affects Versions: 0.10.0 >Reporter: Phillip Cloud >Assignee: Krisztian Szucs >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3082) [C++] Add SSL support for hiveserver2
[ https://issues.apache.org/jira/browse/ARROW-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3082: Fix Version/s: (was: 0.14.0) > [C++] Add SSL support for hiveserver2 > - > > Key: ARROW-3082 > URL: https://issues.apache.org/jira/browse/ARROW-3082 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: HiveServer2 > > This amounts to using the TSSLSocket in Thrift -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3806) [Python] When converting nested types to pandas, use tuples
[ https://issues.apache.org/jira/browse/ARROW-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3806: Fix Version/s: (was: 0.14.0) > [Python] When converting nested types to pandas, use tuples > --- > > Key: ARROW-3806 > URL: https://issues.apache.org/jira/browse/ARROW-3806 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.11.1 > Environment: Fedora 29, pyarrow installed with conda >Reporter: Suvayu Ali >Priority: Minor > Labels: pandas > > When converting to pandas, convert nested types (e.g. list) to tuples. > Columns with lists are difficult to query. Here are a few unsuccessful > attempts: > {code} > >>> mini > CHROMPOS IDREFALTS QUAL > 80 20 63521 rs191905748 G [A] 100 > 81 20 63541 rs117322527 C [A] 100 > 82 20 63548 rs541129280 G[GT] 100 > 83 20 63553 rs536661806 T [C] 100 > 84 20 63555 rs553463231 T [C] 100 > 85 20 63559 rs138359120 C [A] 100 > 86 20 63586 rs545178789 T [G] 100 > 87 20 63636 rs374311122 G [A] 100 > 88 20 63696 rs149160003 A [G] 100 > 89 20 63698 rs544072005 A [C] 100 > 90 20 63729 rs181483669 G [A] 100 > 91 20 63733 rs75670495 C [T] 100 > 92 20 63799rs1418258 C [T] 100 > 93 20 63808 rs76004960 G [C] 100 > 94 20 63813 rs532151719 G [A] 100 > 95 20 63857 rs543686274 CCTGGAAAGGATT [C] 100 > 96 20 63865 rs551938596 G [A] 100 > 97 20 63902 rs571779099 A [T] 100 > 98 20 63963 rs531152674 G [A] 100 > 99 20 63967 rs116770801 A [G] 100 > 10020 63977 rs199703510 C [G] 100 > 10120 64016 rs143263863 G [A] 100 > 10220 64062 rs148297240 G [A] 100 > 10320 64139 rs186497980 G [A, T] 100 > 10420 64150rs7274499 C [A] 100 > 10520 64151 rs190945171 C [T] 100 > 10620 64154 rs537656456 T [G] 100 > 10720 64175 rs116531220 A [G] 100 > 10820 64186 rs141793347 C [G] 100 > 10920 64210 rs182418654 G [C] 100 > 11020 64303 rs559929739 C [A] 100 > {code} > # I think this one fails because it tries to broadcast the comparison. > {code} > >>> mini[mini.ALTS == ["A", "T"]] > Traceback (most recent call last): > File "", line 1, in > File > "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/ops.py", line > 1283, in wrapper > res = na_op(values, other) > File > "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/ops.py", line > 1143, in na_op > result = _comp_method_OBJECT_ARRAY(op, x, y) > File > "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/ops.py", line > 1120, in _comp_method_OBJECT_ARRAY > result = libops.vec_compare(x, y, op) > File "pandas/_libs/ops.pyx", line 128, in pandas._libs.ops.vec_compare > ValueError: Arrays were different lengths: 31 vs 2 > {code} > # I think this fails due to a similar reason, but the broadcasting is > happening at a different place. 
> {code} > >>> mini[mini.ALTS.apply(lambda x: x == ["A", "T"])] > Traceback (most recent call last): > File "", line 1, in > File > "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", > line 2682, in __getitem__ > return self._getitem_array(key) > File > "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", > line 2726, in _getitem_array > indexer = self.loc._convert_to_indexer(key, axis=1) > File > "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/indexing.py", > line 1314, in _convert_to_indexer > indexer = check = labels.get_indexer(objarr) > File > "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", > line 3259, in get_indexer > indexer = self._engine.get_indexer(target._ndarray_values) > File "pandas/_libs/index.pyx", line 301, in > pandas._libs.index.IndexEngine.get_indexer > File "pandas/_libs/hashtable_class_helper.pxi", line 1544, in > pandas._libs.hashtable.PyObjectHashTable.lookup > TypeError: unhashable type: 'numpy.ndarray' > >>> mini.ALTS.apply(lambda x: x == ["A", "T"]).head() > 80 [True, False] > 81 [True, False] > 82[False, False] > 83
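A workaround along the lines the ticket suggests can be applied on the pandas side today: convert the list column to tuples, which are hashable and can be filtered with the usual idioms. This is only an illustrative sketch with made-up column data, not the proposed pyarrow change.

{code}
import pandas as pd

df = pd.DataFrame({'ALTS': [['A'], ['A', 'T'], ['GT']]})

# Tuples are hashable, so exact-match and membership filters work
# once the list column has been converted.
alts = df.ALTS.apply(tuple)
print(df[alts.isin([('A', 'T')])])           # rows whose ALTS is exactly ['A', 'T']
print(df[alts.isin([('A',), ('GT',)])])      # rows matching any of several values
{code}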
[jira] [Updated] (ARROW-3789) [Python] Enable calling object in Table.to_pandas to "self-destruct" for improved memory use
[ https://issues.apache.org/jira/browse/ARROW-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3789: Fix Version/s: (was: 0.14.0) > [Python] Enable calling object in Table.to_pandas to "self-destruct" for > improved memory use > > > Key: ARROW-3789 > URL: https://issues.apache.org/jira/browse/ARROW-3789 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > > One issue with using {{Table.to_pandas}} is that it results in a memory > doubling (at least, more if there are a lot of Python objects created). It > would be useful if there was an option to destroy the {{arrow::Column}} > references once they've been transferred into the target data frame. This > would render the {{pyarrow.Table}} object useless afterward -- This message was sent by Atlassian JIRA (v7.6.3#76005)
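For context, an option along these lines was eventually exposed in later pyarrow releases as a to_pandas keyword; the sketch below assumes such a release (self_destruct is not available in the 0.14 timeframe) and shows the intended usage: the Table hands its buffers over during conversion and must not be used afterwards.

{code}
import pyarrow as pa

table = pa.table({'x': list(range(1_000_000))})

# Assumes a pyarrow release with the self_destruct option (added after this ticket).
df = table.to_pandas(self_destruct=True, split_blocks=True, use_threads=False)

# `table` is no longer usable once its columns have been released.
del table
{code}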
[jira] [Updated] (ARROW-3764) [C++] Port Python "ParquetDataset" business logic to C++
[ https://issues.apache.org/jira/browse/ARROW-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3764: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Port Python "ParquetDataset" business logic to C++ > > > Key: ARROW-3764 > URL: https://issues.apache.org/jira/browse/ARROW-3764 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: datasets, parquet > Fix For: 0.15.0 > > > Along with defining appropriate abstractions for dealing with generic > filesystems in C++, we should implement the machinery for reading multiple > Parquet files in C++ so that it can reused in GLib, R, and Ruby. Otherwise > these languages will have to reimplement things, and this would surely result > in inconsistent features, bugs in some implementations but not others -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3759) [R][CI] Build and test on Windows in Appveyor
[ https://issues.apache.org/jira/browse/ARROW-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852548#comment-16852548 ] Wes McKinney commented on ARROW-3759: - cc [~npr] > [R][CI] Build and test on Windows in Appveyor > - > > Key: ARROW-3759 > URL: https://issues.apache.org/jira/browse/ARROW-3759 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3873) [C++] Build shared libraries consistently with -fvisibility=hidden
[ https://issues.apache.org/jira/browse/ARROW-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852552#comment-16852552 ] Wes McKinney commented on ARROW-3873: - I might take another crack at this to see if it is doable, but after 0.14 > [C++] Build shared libraries consistently with -fvisibility=hidden > -- > > Key: ARROW-3873 > URL: https://issues.apache.org/jira/browse/ARROW-3873 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > See https://github.com/apache/arrow/pull/2437 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3801) [Python] Pandas-Arrow roundtrip makes pd categorical index not writeable
[ https://issues.apache.org/jira/browse/ARROW-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852549#comment-16852549 ] Wes McKinney commented on ARROW-3801: - cc [~jorisvandenbossche] > [Python] Pandas-Arrow roundtrip makes pd categorical index not writeable > > > Key: ARROW-3801 > URL: https://issues.apache.org/jira/browse/ARROW-3801 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.10.0 >Reporter: Thomas Buhrmann >Priority: Major > Fix For: 0.14.0 > > > Serializing and deserializing a pandas series with categorical dtype will > make the categorical index non-writeable, which in turn trips up pandas when > e.g. reordering the categories, raising "ValueError: buffer source array is > read-only" : > {code} > import pandas as pd > import pyarrow as pa > df = pd.Series([1,2,3], dtype='category', name="c1").to_frame() > print("DType before:", repr(df.c1.dtype)) > print("Writeable:", df.c1.cat.categories.values.flags.writeable) > ro = df.c1.cat.reorder_categories([3,2,1]) > print("DType reordered:", repr(ro.dtype), "\n") > tbl = pa.Table.from_pandas(df) > df2 = tbl.to_pandas() > print("DType after:", repr(df2.c1.dtype)) > print("Writeable:", df2.c1.cat.categories.values.flags.writeable) > ro = df2.c1.cat.reorder_categories([3,2,1]) > print("DType reordered:", repr(ro.dtype), "\n") > {code} > > Outputs: > > {code:java} > DType before: CategoricalDtype(categories=[1, 2, 3], ordered=False) > Writeable: True > DType reordered: CategoricalDtype(categories=[3, 2, 1], ordered=False) > DType after: CategoricalDtype(categories=[1, 2, 3], ordered=False) > Writeable: False > --- > ValueError Traceback (most recent call last) > in > 12 print("DType after:", repr(df2.c1.dtype)) > 13 print("Writeable:", df2.c1.cat.categories.values.flags.writeable) > ---> 14 ro = df2.c1.cat.reorder_categories([3,2,1]) > 15 print("DType reordered:", repr(ro.dtype), "\n") > {code} > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3827) [Rust] Implement UnionArray
[ https://issues.apache.org/jira/browse/ARROW-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3827: Fix Version/s: (was: 0.14.0) > [Rust] Implement UnionArray > --- > > Key: ARROW-3827 > URL: https://issues.apache.org/jira/browse/ARROW-3827 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4208) [CI/Python] Have automated tests for S3
[ https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4208: Labels: filesystem s3 (was: s3) > [CI/Python] Have automated tests for S3 > - > > Key: ARROW-4208 > URL: https://issues.apache.org/jira/browse/ARROW-4208 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Priority: Major > Labels: filesystem, s3 > Fix For: 0.14.0 > > > Currently we don't run S3 integration tests regularly. > Possible solutions: > - mock it within python/pytest > - simply run the S3 tests with an S3 credential provided > - create an hdfs-integration-like docker-compose setup and run an S3 mock > server (e.g.: https://github.com/adobe/S3Mock, > https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, > https://github.com/jserver/mock-s3) > For more see discussion https://github.com/apache/arrow/pull/3286 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4095) [C++] Implement optimizations for dictionary unification where dictionaries are prefixes of the unified dictionary
[ https://issues.apache.org/jira/browse/ARROW-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4095: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Implement optimizations for dictionary unification where dictionaries > are prefixes of the unified dictionary > -- > > Key: ARROW-4095 > URL: https://issues.apache.org/jira/browse/ARROW-4095 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > In the event that the unified dictionary contains other dictionaries as > prefixes (e.g. as the result of delta dictionaries), we can avoid memory > allocation and index transposition. > See discussion at > https://github.com/apache/arrow/pull/3165#discussion_r243020982 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
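A tiny illustration of the prefix case described above (plain Python, not the C++ unification code): when an input dictionary is a prefix of the unified dictionary, its indices are already correct and no transposition buffer needs to be allocated.

{code}
unified = ['a', 'b', 'c', 'd']      # result of unifying several dictionaries
chunk_dict = ['a', 'b']             # e.g. an earlier delta dictionary
chunk_indices = [0, 1, 1, 0]

def is_prefix(small, big):
    return len(small) <= len(big) and big[:len(small)] == small

if is_prefix(chunk_dict, unified):
    # Fast path: indices are valid against the unified dictionary as-is.
    transposed = chunk_indices
else:
    # General path: remap each index through a lookup table.
    remap = {value: i for i, value in enumerate(unified)}
    transposed = [remap[chunk_dict[i]] for i in chunk_indices]

assert [unified[i] for i in transposed] == [chunk_dict[i] for i in chunk_indices]
{code}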
[jira] [Updated] (ARROW-4133) [C++/Python] ORC adapter should fail gracefully if /etc/timezone is missing instead of aborting
[ https://issues.apache.org/jira/browse/ARROW-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4133: Fix Version/s: (was: 0.14.0) > [C++/Python] ORC adapter should fail gracefully if /etc/timezone is missing > instead of aborting > --- > > Key: ARROW-4133 > URL: https://issues.apache.org/jira/browse/ARROW-4133 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Reporter: Krisztian Szucs >Priority: Major > Labels: orc > > The following core was genereted by nightly build: > https://travis-ci.org/kszucs/crossbow/builds/473397855 > {code} > Core was generated by `/opt/conda/bin/python /opt/conda/bin/pytest -v > --pyargs pyarrow'. > Program terminated with signal SIGABRT, Aborted. > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. > [Current thread is 1 (Thread 0x7fea61f9e740 (LWP 179))] > (gdb) bt > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > #1 0x7fea608c8801 in __GI_abort () at abort.c:79 > #2 0x7fea4b3483df in __gnu_cxx::__verbose_terminate_handler () > at > /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/vterminate.cc:95 > #3 0x7fea4b346b16 in __cxxabiv1::__terminate (handler=) > at > /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47 > #4 0x7fea4b346b4c in std::terminate () > at > /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57 > #5 0x7fea4b346d28 in __cxxabiv1::__cxa_throw (obj=0x2039220, > tinfo=0x7fea494803d0 , > dest=0x7fea49087e52 ) > at > /opt/conda/conda-bld/compilers_linux-64_1534514838838/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95 > #6 0x7fea49086824 in orc::getTimezoneByFilename (filename=...) > at /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:704 > #7 0x7fea490868d2 in orc::getLocalTimezone () at > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:713 > > #8 0x7fea49063e59 in > orc::RowReaderImpl::RowReaderImpl (this=0x204fe30, _contents=..., opts=...) > at /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Reader.cc:185 > #9 0x7fea4906651e in orc::ReaderImpl::createRowReader (this=0x1fb41b0, > opts=...) 
> at /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Reader.cc:630 > #10 0x7fea48c2d904 in > arrow::adapters::orc::ORCFileReader::Impl::ReadSchema (this=0x1270600, > opts=..., > > out=0x7ffe0ccae7b0) at /arrow/cpp/src/arrow/adapters/orc/adapter.cc:264 > #11 0x7fea48c2e18d in arrow::adapters::orc::ORCFileReader::Impl::Read > (this=0x1270600, out=0x7ffe0ccaea00) > at /arrow/cpp/src/arrow/adapters/orc/adapter.cc:302 > #12 0x7fea48c2a8b9 in arrow::adapters::orc::ORCFileReader::Read > (this=0x1e14d10, out=0x7ffe0ccaea00) > at /arrow/cpp/src/arrow/adapters/orc/adapter.cc:697 > > > #13 0x7fea48218c9d in __pyx_pf_7pyarrow_4_orc_9ORCReader_12read > (__pyx_v_self=0x7fea43de8688, > __pyx_v_include_indices=0x7fea61d07b70 <_Py_NoneStruct>) at _orc.cpp:3865 > #14 0x7fea48218b31 in __pyx_pw_7pyarrow_4_orc_9ORCReader_13read > (__pyx_v_self=0x7fea43de8688, > __pyx_args=0x7fea61f5e048, __pyx_kwds=0x7fea444f78b8) at _orc.cpp:3813 > #15 0x7fea61910cbd in _PyCFunction_FastCallDict > (func_obj=func_obj@entry=0x7fea444b9558, > args=args@entry=0x7fea44a40fa8, nargs=nargs@entry=0, > kwargs=kwargs@entry=0x7fea444f78b8) > at Objects/methodobject.c:231 > #16 0x7fea61910f16 in _PyCFunction_FastCallKeywords > (func=func@entry=0x7fea444b9558, > stack=stack@entry=0x7fea44a40fa8, nargs=0, > kwnames=kwnames@entry=0x7fea47d81d30) at Objects/methodobject.c:294 > #17 0x7fea619aa0da in call_function > (pp_stack=pp_stack@entry=0x7ffe0ccaecf0, oparg=, > kwnames=kwnames@entry=0x7fea47d81d30) at Python/ceval.c:4837 > #18 0x7fea619abb46 in _PyEval_EvalFrameDefault (f=, > throwflag=) > at Python/ceval.c:3351 > #19 0x7fea619a9cde in _PyEval_EvalCodeWithName (_co=0x7fea47d9f6f0, >
[jira] [Updated] (ARROW-4090) [Python] Table.flatten() doesn't work recursively
[ https://issues.apache.org/jira/browse/ARROW-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4090: Fix Version/s: (was: 0.14.0) > [Python] Table.flatten() doesn't work recursively > - > > Key: ARROW-4090 > URL: https://issues.apache.org/jira/browse/ARROW-4090 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Francisco Sanchez >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The pyarrow.Table.flatten() function does not work recursively, > nor does it provide a parameter to do so. > {code} > test1c_data = {'level1-A': 'abc', >'level1-B': 112233, >'level1-C': {'x': 123.111, 'y': 123.222, 'z': 123.333} > } > test1c_type = pa.struct([('level1-A', pa.string()), > ('level1-B', pa.int32()), > ('level1-C', pa.struct([('x', pa.float64()), > ('y', pa.float64()), > ('z', pa.float64()) > ])) > ]) > test1c_array = pa.array([test1c_data]*5, type=test1c_type) > test1c_table = pa.Table.from_arrays([test1c_array], names=['msg']) > print('{}\n\n{}\n\n{}'.format(test1c_table.schema, > test1c_table.flatten().schema, > test1c_table.flatten().flatten().schema)) > {code} > output: > {quote}msg: struct<level1-A: string, level1-B: int32, level1-C: struct<x: double, y: double, z: double>> > child 0, level1-A: string > child 1, level1-B: int32 > child 2, level1-C: struct<x: double, y: double, z: double> > child 0, x: double > child 1, y: double > child 2, z: double > msg.level1-A: string > msg.level1-B: int32 > msg.level1-C: struct<x: double, y: double, z: double> > child 0, x: double > child 1, y: double > child 2, z: double > msg.level1-A: string > msg.level1-B: int32 > msg.level1-C.x: double > msg.level1-C.y: double > msg.level1-C.z: double > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
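Until a recursive option lands, the practical workaround is to call flatten() repeatedly. A minimal sketch, assuming only the public Table and schema APIs; the flatten_fully helper name is hypothetical and not part of pyarrow:

{code}
import pyarrow as pa

def flatten_fully(table):
    # Table.flatten() only expands one level of struct nesting per call,
    # so loop until no struct columns remain in the schema.
    while any(pa.types.is_struct(field.type) for field in table.schema):
        table = table.flatten()
    return table
{code}

Applied to the example above, two passes are needed before the struct column is fully expanded, which matches the three printed schemas.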
[jira] [Updated] (ARROW-4202) [Gandiva] use ArrayFromJson in tests
[ https://issues.apache.org/jira/browse/ARROW-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4202: Fix Version/s: (was: 0.14.0) > [Gandiva] use ArrayFromJson in tests > > > Key: ARROW-4202 > URL: https://issues.apache.org/jira/browse/ARROW-4202 > Project: Apache Arrow > Issue Type: Task > Components: C++ - Gandiva >Reporter: Pindikura Ravindra >Priority: Major > > Most of the Gandiva tests use wrappers over ArrayFromVector. These will > become a lot more readable if we switch to ArrayFromJSON. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4146) [C++] Extend visitor functions to include ArrayBuilder and allow callable visitors
[ https://issues.apache.org/jira/browse/ARROW-4146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4146: Fix Version/s: (was: 0.14.0) > [C++] Extend visitor functions to include ArrayBuilder and allow callable > visitors > -- > > Key: ARROW-4146 > URL: https://issues.apache.org/jira/browse/ARROW-4146 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benjamin Kietzman >Priority: Minor > > In addition to accepting objects with Visit methods for the visited type, > {{Visit(Array|Type)}} and {{Visit(Array|Type)Inline}} should accept objects > with overloaded call operators. > In addition for inline visitation if a visitor can only visit one of the > potential unboxings then this can be detected at compile time and the full > type_id switch can be avoided (if the unboxed object cannot be visited then > do nothing). For example: > {code} > VisitTypeInline(some_type, [](const StructType& s) { > // only execute this if some_type.id() == Type::STRUCT > }); > {code} > Finally, visit functions should be added for visiting ArrayBuilders -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4201) [C++][Gandiva] integrate test utils with arrow
[ https://issues.apache.org/jira/browse/ARROW-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4201: Fix Version/s: (was: 0.14.0) > [C++][Gandiva] integrate test utils with arrow > -- > > Key: ARROW-4201 > URL: https://issues.apache.org/jira/browse/ARROW-4201 > Project: Apache Arrow > Issue Type: Task > Components: C++ - Gandiva >Reporter: Pindikura Ravindra >Priority: Major > > The following tasks to be addressed as part of this Jira : > # move (or consolidate) data generators in generate_data.h to arrow > # move convenience fns in gandiva/tests/test_util.h to arrow > # move (or consolidate) EXPECT_ARROW_* fns to arrow -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4208) [CI/Python] Have automated tests for S3
[ https://issues.apache.org/jira/browse/ARROW-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4208: Fix Version/s: (was: 0.14.0) 0.15.0 > [CI/Python] Have automated tests for S3 > - > > Key: ARROW-4208 > URL: https://issues.apache.org/jira/browse/ARROW-4208 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Priority: Major > Labels: filesystem, s3 > Fix For: 0.15.0 > > > Currently we don't run S3 integration tests regularly. > Possible solutions: > - mock it within python/pytest > - simply run the s3 tests with an S3 credential provided > - create an hdfs-integration-like docker-compose setup and run an S3 mock > server (e.g.: https://github.com/adobe/S3Mock, > https://github.com/jubos/fake-s3, https://github.com/gaul/s3proxy, > https://github.com/jserver/mock-s3) > For more, see the discussion at https://github.com/apache/arrow/pull/3286 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
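If the docker-compose route is taken, a test could point pyarrow at one of the mock servers listed above through s3fs. A rough sketch, assuming a mock S3 endpoint on localhost:9000; the credentials, bucket, and file names are placeholders:

{code}
import pyarrow.parquet as pq
import s3fs

# Placeholder credentials and endpoint for a local S3 mock server.
fs = s3fs.S3FileSystem(key='test', secret='test',
                       client_kwargs={'endpoint_url': 'http://localhost:9000'})

# Read a Parquet file from the mock bucket through a file-like object.
with fs.open('test-bucket/example.parquet', 'rb') as f:
    table = pq.read_table(f)
{code}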
[jira] [Created] (ARROW-5456) [GLib][Plasma] Installed plasma-glib may be used when building documentation
Kouhei Sutou created ARROW-5456: --- Summary: [GLib][Plasma] Installed plasma-glib may be used when building documentation Key: ARROW-5456 URL: https://issues.apache.org/jira/browse/ARROW-5456 Project: Apache Arrow Issue Type: Bug Components: GLib Affects Versions: 0.13.0 Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5456) [GLib][Plasma] Installed plasma-glib may be used when building documentation
[ https://issues.apache.org/jira/browse/ARROW-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5456: -- Labels: pull-request-available (was: ) > [GLib][Plasma] Installed plasma-glib may be used when building documentation > - > > Key: ARROW-5456 > URL: https://issues.apache.org/jira/browse/ARROW-5456 > Project: Apache Arrow > Issue Type: Bug > Components: GLib >Affects Versions: 0.13.0 >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5458) Apache Arrow parallel CRC32c computation optimization
[ https://issues.apache.org/jira/browse/ARROW-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852610#comment-16852610 ] Yuqi Gu commented on ARROW-5458: PR: https://github.com/apache/arrow/pull/4427 > Apache Arrow parallel CRC32c computation optimization > - > > Key: ARROW-5458 > URL: https://issues.apache.org/jira/browse/ARROW-5458 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yuqi Gu >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > ARMv8 defines the VMULL/PMULL crypto instructions. > This patch optimizes the crc32c calculation with these instructions when > available, rather than with the original linear CRC instructions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5452) [R] Add documentation website (pkgdown)
[ https://issues.apache.org/jira/browse/ARROW-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5452: -- Labels: pull-request-available (was: ) > [R] Add documentation website (pkgdown) > --- > > Key: ARROW-5452 > URL: https://issues.apache.org/jira/browse/ARROW-5452 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > pkgdown ([https://pkgdown.r-lib.org/]) is the standard for R package > documentation websites. Build this for arrow and deploy it at > https://arrow.apache.org/docs/r. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1988) [Python] Extend flavor=spark in Parquet writing to handle INT types
[ https://issues.apache.org/jira/browse/ARROW-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1988: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Extend flavor=spark in Parquet writing to handle INT types > --- > > Key: ARROW-1988 > URL: https://issues.apache.org/jira/browse/ARROW-1988 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Uwe L. Korn >Priority: Major > Labels: parquet > Fix For: 0.15.0 > > > See the relevant code sections at > https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L139 > We should cater for them in the {{pyarrow}} code and also reach out to Spark > developers so that they are supported there in the longterm. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
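For reference, the Spark flavor is already exposed on the Parquet writer; the request here is to extend what it covers to the integer types Spark's reader expects. A small usage sketch, assuming the long-standing flavor keyword of pyarrow.parquet.write_table; the file name is a placeholder:

{code}
import pyarrow as pa
import pyarrow.parquet as pq

arr = pa.array([1, 2, 3], type=pa.int8())
table = pa.Table.from_arrays([arr], names=['small_ints'])

# flavor='spark' applies Spark-oriented compatibility settings when writing;
# this issue asks that narrow/unsigned integer types be handled there as well.
pq.write_table(table, 'spark_compatible.parquet', flavor='spark')
{code}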
[jira] [Updated] (ARROW-1987) [Website] Enable Docker-based documentation generator to build at a specific Arrow commit
[ https://issues.apache.org/jira/browse/ARROW-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1987: Fix Version/s: 0.15.0 > [Website] Enable Docker-based documentation generator to build at a specific > Arrow commit > - > > Key: ARROW-1987 > URL: https://issues.apache.org/jira/browse/ARROW-1987 > Project: Apache Arrow > Issue Type: Bug > Components: Website >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > Currently both the Docker setup and the Arrow repo have to be at the same > commit. It would be useful to create a checkout in the Docker image and > enable the build version to be passed in -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1989) [Python] Better UX on timestamp conversion to Pandas
[ https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852495#comment-16852495 ] Wes McKinney commented on ARROW-1989: - [~jorisvandenbossche] potentially of interest? > [Python] Better UX on timestamp conversion to Pandas > > > Key: ARROW-1989 > URL: https://issues.apache.org/jira/browse/ARROW-1989 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Uwe L. Korn >Priority: Major > Fix For: 0.14.0 > > > When converting timestamp columns to Pandas, users often have dates that > fall outside the range Pandas can represent with its > nanosecond representation. Currently they simply see an Arrow exception and > think that the problem is caused by Arrow. We should try to change the error > from > {code} > ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX > {code} > to something along the lines of > {code} > ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: > XX. This conversion is needed because Pandas only supports nanosecond > timestamps. Your data is likely out of the range that can be represented with > nanosecond resolution. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
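The message in question comes from a safe cast that would drop sub-microsecond precision. A minimal reproduction sketch; the timestamp value is illustrative:

{code}
import pyarrow as pa

# One nanosecond past 2017-01-01 00:00:00 UTC cannot be represented exactly
# in microseconds.
arr = pa.array([1483228800000000001], type=pa.timestamp('ns'))

# Raises ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data
arr.cast(pa.timestamp('us'))
{code}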
[jira] [Commented] (ARROW-2006) [C++] Add option to trim excess padding when writing IPC messages
[ https://issues.apache.org/jira/browse/ARROW-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852496#comment-16852496 ] Wes McKinney commented on ARROW-2006: - Our IPC methods lack configurability in general. We may want to introduce an IpcOptions struct > [C++] Add option to trim excess padding when writing IPC messages > - > > Key: ARROW-2006 > URL: https://issues.apache.org/jira/browse/ARROW-2006 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > This will help with situations like > [https://github.com/apache/arrow/issues/1467] where we don't really need the > extra padding bytes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1987) [Website] Enable Docker-based documentation generator to build at a specific Arrow commit
[ https://issues.apache.org/jira/browse/ARROW-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1987: Fix Version/s: (was: 0.14.0) > [Website] Enable Docker-based documentation generator to build at a specific > Arrow commit > - > > Key: ARROW-1987 > URL: https://issues.apache.org/jira/browse/ARROW-1987 > Project: Apache Arrow > Issue Type: Bug > Components: Website >Reporter: Wes McKinney >Priority: Major > > Currently both the Docker setup and the Arrow repo have to be at the same > commit. It would be useful to create a checkout in the Docker image and > enable the build version to be passed in -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-1957) [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit
[ https://issues.apache.org/jira/browse/ARROW-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-1957: --- Assignee: TP Boudreau > [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit > > > Key: ARROW-1957 > URL: https://issues.apache.org/jira/browse/ARROW-1957 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.8.0 > Environment: Python 3.6.4. Mac OSX and CentOS Linux release > 7.3.1611. Pandas 0.21.1 . >Reporter: Jordan Samuels >Assignee: TP Boudreau >Priority: Minor > Labels: parquet > Fix For: 0.14.0 > > > The following code > {code} > import pyarrow as pa > import pyarrow.parquet as pq > import pandas as pd > n=3 > df = pd.DataFrame({'x': range(n)}, index=pd.DatetimeIndex(start='2017-01-01', > freq='1n', periods=n)) > pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet'){code} > results in: > {{ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: > 14832288001}} > The desired effect is that we can save nanosecond resolution without losing > precision (e.g. conversion to ms). Note that if {{freq='1u'}} is used, the > code runs properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
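Until the NANO logical type is written, the snippet above can only succeed by opting into truncation. A workaround sketch, assuming the existing coerce_timestamps and allow_truncated_timestamps options of write_table; it loses exactly the precision this issue wants to preserve:

{code}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

n = 3
df = pd.DataFrame({'x': range(n)},
                  index=pd.date_range('2017-01-01', freq='1n', periods=n))

# Explicitly allow truncation to microseconds instead of raising ArrowInvalid.
pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet',
               coerce_timestamps='us', allow_truncated_timestamps=True)
{code}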
[jira] [Updated] (ARROW-1959) [Python] Add option for "lossy" conversions (overflow -> null) from timestamps to datetime.datetime / pandas.Timestamp
[ https://issues.apache.org/jira/browse/ARROW-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1959: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Add option for "lossy" conversions (overflow -> null) from > timestamps to datetime.datetime / pandas.Timestamp > -- > > Key: ARROW-1959 > URL: https://issues.apache.org/jira/browse/ARROW-1959 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > See discussion in > https://stackoverflow.com/questions/47946038/overflow-error-using-datetimes-with-pyarrow -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1846) [C++] Implement "any" reduction kernel for boolean data
[ https://issues.apache.org/jira/browse/ARROW-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1846: Fix Version/s: (was: 0.14.0) 0.15.0 > [C++] Implement "any" reduction kernel for boolean data > --- > > Key: ARROW-1846 > URL: https://issues.apache.org/jira/browse/ARROW-1846 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: analytics, dataframe > Fix For: 0.15.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1957) [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit
[ https://issues.apache.org/jira/browse/ARROW-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852494#comment-16852494 ] Wes McKinney commented on ARROW-1957: - [~tpboudreau] I assume this is on your critical path > [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit > > > Key: ARROW-1957 > URL: https://issues.apache.org/jira/browse/ARROW-1957 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.8.0 > Environment: Python 3.6.4. Mac OSX and CentOS Linux release > 7.3.1611. Pandas 0.21.1 . >Reporter: Jordan Samuels >Assignee: TP Boudreau >Priority: Minor > Labels: parquet > Fix For: 0.14.0 > > > The following code > {code} > import pyarrow as pa > import pyarrow.parquet as pq > import pandas as pd > n=3 > df = pd.DataFrame({'x': range(n)}, index=pd.DatetimeIndex(start='2017-01-01', > freq='1n', periods=n)) > pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet'){code} > results in: > {{ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: > 14832288001}} > The desired effect is that we can save nanosecond resolution without losing > precision (e.g. conversion to ms). Note that if {{freq='1u'}} is used, the > code runs properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1837) [Java] Unable to read unsigned integers outside signed range for bit width in integration tests
[ https://issues.apache.org/jira/browse/ARROW-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852493#comment-16852493 ] Wes McKinney commented on ARROW-1837: - [~emkornfi...@gmail.com] if you are interested in unsigned integers this would benefit from some attention > [Java] Unable to read unsigned integers outside signed range for bit width in > integration tests > --- > > Key: ARROW-1837 > URL: https://issues.apache.org/jira/browse/ARROW-1837 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Wes McKinney >Priority: Blocker > Labels: columnar-format-1.0 > Fix For: 0.14.0 > > Attachments: generated_primitive.json > > > I believe this was introduced recently (perhaps in the refactors), but there > was a problem where the integration tests weren't being properly run that hid > the error from us > see https://github.com/apache/arrow/pull/1294#issuecomment-345553066 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2077) [Python] Document on how to use Storefact & Arrow to read Parquet from S3/Azure/...
[ https://issues.apache.org/jira/browse/ARROW-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2077: Fix Version/s: (was: 0.14.0) > [Python] Document on how to use Storefact & Arrow to read Parquet from > S3/Azure/... > --- > > Key: ARROW-2077 > URL: https://issues.apache.org/jira/browse/ARROW-2077 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: parquet > > We're using this happily in production, also with column projection down to > the storage layer. Others should also benefit from this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-2057) [Python] Configure size of data pages in pyarrow.parquet.write_table
[ https://issues.apache.org/jira/browse/ARROW-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-2057: --- Assignee: (was: Uwe L. Korn) > [Python] Configure size of data pages in pyarrow.parquet.write_table > > > Key: ARROW-2057 > URL: https://issues.apache.org/jira/browse/ARROW-2057 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: beginner, parquet > Fix For: 0.14.0 > > > It would be useful to be able to set the size of data pages (within Parquet > column chunks) from Python. The current default is set to 1MiB at > https://github.com/apache/parquet-cpp/blob/0875e43010af485e1c0b506d77d7e0edc80c66cc/src/parquet/properties.h#L81. > It might be useful in some situations to lower this for more granular access. > We should provide this value as a parameter to > {{pyarrow.parquet.write_table}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
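A sketch of what the requested knob could look like from Python; the data_page_size keyword below mirrors the C++ writer property and is the proposed addition described in the issue, not an argument that existed at the time:

{code}
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.Table.from_arrays([pa.array(list(range(1000000)))], names=['x'])

# Proposed: smaller data pages (here 64 KiB instead of the 1 MiB default)
# within each column chunk, for more granular reads.
pq.write_table(table, 'example.parquet', data_page_size=64 * 1024)
{code}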
[jira] [Updated] (ARROW-2098) [Python] Implement "errors as null" option when coercing Python object arrays to Arrow format
[ https://issues.apache.org/jira/browse/ARROW-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2098: Fix Version/s: (was: 0.14.0) > [Python] Implement "errors as null" option when coercing Python object arrays > to Arrow format > - > > Key: ARROW-2098 > URL: https://issues.apache.org/jira/browse/ARROW-2098 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: parquet > > Inspired by > https://stackoverflow.com/questions/48611998/type-error-on-first-steps-with-apache-parquet > where the user has a string inside a mostly integer column -- This message was sent by Atlassian JIRA (v7.6.3#76005)
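To make the linked problem concrete, a sketch of the failing case alongside how an opt-in "errors as null" mode might look; the errors keyword shown in the comment is purely hypothetical:

{code}
import pyarrow as pa

values = [1, 2, 'foo', 4]

try:
    # The string element cannot be coerced to int64, so this raises.
    pa.array(values, type=pa.int64())
except (pa.ArrowInvalid, pa.ArrowTypeError) as exc:
    print(exc)

# Proposed behaviour, with a hypothetical keyword:
# pa.array(values, type=pa.int64(), errors='null')  # -> [1, 2, null, 4]
{code}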
[jira] [Commented] (ARROW-2037) [Python]: Add tests for ARROW-1941 cases where pandas inferred type is 'empty'
[ https://issues.apache.org/jira/browse/ARROW-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852497#comment-16852497 ] Wes McKinney commented on ARROW-2037: - cc [~jorisvandenbossche] > [Python]: Add tests for ARROW-1941 cases where pandas inferred type is 'empty' > -- > > Key: ARROW-2037 > URL: https://issues.apache.org/jira/browse/ARROW-2037 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.8.0 >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ARROW-2186) [C++] Clean up architecture specific compiler flags
[ https://issues.apache.org/jira/browse/ARROW-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-2186. --- Resolution: Not A Problem > [C++] Clean up architecture specific compiler flags > --- > > Key: ARROW-2186 > URL: https://issues.apache.org/jira/browse/ARROW-2186 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > I noticed that {{-maltivec}} is being passed to the compiler on Linux, with > an x86_64 processor. That seemed odd to me. It prompted me to look more > generally at our compiler flags related to hardware optimizations. We have > the ability to pass {{-msse3}}, but there is a {{ARROW_USE_SSE}} which is > only used as a define in some headers. There is {{ARROW_ALTIVEC}}, but no > option to pass {{-march}}. Nothing related to AVX/AVX2/AVX512. I think this > could do for an overhaul -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2130) [Python] Support converting pandas.Timestamp in pyarrow.array
[ https://issues.apache.org/jira/browse/ARROW-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2130: Fix Version/s: (was: 0.14.0) > [Python] Support converting pandas.Timestamp in pyarrow.array > - > > Key: ARROW-2130 > URL: https://issues.apache.org/jira/browse/ARROW-2130 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Uwe L. Korn >Priority: Major > > This is follow up work to ARROW-2106; since pandas.Timestamp supports > nanoseconds, this will require a slightly different code path. Tests should > also include using {{Table.from_pandas}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
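The conversion being asked for, sketched below; the expected result type follows from the nanosecond discussion in the description rather than from a particular release:

{code}
import pandas as pd
import pyarrow as pa

# pandas.Timestamp carries nanosecond precision, unlike datetime.datetime,
# so the natural inferred Arrow type is timestamp[ns].
ts = pd.Timestamp('2017-01-01 00:00:00.000000001')
arr = pa.array([ts])
print(arr.type)  # expected: timestamp[ns]
{code}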
[jira] [Updated] (ARROW-2127) [Plasma] Transfer of objects between CPUs and GPUs
[ https://issues.apache.org/jira/browse/ARROW-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2127: Fix Version/s: (was: 0.14.0) > [Plasma] Transfer of objects between CPUs and GPUs > -- > > Key: ARROW-2127 > URL: https://issues.apache.org/jira/browse/ARROW-2127 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Philipp Moritz >Priority: Major > > It should be possible to transfer an object that was created on the CPU to > the GPU and vice versa. One natural implementation is to introduce a flag to > plasma::Get that specifies where the object should end up and then transfer > the object under the hood and return the appropriate buffer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2041) [Python] pyarrow.serialize has high overhead for list of NumPy arrays
[ https://issues.apache.org/jira/browse/ARROW-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2041: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] pyarrow.serialize has high overhead for list of NumPy arrays > - > > Key: ARROW-2041 > URL: https://issues.apache.org/jira/browse/ARROW-2041 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Richard Shin >Priority: Minor > Labels: Performance > Fix For: 0.15.0 > > > {{Python 2.7.12 (default, Nov 20 2017, 18:23:56)}} > {{[GCC 5.4.0 20160609] on linux2}} > {{Type "help", "copyright", "credits" or "license" for more information.}} > {{>>> import pyarrow as pa, numpy as np}} > {{>>> arrays = [np.arange(100, dtype=np.int32) for _ in range(1)]}} > {{>>> with open('test.pyarrow', 'w') as f:}} > {{... f.write(pa.serialize(arrays).to_buffer().to_pybytes())}} > {{...}} > {{>>> import cPickle as pickle}} > {{>>> pickle.dump(arrays, open('test.pkl', 'w'), pickle.HIGHEST_PROTOCOL)}} > test.pyarrow is 6.2 MB, while test.pkl is only 4.2 MB. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1848) [Python] Add documentation examples for reading single Parquet files and datasets from HDFS
[ https://issues.apache.org/jira/browse/ARROW-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1848: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Add documentation examples for reading single Parquet files and > datasets from HDFS > --- > > Key: ARROW-1848 > URL: https://issues.apache.org/jira/browse/ARROW-1848 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: filesystem, parquet > Fix For: 0.15.0 > > > see > https://stackoverflow.com/questions/47443151/read-a-parquet-files-from-hdfs-using-pyarrow -- This message was sent by Atlassian JIRA (v7.6.3#76005)
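The sort of example the documentation could show, sketched with the legacy pyarrow.hdfs API of this era; the host, port, and paths are placeholders:

{code}
import pyarrow as pa
import pyarrow.parquet as pq

# Connect to HDFS (placeholder namenode and port).
fs = pa.hdfs.connect(host='namenode', port=8020)

# Single Parquet file, read through a file-like object.
with fs.open('/data/example.parquet', 'rb') as f:
    table = pq.read_table(f)

# Directory of Parquet files read as one dataset.
dataset = pq.ParquetDataset('/data/example_dataset/', filesystem=fs)
table = dataset.read()
{code}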
[jira] [Updated] (ARROW-2939) [Python] Provide links to documentation pages for old versions
[ https://issues.apache.org/jira/browse/ARROW-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2939: Summary: [Python] Provide links to documentation pages for old versions (was: [Python] API documentation version doesn't match latest on PyPI) > [Python] Provide links to documentation pages for old versions > -- > > Key: ARROW-2939 > URL: https://issues.apache.org/jira/browse/ARROW-2939 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Ian Robertson >Priority: Minor > Labels: documentation > Fix For: 0.14.0 > > > Hey folks, apologies if this isn't the right place to raise this. In poking > around the web documentation (for pyarrow specifically), it looks like the > auto-generated API docs contain commits past the release of 0.9.0. For > example: > * > [https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.column] > * Contains differences merged here: > [https://github.com/apache/arrow/pull/1923] > * But latest pypi/conda versions of pyarrow are 0.9.0, which don't include > that change. > Not sure if the docs are auto-built off master somewhere, I couldn't find > anything about building docs in the docs itself. I would guess that you may > want some of the usage docs to be published in between releases if they're > not about new functionality, but the API reference being out of date can be > confusing. Is it possible to anchor the API docs to the latest released > version? Or even something like how Pandas has a whole bunch of old versions > still available? (e.g. [https://pandas.pydata.org/pandas-docs/stable/] vs. > old versions like [http://pandas.pydata.org/pandas-docs/version/0.17.0/]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2984) [JS] Refactor release verification script to share code with main source release verification script
[ https://issues.apache.org/jira/browse/ARROW-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852519#comment-16852519 ] Wes McKinney commented on ARROW-2984: - To close this, let us remove the old JavaScript release scripts > [JS] Refactor release verification script to share code with main source > release verification script > > > Key: ARROW-2984 > URL: https://issues.apache.org/jira/browse/ARROW-2984 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > There is some possible code duplication. See discussion in ARROW-2977 > https://github.com/apache/arrow/pull/2369 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3052) [C++] Detect ORC system packages
[ https://issues.apache.org/jira/browse/ARROW-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852517#comment-16852517 ] Wes McKinney commented on ARROW-3052: - ORC is now in conda-forge https://github.com/conda-forge/orc-feedstock > [C++] Detect ORC system packages > > > Key: ARROW-3052 > URL: https://issues.apache.org/jira/browse/ARROW-3052 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > See > https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L155. > After the CMake refactor it is possible to use built ORC packages with > {{$ORC_HOME}} but not detected like the other toolchain dependencies -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3016) [C++] Add ability to enable call stack logging for each memory allocation
[ https://issues.apache.org/jira/browse/ARROW-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3016: Fix Version/s: (was: 0.14.0) > [C++] Add ability to enable call stack logging for each memory allocation > - > > Key: ARROW-3016 > URL: https://issues.apache.org/jira/browse/ARROW-3016 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > > It is possible to gain programmatic access to the call stack in C/C++, e.g. > https://eli.thegreenplace.net/2015/programmatic-access-to-the-call-stack-in-c/ > It would be valuable to have a debugging option to log the sizes of memory > allocations as well as showing the call stack where that allocation is > performed. In complex programs, this could help determine the origin of a > memory leak -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3702) [R] POSIXct mapped to DateType not TimestampType?
[ https://issues.apache.org/jira/browse/ARROW-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852546#comment-16852546 ] Wes McKinney commented on ARROW-3702: - cc [~npr] > [R] POSIXct mapped to DateType not TimestampType? > - > > Key: ARROW-3702 > URL: https://issues.apache.org/jira/browse/ARROW-3702 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Javier Luraschi >Priority: Major > Fix For: 0.14.0 > > > Why was POSIXct mapped to > [DateType|https://arrow.apache.org/docs/cpp/classarrow_1_1_date_type.html#a6aea1fcfd9f998e8fa50f5ae62dbd7e6] > not > [TimestampType|https://arrow.apache.org/docs/cpp/classarrow_1_1_timestamp_type.html#a88e0ba47b82571b3fc3798b6c099499b]? > What are the pros and cons of each approach? > This is mostly to interoperate with Spark, which chose to map POSIXct to > Timestamps since in Spark, unlike Arrow, dates do not have a time component. > There is a way to make this work in Spark with POSIXct mapped to DateType by > mapping DateType to timestamps, so mostly looking to understand tradeoffs. > One particular question: timestamps in Arrow seem to support timezones, so > wouldn't it make more sense to map POSIXct to timestamps? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3706) [Rust] Add record batch reader trait.
[ https://issues.apache.org/jira/browse/ARROW-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3706: Fix Version/s: (was: 0.14.0) > [Rust] Add record batch reader trait. > - > > Key: ARROW-3706 > URL: https://issues.apache.org/jira/browse/ARROW-3706 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Renjie Liu >Assignee: Renjie Liu >Priority: Major > > Add an RecordBatchReader trait. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3686) [Python] Support for masked arrays in to/from numpy
[ https://issues.apache.org/jira/browse/ARROW-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852545#comment-16852545 ] Wes McKinney commented on ARROW-3686: - cc [~jorisvandenbossche] > [Python] Support for masked arrays in to/from numpy > --- > > Key: ARROW-3686 > URL: https://issues.apache.org/jira/browse/ARROW-3686 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.11.1 >Reporter: Maarten Breddels >Priority: Major > Fix For: 0.14.0 > > > Again, in this PR for vaex: > [https://github.com/maartenbreddels/vaex/pull/116] I support masked arrays, > it would be nice if this goes into pyarrow. If this approach looks good I > could do a PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3705) [Python] Add "nrows" argument to parquet.read_table to read an indicated number of rows from a file instead of the whole file
[ https://issues.apache.org/jira/browse/ARROW-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3705: Labels: datasets parquet (was: parquet) > [Python] Add "nrows" argument to parquet.read_table to read an indicated number of > rows from a file instead of the whole file > - > > Key: ARROW-3705 > URL: https://issues.apache.org/jira/browse/ARROW-3705 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: datasets, parquet > Fix For: 0.14.0 > > > This is patterned after {{nrows}} in {{pandas.read_csv}}, > inspired by > https://stackoverflow.com/questions/53152671/how-to-read-sample-records-parquet-file-in-s3 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
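A sketch of the requested call next to the closest existing workaround; the nrows keyword does not exist yet, and reading a single row group only approximates it (the file name is a placeholder):

{code}
import pyarrow.parquet as pq

# Proposed (not yet implemented):
# table = pq.read_table('data.parquet', nrows=1000)

# Existing approximation: read only the first row group instead of the file.
pf = pq.ParquetFile('data.parquet')
sample = pf.read_row_group(0)
{code}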
[jira] [Updated] (ARROW-3655) [Gandiva] switch away from default_memory_pool
[ https://issues.apache.org/jira/browse/ARROW-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3655: Fix Version/s: (was: 0.14.0) > [Gandiva] switch away from default_memory_pool > -- > > Key: ARROW-3655 > URL: https://issues.apache.org/jira/browse/ARROW-3655 > Project: Apache Arrow > Issue Type: Task > Components: C++ - Gandiva >Reporter: Pindikura Ravindra >Priority: Major > > After changes to ARROW-3519, Gandiva uses default_memory_pool for some > allocations. This needs to be replaced with the pool passed in the Evaluate > call. > > Also, change signatures of all Evaluate APIs (both in project and filter) to > take a pool argument. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3709) [CI/Docker/Python] Plasma tests are failing in the docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3709: Fix Version/s: (was: 0.14.0) > [CI/Docker/Python] Plasma tests are failing in the docker-compose setup > --- > > Key: ARROW-3709 > URL: https://issues.apache.org/jira/browse/ARROW-3709 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration >Reporter: Krisztian Szucs >Priority: Major > Labels: docker > > {code} > rc = proc.poll() > if rc is not None: > raise RuntimeError("plasma_store exited unexpectedly with " > > "code %d" % (rc,)) > E RuntimeError: plasma_store exited > unexpectedly with code 127 > opt/conda/lib/python3.6/site-packages/pyarrow-0.11.1.dev62+g669c5bca-py3.6-linux-x86_64.egg/pyarrow/plasma.py:138: > RuntimeError > Captured stderr call > - > /opt/conda/lib/python3.6/site-packages/pyarrow-0.11.1.dev62+g669c5bca-py3.6-linux-x86_64.egg/pyarrow/plasma_store_server: > error while loading shared libraries: libboost_system.so.1.68.0: cannot open > shared object file: No such file or dir > ectory > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3730) [Python] Output a representation of pyarrow.Schema that can be used to reconstruct a schema in a script
[ https://issues.apache.org/jira/browse/ARROW-3730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852547#comment-16852547 ] Wes McKinney commented on ARROW-3730: - cc [~jorisvandenbossche] > [Python] Output a representation of pyarrow.Schema that can be used to > reconstruct a schema in a script > --- > > Key: ARROW-3730 > URL: https://issues.apache.org/jira/browse/ARROW-3730 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > This would be like what {{__repr__}} is used for in many built-in Python > types, or a schema as a list of tuples that can be passed to > {{pyarrow.schema}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
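The list-of-tuples form is already accepted by pyarrow.schema, so the request is essentially for a representation that emits it back. A sketch of the desired round trip:

{code}
import pyarrow as pa

schema = pa.schema([('id', pa.int64()),
                    ('name', pa.string()),
                    ('score', pa.float64())])

# Desired: a representation that can be pasted back into a script, e.g.
#   pa.schema([('id', pa.int64()), ('name', pa.string()), ('score', pa.float64())])
# rather than the current multi-line string form.
print(repr(schema))
{code}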
[jira] [Updated] (ARROW-3758) [R] Build R library on Windows, document build instructions for Windows developers
[ https://issues.apache.org/jira/browse/ARROW-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3758: Fix Version/s: (was: 0.14.0) 0.15.0 > [R] Build R library on Windows, document build instructions for Windows > developers > -- > > Key: ARROW-3758 > URL: https://issues.apache.org/jira/browse/ARROW-3758 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3710) [CI/Python] Run nightly tests against pandas master
[ https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3710: Fix Version/s: (was: 0.14.0) > [CI/Python] Run nightly tests against pandas master > --- > > Key: ARROW-3710 > URL: https://issues.apache.org/jira/browse/ARROW-3710 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Follow-up of [https://github.com/apache/arrow/pull/2758] and > https://github.com/apache/arrow/pull/2755 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4759) [Rust] [DataFusion] It should be possible to share an execution context between threads
[ https://issues.apache.org/jira/browse/ARROW-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4759: Fix Version/s: (was: 0.14.0) > [Rust] [DataFusion] It should be possible to share an execution context > between threads > --- > > Key: ARROW-4759 > URL: https://issues.apache.org/jira/browse/ARROW-4759 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust, Rust - DataFusion >Affects Versions: 0.12.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > I am working on a PR for this now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4429) Add git rebase tips to the 'Contributing' page in the developer docs
[ https://issues.apache.org/jira/browse/ARROW-4429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4429: Fix Version/s: (was: 0.14.0) > Add git rebase tips to the 'Contributing' page in the developer docs > > > Key: ARROW-4429 > URL: https://issues.apache.org/jira/browse/ARROW-4429 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Tanya Schlusser >Priority: Major > > A recent discussion on the listserv (link below) asked about how contributors > should handle rebasing. It would be helpful if the tips made it into the > developer documentation somehow. I suggest in the ["Contributing to Apache > Arrow"|https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow] > page—currently a wiki, but hopefully eventually part of the Sphinx docs > ARROW-4427. > Here is the relevant thread: > [https://lists.apache.org/thread.html/c74d8027184550b8d9041e3f2414b517ffb76ccbc1d5aa4563d364b6@%3Cdev.arrow.apache.org%3E] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4752) [Rust] Add explicit SIMD vectorization for the divide kernel
[ https://issues.apache.org/jira/browse/ARROW-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4752: Fix Version/s: (was: 0.14.0) > [Rust] Add explicit SIMD vectorization for the divide kernel > > > Key: ARROW-4752 > URL: https://issues.apache.org/jira/browse/ARROW-4752 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)