[jira] [Updated] (ARROW-4462) [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017

2019-02-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4462:
--
Labels: pull-request-available  (was: )

> [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017
> -
>
> Key: ARROW-4462
> URL: https://issues.apache.org/jira/browse/ARROW-4462
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Developer Tools
>Reporter: Areg Melik-Adamyan
>Priority: Minor
>  Labels: pull-request-available
>
> Upgrading to LZ4 v1.8.3 would let us remove the patch and the patching step, 
> since the fix is incorporated into the newer VS2010 solution; a VS2017 
> solution is also provided, which eases use with newer toolchains. Is there a 
> reason for, or a fixed dependency on, v1.7.5?
> There is still an issue with MS Build Tools newer than v8.1, which requires 
> manual retargeting. This could be fixed in CMake by introducing complex logic 
> that reads the registry tree 
> HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSBuild\ToolsVersions\4.0\1*.0, 
> analyzes which version of the tools is installed, and then patches the 
> solution and projects. But as this is an external dependency, it is better to 
> submit a patch upstream.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4480) [Python] Drive letter removed when writing parquet file

2019-02-06 Thread Michael Peleshenko (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762321#comment-16762321
 ] 

Michael Peleshenko commented on ARROW-4480:
---

We ran into this issue as well, and a coworker dug into it and narrowed it down 
to {{pyarrow.filesystem.resolve_filesystem_and_path()}}. The very last return 
always returns {{parsed_uri.path}}, regardless of the scheme. It seems for 
non-hdfs schemes, just {{path}} should be returned, as otherwise the drive 
letter is stripped by {{urlparse()}}. A workaround for now is to call 
{{pyarrow.parquet.write_table()}} with 
{{filesystem=LocalFileSystem.get_instance()}}
{code}
def resolve_filesystem_and_path(where, filesystem=None):
    """
    Return filesystem from path which could be an HDFS URI.
    """
    if not _is_path_like(where):
        if filesystem is not None:
            raise ValueError("filesystem passed but where is file-like, so"
                             " there is nothing to open with filesystem.")
        return filesystem, where

    # input can be hdfs URI such as hdfs://host:port/myfile.parquet
    path = _stringify_path(where)

    if filesystem is not None:
        return _ensure_filesystem(filesystem), path

    parsed_uri = urlparse(path)
    if parsed_uri.scheme == 'hdfs':
        netloc_split = parsed_uri.netloc.split(':')
        host = netloc_split[0]
        if host == '':
            host = 'default'
        port = 0
        if len(netloc_split) == 2 and netloc_split[1].isnumeric():
            port = int(netloc_split[1])
        fs = pa.hdfs.connect(host=host, port=port)
    else:
        fs = LocalFileSystem.get_instance()

    return fs, parsed_uri.path
{code}
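A minimal sketch of the suggested fix (the function name is illustrative, not the actual pyarrow patch): return the original path unchanged when the URI scheme is not {{hdfs}}, so Windows drive letters survive.

```python
from urllib.parse import urlparse

def resolve_path(path):
    """Sketch of the suggested fix: return the original path (not
    parsed_uri.path) when the URI scheme is not 'hdfs'."""
    parsed_uri = urlparse(path)
    if parsed_uri.scheme == 'hdfs':
        return parsed_uri.path   # strip the hdfs://host:port prefix
    return path                  # keep Windows drive letters intact

print(resolve_path(r"E:\parquetfiles\file1.parquet"))   # E:\parquetfiles\file1.parquet
print(resolve_path("hdfs://host:8020/myfile.parquet"))  # /myfile.parquet
```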

> [Python] Drive letter removed when writing parquet file 
> 
>
> Key: ARROW-4480
> URL: https://issues.apache.org/jira/browse/ARROW-4480
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
>Reporter: Seb Fru
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> Hi everyone,
>   
>  importing this from Github:
>   
>  I encountered a problem while working with pyarrow: I am working on Windows 
> 10. When I want to save a table using pq.write_table(tab, 
> r'E:\parquetfiles\file1.parquet'), I get the Error "No such file or 
> directory".
>   After searching a bit, I found out that the drive letter is removed while 
> parsing the where string, but I could not find a way to solve my problem: 
> I can write the files to my C:\ drive without problems, but I am not able to 
> write a parquet file to any drive other than C:.
>  Am I doing something wrong, or is this just how it works? I would really 
> appreciate any help, because I just cannot fit my files on the C: drive.
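The drive-letter loss described above can be reproduced with the standard library alone: {{urlparse()}} mistakes a Windows drive letter for a one-letter URI scheme.

```python
from urllib.parse import urlparse

parsed = urlparse(r"E:\parquetfiles\file1.parquet")
# The drive letter is consumed as the (lowercased) URI scheme, so .path
# no longer contains it -- this is what strips "E:" from the destination.
print(parsed.scheme)  # e
print(parsed.path)    # \parquetfiles\file1.parquet
```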





[jira] [Commented] (ARROW-4383) [C++] Use the CMake's standard find features

2019-02-06 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762272#comment-16762272
 ] 

Kouhei Sutou commented on ARROW-4383:
-

I agree with the idea of using {{find_package()}} with {{CONFIGS}} as the 
first try.

This is out of scope for this issue, but I also want to change the current 
vendoring strategy: use the system libraries by default, with vendored 
libraries as a fallback (and an option). Currently, we use vendored libraries 
by default (e.g. double-conversion).
If we used the system libraries by default, users wouldn't need to rebuild 
libraries they already have.

> [C++] Use the CMake's standard find features
> 
>
> Key: ARROW-4383
> URL: https://issues.apache.org/jira/browse/ARROW-4383
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kouhei Sutou
>Priority: Major
> Fix For: 0.13.0
>
>
> From https://github.com/apache/arrow/pull/3469#discussion_r250862542
> We implement our own custom find code that locates libraries with 
> {{find_library()}}/{{find_path()}} and {{NO_DEFAULT_PATH}}. So we need to 
> handle the {{lib64/}} (on Red Hat) and {{lib/x86_64-linux-gnu/}} (on Debian) 
> paths manually.
> If we use CMake's standard find features such as {{CMAKE_PREFIX_PATH}} 
> https://cmake.org/cmake/help/v3.13/variable/CMAKE_PREFIX_PATH.html#variable:CMAKE_PREFIX_PATH
>  , we can remove our custom find code.
> CMake has a package-specific find path feature via {{<PackageName>_ROOT}} 
> since 3.12. It's the equivalent of our {{<PACKAGE>_HOME}} variables: 
> https://cmake.org/cmake/help/v3.12/command/find_library.html
> {quote}
> If called from within a find module loaded by 
> {{find_package(<PackageName>)}}, search prefixes unique to the current 
> package being found. Specifically look in the {{<PackageName>_ROOT}} CMake 
> variable and the {{<PackageName>_ROOT}} environment variable. The package 
> root variables are maintained as a stack so if called from nested find 
> modules, root paths from the parent’s find module will be searched after 
> paths from the current module, i.e. {{<CurrentPackage>_ROOT}}, 
> {{ENV\{<CurrentPackage>_ROOT}}}, {{<ParentPackage>_ROOT}}, 
> {{ENV\{<ParentPackage>_ROOT}}}, etc.
> {quote}





[jira] [Created] (ARROW-4495) [C++][Gandiva] TestCastTimestampErrors failed in gandiva-precompiled-time_test in MSVC

2019-02-06 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4495:
---

 Summary: [C++][Gandiva] TestCastTimestampErrors failed in 
gandiva-precompiled-time_test in MSVC
 Key: ARROW-4495
 URL: https://issues.apache.org/jira/browse/ARROW-4495
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva
Reporter: Wes McKinney
 Fix For: 0.13.0


See the discussion in https://github.com/apache/arrow/pull/3567. This test is 
disabled for now.





[jira] [Resolved] (ARROW-3426) [CI] Java integration test very verbose

2019-02-06 Thread Bryan Cutler (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved ARROW-3426.
-
Resolution: Not A Problem

> [CI] Java integration test very verbose
> ---
>
> Key: ARROW-3426
> URL: https://issues.apache.org/jira/browse/ARROW-3426
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Priority: Minor
> Fix For: 0.13.0
>
>
> The Java integration tests are very verbose. An example here:
> https://travis-ci.org/apache/arrow/jobs/436731293
> It would be nice to cut down on the logging.





[jira] [Commented] (ARROW-3426) [CI] Java integration test very verbose

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762198#comment-16762198
 ] 

Wes McKinney commented on ARROW-3426:
-

Thanks [~bryanc]

> [CI] Java integration test very verbose
> ---
>
> Key: ARROW-3426
> URL: https://issues.apache.org/jira/browse/ARROW-3426
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Priority: Minor
> Fix For: 0.13.0
>
>
> The Java integration tests are very verbose. An example here:
> https://travis-ci.org/apache/arrow/jobs/436731293
> It would be nice to cut down on the logging.





[jira] [Commented] (ARROW-3426) [CI] Java integration test very verbose

2019-02-06 Thread Bryan Cutler (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762196#comment-16762196
 ] 

Bryan Cutler commented on ARROW-3426:
-

So the integration tests send the maven build output to a log file and only 
display it if there is an error. I didn't see any recent examples with a 
failure, but since ARROW-4180 and ARROW-4344 reduced the output quite a bit I 
think it should be fine now. We probably don't need to output to a log file 
anymore either. I'll go ahead and close this.

> [CI] Java integration test very verbose
> ---
>
> Key: ARROW-3426
> URL: https://issues.apache.org/jira/browse/ARROW-3426
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Priority: Minor
> Fix For: 0.13.0
>
>
> The Java integration tests are very verbose. An example here:
> https://travis-ci.org/apache/arrow/jobs/436731293
> It would be nice to cut down on the logging.





[jira] [Updated] (ARROW-4406) Ignore "*_$folder$" files on S3

2019-02-06 Thread George Sakkis (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Sakkis updated ARROW-4406:
-
Priority: Minor  (was: Major)

> Ignore "*_$folder$" files on S3
> ---
>
> Key: ARROW-4406
> URL: https://issues.apache.org/jira/browse/ARROW-4406
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: George Sakkis
>Priority: Minor
>  Labels: easyfix, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently reading parquet files generated by Hadoop (EMR) from S3 fails with 
> "ValueError: Found files in an intermediate directory" because of the 
> [_$folder$|http://stackoverflow.com/questions/42876195/avoid-creation-of-folder-keys-in-s3-with-hadoop-emr]
>  empty files. 
> The fix should be easy, just an extra condition in 
> [ParquetManifest._should_silently_exclude|https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L770].
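The extra condition could look roughly like this; a sketch against a simplified stand-in for the exclusion helper, not the actual pyarrow code:

```python
def should_silently_exclude(file_name):
    """Mimics ParquetManifest._should_silently_exclude with the proposed
    extra condition for the Hadoop/EMR "*_$folder$" marker objects on S3."""
    return (file_name.endswith('.crc') or        # checksum files
            file_name.endswith('_$folder$') or   # HDFS-on-S3 directory markers
            file_name.startswith('.') or         # hidden files
            file_name.startswith('_'))           # e.g. _SUCCESS, _metadata

print(should_silently_exclude('part-00000_$folder$'))  # True
print(should_silently_exclude('part-00000.parquet'))   # False
```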





[jira] [Updated] (ARROW-4406) Ignore "*_$folder$" files on S3

2019-02-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4406:
--
Labels: easyfix pull-request-available  (was: easyfix)

> Ignore "*_$folder$" files on S3
> ---
>
> Key: ARROW-4406
> URL: https://issues.apache.org/jira/browse/ARROW-4406
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: George Sakkis
>Priority: Major
>  Labels: easyfix, pull-request-available
>
> Currently reading parquet files generated by Hadoop (EMR) from S3 fails with 
> "ValueError: Found files in an intermediate directory" because of the 
> [_$folder$|http://stackoverflow.com/questions/42876195/avoid-creation-of-folder-keys-in-s3-with-hadoop-emr]
>  empty files. 
> The fix should be easy, just an extra condition in 
> [ParquetManifest._should_silently_exclude|https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L770].





[jira] [Resolved] (ARROW-3972) [C++] Update to LLVM and Clang bits to 7.0

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3972.
-
Resolution: Fixed

Issue resolved by pull request 3499
[https://github.com/apache/arrow/pull/3499]

> [C++] Update to LLVM and Clang bits to 7.0
> --
>
> Key: ARROW-3972
> URL: https://issues.apache.org/jira/browse/ARROW-3972
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, C++ - Gandiva
>Reporter: Uwe L. Korn
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Since {{llvmlite}}, the other LLVM-using package in the Python ecosystem, has 
> moved to LLVM 7, we should follow along to avoid problems when Gandiva is 
> used in the same Python environment.
> Reference: https://github.com/numba/llvmlite/pull/412





[jira] [Commented] (ARROW-3495) [Java] Optimize bit operations performance

2019-02-06 Thread Animesh Trivedi (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762174#comment-16762174
 ] 

Animesh Trivedi commented on ARROW-3495:


Hi [~wesmckinn], thanks for following up. I have been on an extended break 
since the beginning of the year, but I am starting to catch up on my pending 
items now. I will push the bitmap optimizations in the coming weeks. Hope that 
is ok.

> [Java] Optimize bit operations performance
> --
>
> Key: ARROW-3495
> URL: https://issues.apache.org/jira/browse/ARROW-3495
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Assignee: Animesh Trivedi
>Priority: Major
> Fix For: 0.13.0
>
>
> From [~atrivedi]'s benchmark finding:
> 2) Materialize values from Validity and Value direct buffers instead of
> calling getInt() function on the IntVector. This is implemented as a new
> Unsafe reader type (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]
> )
> 3) Optimize bitmap operation to check if a bit is set or not (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]
> )
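The bitmap check in point 3 boils down to the usual byte/bit-offset arithmetic; a Python sketch of the idea (Arrow validity bitmaps are LSB-first within each byte):

```python
def bit_is_set(validity_buffer, index):
    """Check whether value `index` is non-null in an Arrow validity bitmap.
    Bit i lives in byte i // 8, at bit position i % 8 (LSB first)."""
    return (validity_buffer[index >> 3] >> (index & 7)) & 1 == 1

# Bitmap 0b00000101 -> values 0 and 2 are valid, value 1 is null.
bitmap = bytes([0b00000101])
print([bit_is_set(bitmap, i) for i in range(3)])  # [True, False, True]
```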





[jira] [Commented] (ARROW-3495) [Java] Optimize bit operations performance

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762175#comment-16762175
 ] 

Wes McKinney commented on ARROW-3495:
-

OK, sounds good!

> [Java] Optimize bit operations performance
> --
>
> Key: ARROW-3495
> URL: https://issues.apache.org/jira/browse/ARROW-3495
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Assignee: Animesh Trivedi
>Priority: Major
> Fix For: 0.13.0
>
>
> From [~atrivedi]'s benchmark finding:
> 2) Materialize values from Validity and Value direct buffers instead of
> calling getInt() function on the IntVector. This is implemented as a new
> Unsafe reader type (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]
> )
> 3) Optimize bitmap operation to check if a bit is set or not (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]
> )





[jira] [Updated] (ARROW-4341) [C++] Use TypedBufferBuilder in BooleanBuilder

2019-02-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4341:
--
Labels: pull-request-available  (was: )

> [C++] Use TypedBufferBuilder in BooleanBuilder
> 
>
> Key: ARROW-4341
> URL: https://issues.apache.org/jira/browse/ARROW-4341
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Benjamin Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> Follow up work to ARROW-4031





[jira] [Commented] (ARROW-3265) [C++] Restore CPACK support for Parquet libraries

2019-02-06 Thread Kouhei Sutou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762157#comment-16762157
 ] 

Kouhei Sutou commented on ARROW-3265:
-

I'm not sure about the context, but it's not important.

CPack can generate .deb, .rpm, .tar.gz, .zip (Windows binary package) and .exe 
(Windows installer) packages. (I only list the package types we are interested 
in.)

For .deb and .rpm, we already have them, and we can't use CPack-based packages 
in the official Debian/Fedora repositories. For example, MariaDB can use CPack 
for .rpm https://mariadb.org/get-involved/packaging/ but Fedora doesn't use 
MariaDB's packaging: https://src.fedoraproject.org/rpms/mariadb/tree/master

For .tar.gz, we already have it.

For .zip (Windows binary package), we don't have it. CPack may be useful for 
this case. For example, I'm using CPack to generate .zip for PGroonga 
https://github.com/pgroonga/pgroonga/blob/master/CMakeLists.txt#L199 .
But we can do the same thing with {{cmake --build . --target install}} and {{7z 
a %INSTALL_FOLDER%}}:
https://github.com/groonga/groonga/blob/master/appveyor.yml#L94
https://github.com/groonga/groonga/blob/master/appveyor.yml#L119
In Groonga, I'm creating a .zip for each commit on AppVeyor: 
https://ci.appveyor.com/project/groonga/groonga/builds/22159025/job/p6h7qxqgh2txgb9c/artifacts
For MSYS2 users, .zip is not needed. We should maintain 
https://github.com/Alexpux/MINGW-packages/tree/master/mingw-w64-arrow rather 
than .zip.
For Visual C++ users, vcpkg https://github.com/Microsoft/vcpkg support may be 
useful, but I don't use it yet.

For .exe (Windows installer), we don't have it, but I think it's needless, 
because Apache Arrow is a library, not an application.

> [C++] Restore CPACK support for Parquet libraries
> -
>
> Key: ARROW-3265
> URL: https://issues.apache.org/jira/browse/ARROW-3265
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
>
> See https://github.com/apache/parquet-cpp/blob/master/CMakeLists.txt#L32





[jira] [Updated] (ARROW-4493) [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to read

2019-02-06 Thread Neville Dipale (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-4493:
--
Issue Type: Improvement  (was: Bug)

> [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to 
> read
> -
>
> Key: ARROW-4493
> URL: https://issues.apache.org/jira/browse/ARROW-4493
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Tupshin Harper
>Priority: Trivial
>
> make accumulate_scalar somewhat exhaustive and easier to read
>  
> The current implementation doesn't leverage any of the exhaustiveness 
> checking of matching. This can be made simpler and partially exhaustive.





[jira] [Updated] (ARROW-4493) [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to read

2019-02-06 Thread Neville Dipale (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-4493:
--
Affects Version/s: 0.12.0

> [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to 
> read
> -
>
> Key: ARROW-4493
> URL: https://issues.apache.org/jira/browse/ARROW-4493
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Affects Versions: 0.12.0
>Reporter: Tupshin Harper
>Priority: Trivial
>
> make accumulate_scalar somewhat exhaustive and easier to read
>  
> The current implementation doesn't leverage any of the exhaustiveness 
> checking of matching. This can be made simpler and partially exhaustive.





[jira] [Updated] (ARROW-4493) [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to read

2019-02-06 Thread Neville Dipale (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Dipale updated ARROW-4493:
--
Summary: [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and 
easier to read  (was: [Rust] make accumulate_scalar somewhat exhaustive and 
easier to read)

> [Rust] [DataFusion] make accumulate_scalar somewhat exhaustive and easier to 
> read
> -
>
> Key: ARROW-4493
> URL: https://issues.apache.org/jira/browse/ARROW-4493
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Tupshin Harper
>Priority: Trivial
>
> make accumulate_scalar somewhat exhaustive and easier to read
>  
> The current implementation doesn't leverage any of the exhaustiveness 
> checking of matching. This can be made simpler and partially exhaustive.





[jira] [Closed] (ARROW-3500) [C++] Provide public API for constructing IPC message headers like RecordBatch

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3500.
---
Resolution: Won't Fix

I can't remember what I was thinking when I opened this issue. Closing until 
there is a clear need.

> [C++] Provide public API for constructing IPC message headers like RecordBatch
> --
>
> Key: ARROW-3500
> URL: https://issues.apache.org/jira/browse/ARROW-3500
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> It would be useful to be able to construct a Flatbuffers message via a public 
> API rather than having to use Flatbuffers directly.





[jira] [Commented] (ARROW-3496) [Java] Add microbenchmark code to Java

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762077#comment-16762077
 ] 

Wes McKinney commented on ARROW-3496:
-

Since we're planning to start a benchmark database, it would be good timing to 
start writing some Java benchmarks. Is anyone interested in this?

> [Java] Add microbenchmark code to Java
> --
>
> Key: ARROW-3496
> URL: https://issues.apache.org/jira/browse/ARROW-3496
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Assignee: Animesh Trivedi
>Priority: Major
> Fix For: 0.13.0
>
>
> [~atrivedi] has done some microbenchmarking with the Java API. Let's consider 
> adding them to the codebase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3495) [Java] Optimize bit operations performance

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3495:

Fix Version/s: 0.13.0

> [Java] Optimize bit operations performance
> --
>
> Key: ARROW-3495
> URL: https://issues.apache.org/jira/browse/ARROW-3495
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Assignee: Animesh Trivedi
>Priority: Major
> Fix For: 0.13.0
>
>
> From [~atrivedi]'s benchmark finding:
> 2) Materialize values from Validity and Value direct buffers instead of
> calling getInt() function on the IntVector. This is implemented as a new
> Unsafe reader type (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]
> )
> 3) Optimize bitmap operation to check if a bit is set or not (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]
> )





[jira] [Closed] (ARROW-3507) [C++] Build against jemalloc from conda-forge

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3507.
---
Resolution: Won't Fix

> [C++] Build against jemalloc from conda-forge
> -
>
> Key: ARROW-3507
> URL: https://issues.apache.org/jira/browse/ARROW-3507
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Priority: Major
>






[jira] [Resolved] (ARROW-3508) [C++] Build against double-conversion from conda-forge

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3508.
-
   Resolution: Fixed
Fix Version/s: 0.12.0

We are building against double-conversion from CF now

> [C++] Build against double-conversion from conda-forge
> --
>
> Key: ARROW-3508
> URL: https://issues.apache.org/jira/browse/ARROW-3508
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.12.0
>
>






[jira] [Updated] (ARROW-3503) [Python] Allow config hadoop_bin in pyarrow hdfs.py

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3503:

Fix Version/s: 0.13.0

> [Python] Allow config hadoop_bin in pyarrow hdfs.py 
> 
>
> Key: ARROW-3503
> URL: https://issues.apache.org/jira/browse/ARROW-3503
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Wenbo Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, the hadoop_bin is either from `HADOOP_HOME` or the `hadoop` 
> command. 
> [https://github.com/apache/arrow/blob/master/python/pyarrow/hdfs.py#L130]
> However, in some environment setups, hadoop_bin could be in some other 
> location. Can we do something like:
>  
> {code:java}
> if 'HADOOP_BIN' in os.environ:
>     hadoop_bin = os.environ['HADOOP_BIN']
> elif 'HADOOP_HOME' in os.environ:
>     hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
> else:
>     hadoop_bin = 'hadoop'
> {code}
>  
>  





[jira] [Closed] (ARROW-3502) [C++] parquet-column_scanner-test failure building ARROW_PARQUET build 11.

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3502.
---
Resolution: Cannot Reproduce

Seems like an ABI conflict. The indicated solution of passing ARROW_CXXFLAGS 
should resolve it.

> [C++] parquet-column_scanner-test failure building ARROW_PARQUET build 11.
> --
>
> Key: ARROW-3502
> URL: https://issues.apache.org/jira/browse/ARROW-3502
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Tanveer
>Priority: Major
>  Labels: parquet
> Attachments: Screenshot from 2018-10-11 12-25-13.png
>
>
> While building Apache Arrow, I enabled the following flags and got the error 
> shown in the attachment (parquet-column_scanner-test failure) when making 
> arrow build 11.
> cmake .. -DCMAKE_BUILD_TYPE=Release -DARROW_PARQUET=ON -DARROW_PLASMA=ON 
> -DARROW_PLASMA_JAVA_CLIENT=ON





[jira] [Commented] (ARROW-3503) [Python] Allow config hadoop_bin in pyarrow hdfs.py

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762081#comment-16762081
 ] 

Wes McKinney commented on ARROW-3503:
-

Can you update your PR so we can merge this?

> [Python] Allow config hadoop_bin in pyarrow hdfs.py 
> 
>
> Key: ARROW-3503
> URL: https://issues.apache.org/jira/browse/ARROW-3503
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Wenbo Zhao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, the hadoop_bin is either from `HADOOP_HOME` or the `hadoop` 
> command. 
> [https://github.com/apache/arrow/blob/master/python/pyarrow/hdfs.py#L130]
> However, in some environment setups, hadoop_bin could be in some other 
> location. Can we do something like:
>  
> {code:java}
> if 'HADOOP_BIN' in os.environ:
>     hadoop_bin = os.environ['HADOOP_BIN']
> elif 'HADOOP_HOME' in os.environ:
>     hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
> else:
>     hadoop_bin = 'hadoop'
> {code}
>  
>  





[jira] [Updated] (ARROW-3496) [Java] Add microbenchmark code to Java

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3496:

Fix Version/s: 0.13.0

> [Java] Add microbenchmark code to Java
> --
>
> Key: ARROW-3496
> URL: https://issues.apache.org/jira/browse/ARROW-3496
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Assignee: Animesh Trivedi
>Priority: Major
> Fix For: 0.13.0
>
>
> [~atrivedi] has done some microbenchmarking with the Java API. Let's consider 
> adding them to the codebase.





[jira] [Commented] (ARROW-3495) [Java] Optimize bit operations performance

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762076#comment-16762076
 ] 

Wes McKinney commented on ARROW-3495:
-

Is there still interest in this?

> [Java] Optimize bit operations performance
> --
>
> Key: ARROW-3495
> URL: https://issues.apache.org/jira/browse/ARROW-3495
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Affects Versions: 0.11.0
>Reporter: Li Jin
>Assignee: Animesh Trivedi
>Priority: Major
> Fix For: 0.13.0
>
>
> From [~atrivedi]'s benchmark finding:
> 2) Materialize values from Validity and Value direct buffers instead of
> calling getInt() function on the IntVector. This is implemented as a new
> Unsafe reader type (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L31]
> )
> 3) Optimize bitmap operation to check if a bit is set or not (
> [https://github.com/animeshtrivedi/benchmarking-arrow/blob/master/src/main/java/com/github/animeshtrivedi/benchmark/ArrowReaderUnsafe.java#L23]
> )





[jira] [Commented] (ARROW-3475) [C++] Int64Builder.Finish(NumericArray)

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762074#comment-16762074
 ] 

Wes McKinney commented on ARROW-3475:
-

Agreed. We have {{TypeTraits<T>::ArrayType}} so the implementation can delegate 
and {{std::static_pointer_cast}} (or similar)

> [C++] Int64Builder.Finish(NumericArray)
> --
>
> Key: ARROW-3475
> URL: https://issues.apache.org/jira/browse/ARROW-3475
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wolf Vollprecht
>Priority: Minor
> Fix For: 0.13.0
>
>
> I was intuitively thinking that the following code would work:
> {{Status s;}}
> {{Int64Builder builder;}}
> {{s = builder.Append(1);}}
> {{s = builder.Append(2);}}
> {{std::shared_ptr<NumericArray<Int64Type>> array;}}
> {{builder.Finish(&array);}}
> However, it does not seem to work, as the finish operation is not overloaded 
> in the Int64 (or the numeric builder).
> Would it make sense to add this interface?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3475) [C++] Int64Builder.Finish(NumericArray)

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3475:

Fix Version/s: 0.13.0

> [C++] Int64Builder.Finish(NumericArray)
> --
>
> Key: ARROW-3475
> URL: https://issues.apache.org/jira/browse/ARROW-3475
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wolf Vollprecht
>Priority: Minor
> Fix For: 0.13.0
>
>
> I was intuitively thinking that the following code would work:
> {{Status s;}}
> {{Int64Builder builder;}}
> {{s = builder.Append(1);}}
> {{s = builder.Append(2);}}
> {{std::shared_ptr<NumericArray<Int64Type>> array;}}
> {{builder.Finish(&array);}}
> However, it does not seem to work, as the finish operation is not overloaded 
> in the Int64 (or the numeric builder).
> Would it make sense to add this interface?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3486) [CI] Add gandiva to the docker-compose setup

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762075#comment-16762075
 ] 

Wes McKinney commented on ARROW-3486:
-

[~kszucs] it would be good to do this before 0.13

> [CI] Add gandiva to the docker-compose setup
> 
>
> Key: ARROW-3486
> URL: https://issues.apache.org/jira/browse/ARROW-3486
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: docker, gandiva
> Fix For: 0.13.0
>
>
> Similarly like the cpp build 
> https://github.com/apache/arrow/blob/master/docker-compose.yml#L33



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3486) [CI] Add gandiva to the docker-compose setup

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3486:

Fix Version/s: 0.13.0

> [CI] Add gandiva to the docker-compose setup
> 
>
> Key: ARROW-3486
> URL: https://issues.apache.org/jira/browse/ARROW-3486
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: docker, gandiva
> Fix For: 0.13.0
>
>
> Similarly like the cpp build 
> https://github.com/apache/arrow/blob/master/docker-compose.yml#L33



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3471) [C++][Gandiva] Investigate caching isomorphic expressions

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3471:

Summary: [C++][Gandiva] Investigate caching isomorphic expressions  (was: 
[Gandiva] Investigate caching isomorphic expressions)

> [C++][Gandiva] Investigate caching isomorphic expressions
> -
>
> Key: ARROW-3471
> URL: https://issues.apache.org/jira/browse/ARROW-3471
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Praveen Kumar Desabandu
>Priority: Major
>  Labels: gandiva
> Fix For: 0.13.0
>
>
> Two expressions say add(a+b) and add(c+d), could potentially be reused if the 
> only thing differing are the names.
> Test E2E.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3459) [C++][Gandiva] Add support for variable length output vectors

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3459:

Summary: [C++][Gandiva] Add support for variable length output vectors  
(was: Add support for variable length output vectors)

> [C++][Gandiva] Add support for variable length output vectors
> -
>
> Key: ARROW-3459
> URL: https://issues.apache.org/jira/browse/ARROW-3459
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Gandiva
>Reporter: Pindikura Ravindra
>Assignee: Pindikura Ravindra
>Priority: Major
>
> Gandiva can currently handle variable length input vectors but requires the 
> output vectors to be fixed-length. This is because we do not have a handle to 
> allocate or resize arrow vectors from inside the LLVM code. Due to this 
> limitation, we are not able to support a lot of utf8 related functions 
> (convert-numeric-to-string, toupper, strstr, replace, ..).
>  
> This needs to be fixed for both C++ and Java.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4337) [C#] Array / RecordBatch Builder Fluent API

2019-02-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4337:
--
Labels: c# pull-request-available  (was: c#)

> [C#] Array / RecordBatch Builder Fluent API
> ---
>
> Key: ARROW-4337
> URL: https://issues.apache.org/jira/browse/ARROW-4337
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#
>Reporter: Chris Hutchinson
>Assignee: Chris Hutchinson
>Priority: Major
>  Labels: c#, pull-request-available
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Implement a fluent API for building arrays and record batches from Arrow 
> buffers, flat arrays, spans, enumerables, etc.
> A future implementation could extend this API with support for ADO.NET 
> DataTables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1425) [Python] Document semantic differences between Spark timestamps and Arrow timestamps

2019-02-06 Thread Li Jin (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762024#comment-16762024
 ] 

Li Jin commented on ARROW-1425:
---

[~emkornfi...@gmail.com] Feel free to finish it up.

> [Python] Document semantic differences between Spark timestamps and Arrow 
> timestamps
> 
>
> Key: ARROW-1425
> URL: https://issues.apache.org/jira/browse/ARROW-1425
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The way that Spark treats non-timezone-aware timestamps as session local can 
> be problematic when using pyarrow which may view the data coming from 
> toPandas() as time zone naive (but with fields as though it were UTC, not 
> session local). We should document carefully how to properly handle the data 
> coming from Spark to avoid problems.
> cc [~bryanc] [~holdenkarau]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4494) [Java] arrow-jdbc JAR is not uploaded on release

2019-02-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4494:
--

 Summary: [Java] arrow-jdbc JAR is not uploaded on release
 Key: ARROW-4494
 URL: https://issues.apache.org/jira/browse/ARROW-4494
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java, Packaging
Affects Versions: 0.12.0
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.13.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-4494) [Java] arrow-jdbc JAR is not uploaded on release

2019-02-06 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn closed ARROW-4494.
--
Resolution: Invalid

My maven skills are just a bit rusty.

> [Java] arrow-jdbc JAR is not uploaded on release
> 
>
> Key: ARROW-4494
> URL: https://issues.apache.org/jira/browse/ARROW-4494
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java, Packaging
>Affects Versions: 0.12.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.13.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3454) [Python] Tab complete doesn't work for plasma client.

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3454:

Summary: [Python] Tab complete doesn't work for plasma client.  (was: Tab 
complete doesn't work for plasma client.)

> [Python] Tab complete doesn't work for plasma client.
> -
>
> Key: ARROW-3454
> URL: https://issues.apache.org/jira/browse/ARROW-3454
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Robert Nishihara
>Priority: Minor
>
> In IPython, tab complete on a plasma client object should reveal the client's 
> methods. I think this is the same thing as making sure {{dir(client)}} 
> returns all of the relevant methods/fields.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4493) [Rust] make accumulate_scalar somewhat exhaustive and easier to read

2019-02-06 Thread Tupshin Harper (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tupshin Harper updated ARROW-4493:
--
Summary: [Rust] make accumulate_scalar somewhat exhaustive and easier to 
read  (was: [Rust] Clean up match)

> [Rust] make accumulate_scalar somewhat exhaustive and easier to read
> 
>
> Key: ARROW-4493
> URL: https://issues.apache.org/jira/browse/ARROW-4493
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust - DataFusion
>Reporter: Tupshin Harper
>Priority: Trivial
>
> make accumulate_scalar somewhat exhaustive and easier to read
>  
> The current implementation doesn't leverage any of the exhaustiveness 
> checking of matching. This can be made simpler and partially exhaustive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4493) [Rust] Clean up match

2019-02-06 Thread Tupshin Harper (JIRA)
Tupshin Harper created ARROW-4493:
-

 Summary: [Rust] Clean up match
 Key: ARROW-4493
 URL: https://issues.apache.org/jira/browse/ARROW-4493
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Tupshin Harper


make accumulate_scalar somewhat exhaustive and easier to read

 

The current implementation doesn't leverage any of the exhaustiveness checking 
of matching. This can be made simpler and partially exhaustive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3966) [Java] JDBC-to-Arrow Conversion: JDBC Metadata in Schema Fields

2019-02-06 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-3966.

   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3134
[https://github.com/apache/arrow/pull/3134]

> [Java] JDBC-to-Arrow Conversion: JDBC Metadata in Schema Fields
> ---
>
> Key: ARROW-3966
> URL: https://issues.apache.org/jira/browse/ARROW-3966
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> This feature extends ARROW-3965 and adds a new option to include JDBC 
> metadata in the Arrow Schema Field Metadata.  To start, the following items 
> are added:
>  * Catalog Name
>  * Table Name
>  * Column Name
>  * Column Type Name
> These are configured by setting the "Include Metadata" boolean flag on the 
> configuration object.  If the value is set to _true_, the above fields will 
> be added.  Otherwise, they will not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3966) [Java] JDBC-to-Arrow Conversion: JDBC Metadata in Schema Fields

2019-02-06 Thread Uwe L. Korn (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn reassigned ARROW-3966:
--

Assignee: Michael Pigott

> [Java] JDBC-to-Arrow Conversion: JDBC Metadata in Schema Fields
> ---
>
> Key: ARROW-3966
> URL: https://issues.apache.org/jira/browse/ARROW-3966
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Michael Pigott
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> This feature extends ARROW-3965 and adds a new option to include JDBC 
> metadata in the Arrow Schema Field Metadata.  To start, the following items 
> are added:
>  * Catalog Name
>  * Table Name
>  * Column Name
>  * Column Type Name
> These are configured by setting the "Include Metadata" boolean flag on the 
> configuration object.  If the value is set to _true_, the above fields will 
> be added.  Otherwise, they will not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3434) [Packaging] Add Apache ORC C++ library to conda-forge

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761908#comment-16761908
 ] 

Wes McKinney commented on ARROW-3434:
-

[~jim.crist] would anyone on your end be able to help with this? 

> [Packaging] Add Apache ORC C++ library to conda-forge
> -
>
> Key: ARROW-3434
> URL: https://issues.apache.org/jira/browse/ARROW-3434
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: toolchain
> Fix For: 0.14.0
>
>
> In the vein of "toolchain all the things", it would be useful to be able to 
> obtain the ORC static libraries from a conda package rather than building 
> from source every time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3434) [Packaging] Add Apache ORC C++ library to conda-forge

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3434:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Packaging] Add Apache ORC C++ library to conda-forge
> -
>
> Key: ARROW-3434
> URL: https://issues.apache.org/jira/browse/ARROW-3434
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: toolchain
> Fix For: 0.14.0
>
>
> In the vein of "toolchain all the things", it would be useful to be able to 
> obtain the ORC static libraries from a conda package rather than building 
> from source every time



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3448) [Python] Pandas roundtrip doesn't preserve list of datetime objects

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3448:

Fix Version/s: 0.14.0

> [Python] Pandas roundtrip doesn't preserve list of datetime objects
> ---
>
> Key: ARROW-3448
> URL: https://issues.apache.org/jira/browse/ARROW-3448
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.14.0
>
>
> Adding the following to the pandas_example.py::dataframe_with_lists functionn:
> {code:python}
> datetime_data = [
>  [datetime(2015, 1, 5, 12, 0, 0), datetime(2020, 8, 22, 10, 5, 0)],
>  [datetime(2024, 5, 5, 5, 49, 1), datetime(2015, 12, 24, 22, 10, 17)],
>  [datetime(1996, 4, 30, 2, 38, 11)],
>  None,
>  [datetime(1987, 1, 27, 8, 21, 59)]
> ]
> type = pa.timestamp('s'|'ms'|'us'|'ns')
> {code}
> breaks the tests cases, because the roundtrip doesn't preserve the object 
> type.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-4491) [Python] Remove usage of std::to_string and std::stoi

2019-02-06 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761895#comment-16761895
 ] 

Antoine Pitrou commented on ARROW-4491:
---

Yes, this is a pitfall of C++ streams as well. You have to cast to/from a 
larger integer type.

> [Python] Remove usage of std::to_string and std::stoi
> -
>
> Key: ARROW-4491
> URL: https://issues.apache.org/jira/browse/ARROW-4491
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Not sure why this is happening, but for some older compilers I'm seeing
> {code:java}
> terminate called after throwing an instance of 'std::invalid_argument'
>   what():  stoi{code}
> since https://github.com/apache/arrow/pull/3423.
> Possible cause is that there is no int8_t version of 
> https://en.cppreference.com/w/cpp/string/basic_string/to_string, so it might 
> not convert it to a proper string representation of the number.
> Any insight on why this could be happening is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3433) [C++] Validate re2 with Windows toolchain, EP

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3433.
-
   Resolution: Fixed
Fix Version/s: (was: 0.13.0)
   0.12.0

Confirmed locally that {{re2_ep}} works fine with MSVC. 

> [C++] Validate re2 with Windows toolchain, EP
> -
>
> Key: ARROW-3433
> URL: https://issues.apache.org/jira/browse/ARROW-3433
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.12.0
>
>
> Follow up to ARROW-3331



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3424) [Python] Improved workflow for loading an arbitrary collection of Parquet files

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3424:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Improved workflow for loading an arbitrary collection of Parquet 
> files
> ---
>
> Key: ARROW-3424
> URL: https://issues.apache.org/jira/browse/ARROW-3424
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.0
>
>
> See SO question for use case: 
> https://stackoverflow.com/questions/52613682/load-multiple-parquet-files-into-dataframe-for-analysis



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4484) [Java] improve Flight DoPut busy wait

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4484:

Fix Version/s: 0.14.0

> [Java] improve Flight DoPut busy wait
> -
>
> Key: ARROW-4484
> URL: https://issues.apache.org/jira/browse/ARROW-4484
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC, Java
>Reporter: David Li
>Priority: Major
>  Labels: flight
> Fix For: 0.14.0
>
>
> Currently the implementation of putNext in FlightClient.java busy-waits until 
> gRPC indicates that the server can receive a message. We should either 
> improve the busy-wait (e.g. add sleep times), or rethink the API and make it 
> non-blocking.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3426) [CI] Java integration test very verbose

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3426:

Fix Version/s: 0.13.0

> [CI] Java integration test very verbose
> ---
>
> Key: ARROW-3426
> URL: https://issues.apache.org/jira/browse/ARROW-3426
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Priority: Minor
> Fix For: 0.13.0
>
>
> The Java integration tests are very verbose. An example here:
> https://travis-ci.org/apache/arrow/jobs/436731293
> It would be nice to cut down on the logging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-4419) [Flight] Deal with body buffers in FlightData

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4419:

Fix Version/s: 0.14.0

> [Flight] Deal with body buffers in FlightData
> -
>
> Key: ARROW-4419
> URL: https://issues.apache.org/jira/browse/ARROW-4419
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: FlightRPC
>Reporter: David Li
>Priority: Minor
>  Labels: flight
> Fix For: 0.14.0
>
>
> The Java implementation will fail to decode a schema message if the message 
> also contains (empty) body buffers (see ArrowMessage.asSchema's precondition 
> checks). However, clients using default Protobuf serialization will likely 
> write an empty body buffer by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3426) [CI] Java integration test very verbose

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761879#comment-16761879
 ] 

Wes McKinney commented on ARROW-3426:
-

I think this might be fixed now, but let's confirm and maybe [~bryanc] can take 
a look if it's still too verbose

> [CI] Java integration test very verbose
> ---
>
> Key: ARROW-3426
> URL: https://issues.apache.org/jira/browse/ARROW-3426
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Java
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Priority: Minor
> Fix For: 0.13.0
>
>
> The Java integration tests are very verbose. An example here:
> https://travis-ci.org/apache/arrow/jobs/436731293
> It would be nice to cut down on the logging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-3594) [Packaging] Build "cares" library in conda-forge

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3594.
---
Resolution: Not A Problem

This is being maintained in conda-forge now

> [Packaging] Build "cares" library in conda-forge
> 
>
> Key: ARROW-3594
> URL: https://issues.apache.org/jira/browse/ARROW-3594
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Dependency of grpc 
> https://github.com/grpc/grpc/blob/master/cmake/cares.cmake



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3422) [C++] Add "toolchain" target to ensure that all required toolchain libraries are built

2019-02-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3422:
--
Labels: pull-request-available  (was: )

> [C++] Add "toolchain" target to ensure that all required toolchain libraries 
> are built
> --
>
> Key: ARROW-3422
> URL: https://issues.apache.org/jira/browse/ARROW-3422
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> There are cases, such as ARROW-3419, where we need all the external projects 
> to get built



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3769) [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761855#comment-16761855
 ] 

Wes McKinney commented on ARROW-3769:
-

yes, I think that's the basic idea

> [C++] Support reading non-dictionary encoded binary Parquet columns directly 
> as DictionaryArray
> ---
>
> Key: ARROW-3769
> URL: https://issues.apache.org/jira/browse/ARROW-3769
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Hatem Helal
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> If the goal is to hash this data anyway into a categorical-type array, then 
> it would be better to offer the option to "push down" the hashing into the 
> Parquet read hot path rather than first fully materializing a dense vector of 
> {{ByteArray}} values, which could use a lot of memory after decompression



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3419) [C++] Run include-what-you-use checks as nightly build

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761839#comment-16761839
 ] 

Wes McKinney commented on ARROW-3419:
-

I changed the title to make this a nightly build

> [C++] Run include-what-you-use checks as nightly build
> --
>
> Key: ARROW-3419
> URL: https://issues.apache.org/jira/browse/ARROW-3419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> As part of linting (and running linter checks in a separate Travis entry), we 
> should also run include-what-you-use on changed files so that we can force 
> include cleanliness



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-3422) [C++] Add "toolchain" target to ensure that all required toolchain libraries are built

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3422:
---

Assignee: Wes McKinney

> [C++] Add "toolchain" target to ensure that all required toolchain libraries 
> are built
> --
>
> Key: ARROW-3422
> URL: https://issues.apache.org/jira/browse/ARROW-3422
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> There are cases, such as ARROW-3419, where we need all the external projects 
> to get built



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3419) [C++] Run include-what-you-use checks as nightly build

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3419:

Summary: [C++] Run include-what-you-use checks as nightly build  (was: 
[C++] Run include-what-you-use checks in Travis CI)

> [C++] Run include-what-you-use checks as nightly build
> --
>
> Key: ARROW-3419
> URL: https://issues.apache.org/jira/browse/ARROW-3419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> As part of linting (and running linter checks in a separate Travis entry), we 
> should also run include-what-you-use on changed files so that we can force 
> include cleanliness



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-3422) [C++] Add "toolchain" target to ensure that all required toolchain libraries are built

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761842#comment-16761842
 ] 

Wes McKinney commented on ARROW-3422:
-

This is almost there. I'll review the EPs and make sure they are all attached 
to the "toolchain" target (which is already added)

> [C++] Add "toolchain" target to ensure that all required toolchain libraries 
> are built
> --
>
> Key: ARROW-3422
> URL: https://issues.apache.org/jira/browse/ARROW-3422
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> There are cases, such as ARROW-3419, where we need all the external projects 
> to get built



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3419) [C++] Run include-what-you-use checks as nightly build

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3419:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Run include-what-you-use checks as nightly build
> --
>
> Key: ARROW-3419
> URL: https://issues.apache.org/jira/browse/ARROW-3419
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> As part of linting (and running linter checks in a separate Travis entry), we 
> should also run include-what-you-use on changed files so that we can force 
> include cleanliness



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3401) [C++] Pluggable statistics collector API for unconvertible CSV values

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3401:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Pluggable statistics collector API for unconvertible CSV values
> -
>
> Key: ARROW-3401
> URL: https://issues.apache.org/jira/browse/ARROW-3401
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> It would be useful to be able to collect statistics (e.g. distinct value 
> counts) about values in a column of a CSV file that cannot be converted to a 
> desired data type. 
> When conversion fails, the converters can call into an abstract API like
> {code}
> statistics_->CannotConvert(token, size);
> {code}
> or something similar
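The abstract collector API described above could be fleshed out roughly as follows. This is a minimal illustrative sketch, not Arrow's actual design; the `ConversionStatistics` and `DistinctValueCounter` names are assumptions:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical pluggable collector interface that converters call into
// when a CSV token cannot be converted to the desired type.
class ConversionStatistics {
 public:
  virtual ~ConversionStatistics() = default;
  virtual void CannotConvert(const char* token, int64_t size) = 0;
};

// Example implementation: count distinct unconvertible values.
class DistinctValueCounter : public ConversionStatistics {
 public:
  void CannotConvert(const char* token, int64_t size) override {
    counts_[std::string(token, static_cast<size_t>(size))]++;
  }
  int64_t num_distinct() const { return static_cast<int64_t>(counts_.size()); }

 private:
  std::unordered_map<std::string, int64_t> counts_;
};
```

Keeping the interface to a single virtual call keeps the conversion hot path cheap when no collector is installed.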





[jira] [Commented] (ARROW-3265) [C++] Restore CPACK support for Parquet libraries

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761820#comment-16761820
 ] 

Wes McKinney commented on ARROW-3265:
-

[~kou] I broke this during the parquet-cpp migration. Is it important?

> [C++] Restore CPACK support for Parquet libraries
> -
>
> Key: ARROW-3265
> URL: https://issues.apache.org/jira/browse/ARROW-3265
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: parquet
>
> See https://github.com/apache/parquet-cpp/blob/master/CMakeLists.txt#L32





[jira] [Updated] (ARROW-3399) [Python] Cannot serialize numpy matrix object

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3399:

Summary: [Python] Cannot serialize numpy matrix object  (was: Cannot 
serialize numpy matrix object)

> [Python] Cannot serialize numpy matrix object
> -
>
> Key: ARROW-3399
> URL: https://issues.apache.org/jira/browse/ARROW-3399
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Mitar
>Priority: Major
>
> This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on 
> Linux.
> {code:java}
> from pyarrow import plasma
> import numpy
> import time
> import subprocess
> import os
> import signal
> m = numpy.matrix(numpy.array([[1, 2], [3, 4]]))
> process = subprocess.Popen(['plasma_store', '-m', '100', '-s', 
> '/tmp/plasma', '-d', '/dev/shm'], stdout=subprocess.DEVNULL, 
> stderr=subprocess.DEVNULL, encoding='utf8', preexec_fn=os.setpgrp)
> time.sleep(5)
> client = plasma.connect('/tmp/plasma', '', 0)
> try:
> client.put(m)
> finally:
> client.disconnect()
> os.killpg(os.getpgid(process.pid), signal.SIGTERM)
> {code}
> Error:
> {noformat}
>   File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put
>   File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize
>   File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum 
> recursion depth. It may contain itself recursively.{noformat}





[jira] [Updated] (ARROW-3379) [C++] Implement regex/multichar delimiter tokenizer

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3379:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Implement regex/multichar delimiter tokenizer
> ---
>
> Key: ARROW-3379
> URL: https://issues.apache.org/jira/browse/ARROW-3379
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: csv
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-3121) [C++] Incremental Mean aggregator

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3121:

Fix Version/s: 0.13.0

> [C++] Incremental Mean aggregator
> -
>
> Key: ARROW-3121
> URL: https://issues.apache.org/jira/browse/ARROW-3121
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: analytics
> Fix For: 0.13.0
>
>
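The issue body is empty, but the technique named in the title is standard: a mean can be maintained incrementally in one pass using the recurrence mean += (x - mean) / n. This sketch shows only the recurrence, not Arrow's aggregator API:

```cpp
#include <cstdint>

// One-pass running mean; the accumulator stays near the magnitude of
// the data, which is numerically safer than summing then dividing.
class MeanAggregator {
 public:
  void Update(double value) {
    ++count_;
    mean_ += (value - mean_) / static_cast<double>(count_);
  }
  double mean() const { return mean_; }
  int64_t count() const { return count_; }

 private:
  int64_t count_ = 0;
  double mean_ = 0.0;
};
```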






[jira] [Updated] (ARROW-3406) [C++] Create a caching memory pool implementation

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3406:

Fix Version/s: 0.14.0

> [C++] Create a caching memory pool implementation
> -
>
> Key: ARROW-3406
> URL: https://issues.apache.org/jira/browse/ARROW-3406
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.11.0
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> A caching memory pool implementation would be able to recycle freed memory 
> blocks instead of returning them to the system immediately. Two different 
> policies may be chosen:
> * either an unbounded cache
> * or a size-limited cache, perhaps with some kind of LRU mechanism
> Such a feature might help e.g. for CSV parsing, when reading and parsing data 
> into temporary memory buffers.
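A size-limited variant of the policy described above could wrap another allocator and keep freed blocks keyed by size. This is a simplified sketch under stated assumptions: it uses malloc/free rather than Arrow's MemoryPool interface, matches block sizes exactly, and omits thread safety and LRU eviction:

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Recycles freed blocks instead of returning them to the system immediately.
class CachingAllocator {
 public:
  explicit CachingAllocator(size_t cache_limit) : cache_limit_(cache_limit) {}
  ~CachingAllocator() {
    for (auto& kv : cache_)
      for (void* p : kv.second) std::free(p);
  }
  void* Allocate(size_t size) {
    auto it = cache_.find(size);
    if (it != cache_.end() && !it->second.empty()) {
      void* p = it->second.back();  // cache hit: reuse a freed block
      it->second.pop_back();
      cached_bytes_ -= size;
      return p;
    }
    return std::malloc(size);
  }
  void Free(void* ptr, size_t size) {
    if (cached_bytes_ + size <= cache_limit_) {
      cache_[size].push_back(ptr);  // keep the block for later reuse
      cached_bytes_ += size;
    } else {
      std::free(ptr);  // cache full: return to the system
    }
  }
  size_t cached_bytes() const { return cached_bytes_; }

 private:
  size_t cache_limit_;
  size_t cached_bytes_ = 0;
  std::unordered_map<size_t, std::vector<void*>> cache_;
};
```

Exact-size matching suits workloads like CSV parsing that repeatedly allocate temporary buffers of the same few sizes.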





[jira] [Updated] (ARROW-3378) [C++] Implement whitespace CSV tokenizer

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3378:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Implement whitespace CSV tokenizer
> 
>
> Key: ARROW-3378
> URL: https://issues.apache.org/jira/browse/ARROW-3378
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-3372) [C++] Introduce SlicedBuffer class

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3372:

Fix Version/s: (was: 0.13.0)

> [C++] Introduce SlicedBuffer class
> --
>
> Key: ARROW-3372
> URL: https://issues.apache.org/jira/browse/ARROW-3372
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>
> The purpose of this class will be to forward certain function calls to the 
> parent buffer, like a request for the device (CPU, GPU, etc.).
> As a result of this, we can remove the {{parent_}} member from {{Buffer}} as 
> that member is only there to support slices. 





[jira] [Updated] (ARROW-3364) [Doc] Document docker compose setup

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3364:

Fix Version/s: 0.13.0

> [Doc] Document docker compose setup
> ---
>
> Key: ARROW-3364
> URL: https://issues.apache.org/jira/browse/ARROW-3364
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.13.0
>
>
> Introduced by https://github.com/apache/arrow/pull/2572





[jira] [Commented] (ARROW-3361) [R] Run cpp/build-support/cpplint.py on C++ source files

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761834#comment-16761834
 ] 

Wes McKinney commented on ARROW-3361:
-

Hm, we are running clang-format. Let me see about running cpplint also

> [R] Run cpp/build-support/cpplint.py on C++ source files
> 
>
> Key: ARROW-3361
> URL: https://issues.apache.org/jira/browse/ARROW-3361
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This will help with additional code cleanliness





[jira] [Closed] (ARROW-3326) [Python] Expose stream alignment function in pyarrow.NativeFile

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3326.
---
Resolution: Won't Fix

> [Python] Expose stream alignment function in pyarrow.NativeFile
> ---
>
> Key: ARROW-3326
> URL: https://issues.apache.org/jira/browse/ARROW-3326
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See also ARROW-3319





[jira] [Updated] (ARROW-3162) [Python] Enable Flight servers to be implemented in pure Python

2019-02-06 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3162:
--
Labels: flight pull-request-available  (was: flight)

> [Python] Enable Flight servers to be implemented in pure Python
> ---
>
> Key: ARROW-3162
> URL: https://issues.apache.org/jira/browse/ARROW-3162
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
>  Labels: flight, pull-request-available
> Fix For: 0.13.0
>
>
> While it will be straightforward to offer a Flight client to Python users, 
> enabling _servers_ to be written _in Python_ will require a glue class to 
> invoke methods on a provided server implementation, coercing to and from 
> various Python objects and Arrow wrapper classes





[jira] [Updated] (ARROW-3345) [C++] Specify expected behavior of table concatenation, creating table from multiple record batches, if schema metadata is unequal

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3345:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Specify expected behavior of table concatenation, creating table from 
> multiple record batches, if schema metadata is unequal
> --
>
> Key: ARROW-3345
> URL: https://issues.apache.org/jira/browse/ARROW-3345
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Commented] (ARROW-3308) [R] Convert R character vector with data exceeding 2GB to chunked array

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761831#comment-16761831
 ] 

Wes McKinney commented on ARROW-3308:
-

[~romainfrancois] do you want to look at this? Fixing it amounts to using 
{{arrow::internal::ChunkedBinaryBuilder}} instead of {{arrow::BinaryBuilder}}

> [R] Convert R character vector with data exceeding 2GB to chunked array
> ---
>
> Key: ARROW-3308
> URL: https://issues.apache.org/jira/browse/ARROW-3308
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>






[jira] [Commented] (ARROW-3302) [JS] Reduce number of dependencies

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761830#comment-16761830
 ] 

Wes McKinney commented on ARROW-3302:
-

Anything that can be done here in the short term?

> [JS] Reduce number of dependencies
> --
>
> Key: ARROW-3302
> URL: https://issues.apache.org/jira/browse/ARROW-3302
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Brian Hulette
>Priority: Major
>  Labels: refactoring
>
> I installed the dependencies for arrow js today and it took 10 minutes (5 
> minutes for the second run) with npm and less than one minute with yarn 
> (I think there is an argument for using yarn instead of npm). Arrow js has 
> 2792 dependencies right now. Installation takes a very long time, but there is 
> also some danger that the library will be hard to maintain in the future with 
> this many dependencies. 





[jira] [Commented] (ARROW-3078) [Python] Docker integration tests should not contaminate the local Python development environment

2019-02-06 Thread Krisztian Szucs (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761832#comment-16761832
 ] 

Krisztian Szucs commented on ARROW-3078:


[~wesmckinn] The Python directory is not contaminated (in the case of Ubuntu), 
but others are; see 
https://github.com/apache/arrow/blob/master/docker-compose.yml#L24

> [Python] Docker integration tests should not contaminate the local Python 
> development environment
> -
>
> Key: ARROW-3078
> URL: https://issues.apache.org/jira/browse/ARROW-3078
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I hit the following error when running run_docker_compose.sh hdfs_integration
> {code}
> ___ ERROR collecting pyarrow/tests/test_builder.py 
> 
> import file mismatch:
> imported module 'pyarrow.tests.test_builder' has this __file__ attribute:
>   /home/wesm/code/arrow/python/pyarrow/tests/test_builder.py
> which is not the same as the test file we want to collect:
>   /apache-arrow/arrow/python/pyarrow/tests/test_builder.py
> HINT: remove __pycache__ / .pyc files and/or use a unique basename for your 
> test file modules
> ___ ERROR collecting pyarrow/tests/test_convert_builtin.py 
> 
> import file mismatch:
> imported module 'pyarrow.tests.test_convert_builtin' has this __file__ 
> attribute:
>   /home/wesm/code/arrow/python/pyarrow/tests/test_convert_builtin.py
> which is not the same as the test file we want to collect:
>   /apache-arrow/arrow/python/pyarrow/tests/test_convert_builtin.py
> HINT: remove __pycache__ / .pyc files and/or use a unique basename for your 
> test file modules
>  ERROR collecting pyarrow/tests/test_convert_pandas.py 
> 
> {code}
> The Docker tests should ideally be isolated from the state of the local git 
> clone





[jira] [Assigned] (ARROW-3297) [Python] Python bindings for Flight C++ client

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3297:
---

Assignee: David Li

> [Python] Python bindings for Flight C++ client
> --
>
> Key: ARROW-3297
> URL: https://issues.apache.org/jira/browse/ARROW-3297
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
> Fix For: 0.13.0
>
>
> Develop {{pyarrow.flight.Client}} that can connect to Flight services and 
> issue RPCs, retrieve datasets





[jira] [Updated] (ARROW-3298) [C++] Move murmur3 hash implementation to arrow/util

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3298:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [C++] Move murmur3 hash implementation to arrow/util
> 
>
> Key: ARROW-3298
> URL: https://issues.apache.org/jira/browse/ARROW-3298
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> It would be good to consolidate hashing utility code in a central place (this 
> is currently in src/parquet)





[jira] [Closed] (ARROW-3295) [Packaging] Package gRPC libraries in conda-forge for use in builds, packaging

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3295.
---
Resolution: Duplicate

duplicate of ARROW-3596

> [Packaging] Package gRPC libraries in conda-forge for use in builds, packaging
> --
>
> Key: ARROW-3295
> URL: https://issues.apache.org/jira/browse/ARROW-3295
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This includes Linux, macOS, and Windows packages, along with gRPC's 
> dependencies (some of which, like BoringSSL, are not in conda-forge yet). 
> This may require patching gRPC's build system (or copying files manually) 
> since it wants to install all its dependencies when you {{make install}}





[jira] [Commented] (ARROW-3203) [C++] Build error on Debian Buster

2019-02-06 Thread albertoramon (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761829#comment-16761829
 ] 

albertoramon commented on ARROW-3203:
-

Still? I'm not sure. I'm using Debian Stretch as the base in my Docker images 
for now, and it works fine.


I can try Debian Buster for testing purposes only; give me a few days and I 
will report back here.

> [C++] Build error on Debian Buster
> --
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: albertoramon
>Priority: Major
> Attachments: DockerfileRV, flatbuffers_ep-build-err.log
>
>
> There is a error with Debian Buster (In Debian Stretch works fine)
> You can test it easily change the first line from dockerfile (attached)
>  
> *To reproduce it:*
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}
>  





[jira] [Assigned] (ARROW-3596) [Packaging] Build gRPC in conda-forge

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3596:
---

Assignee: Antoine Pitrou

> [Packaging] Build gRPC in conda-forge
> -
>
> Key: ARROW-3596
> URL: https://issues.apache.org/jira/browse/ARROW-3596
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.13.0
>
>
> This may be a bit annoying as grpc's CMake files want to also install its 
> third party dependencies. [~msarahan] pointed me to a version for 
> AnacondaRecipes where there was an effort to "unvendor" these components
> grpc depends on at least
> * protobuf
> * zlib
> * either boringssl or openssl
> * cares
> * gflags
> It looks like the grpc developers want to take on Abseil as a dependency 
> eventually, but it is not required yet





[jira] [Resolved] (ARROW-3290) [C++] Toolchain support for secure gRPC

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3290.
-
Resolution: Fixed
  Assignee: David Li

This is done -- we aren't using the unsecure grpc libraries anymore

> [C++] Toolchain support for secure gRPC 
> 
>
> Key: ARROW-3290
> URL: https://issues.apache.org/jira/browse/ARROW-3290
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: David Li
>Priority: Major
> Fix For: 0.13.0
>
>
> In ARROW-3146 I added support for the narrow use case of CMake-installed gRPC 
> and linking with the unsecure libraries. There are a number of additional 
> dependencies to be able to connect to secure services





[jira] [Commented] (ARROW-3200) [C++] Add support for reading Flight streams with dictionaries

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761811#comment-16761811
 ] 

Wes McKinney commented on ARROW-3200:
-

cc [~pitrou], for thoughts. My opinion is that we should make 
{{arrow::DictionaryType}} into a virtual interface and define a 
{{MutableDictionary}} subclass to permit the Flight client or server to add the 
dictionary when the message comes across the wire. This will require some 
refactoring in the IPC code to permit schemas to be decoded while leaving the 
dictionaries temporarily null
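The idea of decoding a schema while leaving its dictionaries temporarily null could look roughly like the following. The class and method names here are hypothetical illustrations of the proposal, not Arrow's actual DictionaryType API:

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for a decoded dictionary's values.
struct ValueArray {
  std::vector<std::string> values;
};

// Hypothetical mutable dictionary type: the schema can be decoded first,
// and the dictionary values attached once they arrive over the wire.
class MutableDictionaryType {
 public:
  bool has_dictionary() const { return dictionary_ != nullptr; }
  void SetDictionary(std::shared_ptr<ValueArray> dict) {
    dictionary_ = std::move(dict);
  }
  const ValueArray& dictionary() const { return *dictionary_; }

 private:
  std::shared_ptr<ValueArray> dictionary_;  // null until the dictionary batch arrives
};
```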

> [C++] Add support for reading Flight streams with dictionaries
> --
>
> Key: ARROW-3200
> URL: https://issues.apache.org/jira/browse/ARROW-3200
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 0.13.0
>
>
> Some work is needed to handle schemas sent separately from their 
> dictionaries, i.e. ARROW-3144. I'm going to punt on implementing support for 
> this in the initial C++ Flight client





[jira] [Closed] (ARROW-3595) [Packaging] Build boringssl in conda-forge

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3595.
---
Resolution: Won't Fix

We are using OpenSSL

> [Packaging] Build boringssl in conda-forge
> --
>
> Key: ARROW-3595
> URL: https://issues.apache.org/jira/browse/ARROW-3595
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> I'm not sure whether boringssl or OpenSSL is preferred for use with gRPC. It 
> would be useful to have boringssl available so that we have the option to use either
> https://github.com/grpc/grpc/blob/master/cmake/ssl.cmake





[jira] [Commented] (ARROW-3769) [C++] Support reading non-dictionary encoded binary Parquet columns directly as DictionaryArray

2019-02-06 Thread Hatem Helal (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761825#comment-16761825
 ] 

Hatem Helal commented on ARROW-3769:


Made a start on the unittests here:

[https://github.com/mathworks/arrow/pull/12]

[~wesmckinn], could you take a look and let me know if this is heading in the 
right direction?

> [C++] Support reading non-dictionary encoded binary Parquet columns directly 
> as DictionaryArray
> ---
>
> Key: ARROW-3769
> URL: https://issues.apache.org/jira/browse/ARROW-3769
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Hatem Helal
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> If the goal is to hash this data anyway into a categorical-type array, then 
> it would be better to offer the option to "push down" the hashing into the 
> Parquet read hot path rather than first fully materializing a dense vector of 
> {{ByteArray}} values, which could use a lot of memory after decompression
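The "push down" described above can be illustrated generically: instead of materializing every decoded value, hash each one into a growing dictionary as it is read and store only small integer indices. This is a sketch of the technique, not Arrow's or Parquet's reader API:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Builds dictionary indices while values stream in, so the dense vector
// of decoded values is never fully materialized.
class DictionaryEncoder {
 public:
  int32_t Append(const std::string& value) {
    auto it = memo_.find(value);
    if (it != memo_.end()) {
      indices_.push_back(it->second);  // seen before: emit existing code
      return it->second;
    }
    int32_t code = static_cast<int32_t>(dictionary_.size());
    memo_.emplace(value, code);
    dictionary_.push_back(value);  // new value: extend the dictionary
    indices_.push_back(code);
    return code;
  }
  const std::vector<std::string>& dictionary() const { return dictionary_; }
  const std::vector<int32_t>& indices() const { return indices_; }

 private:
  std::unordered_map<std::string, int32_t> memo_;
  std::vector<std::string> dictionary_;
  std::vector<int32_t> indices_;
};
```

Memory usage then scales with the number of distinct values plus one integer per row, rather than with the total decoded byte size.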





[jira] [Updated] (ARROW-3208) [Python] Segmentation fault when reading a Parquet partitioned dataset to a Parquet file

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3208:

Fix Version/s: 0.13.0

> [Python] Segmentation fault when reading a Parquet partitioned dataset to a 
> Parquet file
> 
>
> Key: ARROW-3208
> URL: https://issues.apache.org/jira/browse/ARROW-3208
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Ubuntu 16.04 LTS; System76 Oryx Pro
>Reporter: Ying Wang
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
>
> Steps to reproduce:
>  # Create a partitioned dataset with the following code:
> ```python
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({ 'one': [-1, 10, 2.5, 100, 1000, 1, 29.2], 'two': [-1, 10, 
> 2, 100, 1000, 1, 11], 'three': [0, 0, 0, 0, 0, 0, 0] })
> table = pa.Table.from_pandas(df)
> pq.write_to_dataset(table, root_path='/home/yingw787/misc/example_dataset', 
> partition_cols=['one', 'two'])
> ```
>  # Create a Parquet file from a PyArrow Table created from the partitioned 
> Parquet dataset:
> ```python
> import pyarrow.parquet as pq
> table = pq.ParquetDataset('/path/to/dataset').read()
> pq.write_table(table, '/path/to/example.parquet')
> ```
> EXPECTED:
>  * Successful write
> GOT:
>  * Segmentation fault
> Issue reference on GitHub mirror: https://github.com/apache/arrow/issues/2511





[jira] [Closed] (ARROW-3255) [C++/Python] Migrate Travis CI jobs off Xcode 6.4

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3255.
---
Resolution: Fixed

We aren't building on Xcode 6.4 anymore in CI. What is conda-forge doing?

> [C++/Python] Migrate Travis CI jobs off Xcode 6.4
> -
>
> Key: ARROW-3255
> URL: https://issues.apache.org/jira/browse/ARROW-3255
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> Travis CI says they are winding down their support for Xcode 6.4, which we 
> use in our CI as the minimum Xcode which can build Arrow libraries
> "Running builds with Xcode 6.4 in Travis CI is deprecated and will be removed 
> in January 2019.
> If Xcode 6.4 is critical to your builds, please contact our support team at 
> supp...@travis-ci.com to discuss options.
> Services are not supported on osx"
> We should decide if we want to continue to support this version of Xcode, and 
> what are the implications if we do not





[jira] [Closed] (ARROW-3280) [Python] Difficulty running tests after conda install

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-3280.
---
Resolution: Cannot Reproduce

> [Python] Difficulty running tests after conda install
> -
>
> Key: ARROW-3280
> URL: https://issues.apache.org/jira/browse/ARROW-3280
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.10.0
> Environment: conda create -n test-arrow pytest ipython pandas nomkl 
> pyarrow -c conda-forge
> Ubuntu 16.04
>Reporter: Matthew Rocklin
>Priority: Minor
>  Labels: python
>
> I install PyArrow from conda-forge, and then try running tests (or import 
> generally)
> {code:java}
> conda create -n test-arrow pytest ipython pandas nomkl pyarrow -c conda-forge 
> {code}
> {code:java}
> mrocklin@carbon:~/workspace/arrow/python$ py.test 
> pyarrow/tests/test_parquet.py 
> Traceback (most recent call last):
> File 
> "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/_pytest/config.py",
>  line 328, in _getconftestmodules
> return self._path2confmods[path]
> KeyError: 
> local('/home/mrocklin/workspace/arrow/python/pyarrow/tests/test_parquet.py')During
>  handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File 
> "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/_pytest/config.py",
>  line 328, in _getconftestmodules
> return self._path2confmods[path]
> KeyError: local('/home/mrocklin/workspace/arrow/python/pyarrow/tests')During 
> handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File 
> "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/_pytest/config.py",
>  line 359, in _importconftest
> return self._conftestpath2mod[conftestpath]
> KeyError: 
> local('/home/mrocklin/workspace/arrow/python/pyarrow/tests/conftest.py')During
>  handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File 
> "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/_pytest/config.py",
>  line 365, in _importconftest
> mod = conftestpath.pyimport()
> File 
> "/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/py/_path/local.py",
>  line 668, in pyimport
> __import__(modname)
> File "/home/mrocklin/workspace/arrow/python/pyarrow/__init__.py", line 54, in 
> 
> from pyarrow.lib import cpu_count, set_cpu_count
> ModuleNotFoundError: No module named 'pyarrow.lib'
> ERROR: could not load 
> /home/mrocklin/workspace/arrow/python/pyarrow/tests/conftest.py{code}
> Probably this is something wrong with my environment, but I thought I'd 
> report it as a usability bug





[jira] [Updated] (ARROW-3245) [Python] Infer index and/or filtering from parquet column statistics

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3245:

Fix Version/s: 0.14.0

> [Python] Infer index and/or filtering from parquet column statistics
> 
>
> Key: ARROW-3245
> URL: https://issues.apache.org/jira/browse/ARROW-3245
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.0
>
>
> The metadata included in parquet generally gives the min/max of data for each 
> chunk of each column. This allows early filtering out of whole chunks if they 
> do not meet some criterion, and can greatly reduce reading burden in some 
> circumstances. In Dask, we care about this for setting an index and its 
> "divisions" (start/stop values for each data partition) and for directly 
> avoiding including some chunks in the graph of tasks to be processed. 
> Similarly, filtering may be applied on the values of fields defined by the 
> directory partitioning.
> Currently, dask using the fastparquet backend is able to infer possible 
> columns to use as an index, perform filtering on that index and do general 
> filtering on any column which has statistical or partitioning information. It 
> would be very helpful to have such facilities via pyarrow also.
>  This is probably the most important of the requests from Dask.
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)
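The early-filtering logic described above reduces to an interval-intersection test per chunk: a chunk can be skipped whenever the query range does not intersect the chunk's [min, max]. The structures in this sketch are illustrative, not pyarrow's API:

```cpp
#include <vector>

// Per-chunk min/max statistics, as recorded in parquet footers.
struct ChunkStats {
  double min;
  double max;
};

// Returns indices of chunks that may contain values in [lo, hi];
// chunks whose [min, max] does not intersect the query range are pruned.
std::vector<int> SelectChunks(const std::vector<ChunkStats>& stats,
                              double lo, double hi) {
  std::vector<int> keep;
  for (int i = 0; i < static_cast<int>(stats.size()); ++i) {
    if (stats[i].max >= lo && stats[i].min <= hi) keep.push_back(i);
  }
  return keep;
}
```

The same test also yields "divisions" for a sorted index column, since each kept chunk contributes its min as a partition boundary.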





[jira] [Updated] (ARROW-3246) [Python] direct reading/writing of pandas categoricals in parquet

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3246:

Fix Version/s: 0.13.0

> [Python] direct reading/writing of pandas categoricals in parquet
> -
>
> Key: ARROW-3246
> URL: https://issues.apache.org/jira/browse/ARROW-3246
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Minor
>  Labels: parquet
> Fix For: 0.13.0
>
>
> Parquet supports "dictionary encoding" of column data in a manner very 
> similar to the concept of Categoricals in pandas. It is natural to use this 
> encoding for a column which originated as a categorical. Conversely, when 
> loading, if the file metadata says that a given column came from a pandas (or 
> arrow) categorical, then we can trust that the whole of the column is 
> dictionary-encoded and load the data directly into a categorical column, 
> rather than expanding the labels upon load and recategorising later.
> If the data does not have the pandas metadata, then the guarantee cannot 
> hold, and we cannot assume either that the whole column is dictionary encoded 
> or that the labels are the same throughout. In this case, the current 
> behaviour is fine.
>  
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)
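The correspondence between parquet dictionary encoding and pandas Categoricals can be illustrated with a toy model in plain Python (this is not the pyarrow API; the names are illustrative only):

```python
# Dictionary encoding, like a pandas Categorical, stores a small list of
# labels plus integer codes into that list.
labels = ["red", "green", "blue"]
codes = [0, 2, 2, 1, 0]

# Current behaviour: expand the labels on load, re-categorise later.
expanded = [labels[c] for c in codes]

# Proposed direct path: reuse labels/codes as categories/codes without
# ever materialising the expanded column.
direct = {"categories": labels, "codes": codes}
```

The direct path avoids allocating the expanded column entirely, which is the saving this issue asks for when the pandas metadata guarantees the whole column is dictionary-encoded.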





[jira] [Updated] (ARROW-3244) [Python] Multi-file parquet loading without scan

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3244:

Fix Version/s: 0.14.0

> [Python] Multi-file parquet loading without scan
> 
>
> Key: ARROW-3244
> URL: https://issues.apache.org/jira/browse/ARROW-3244
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Martin Durant
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.0
>
>
> A number of mechanisms are possible to avoid having to access and read the 
> parquet footers in a data set consisting of a number of files. In the case of 
> a large number of data files (perhaps split with directory partitioning) and 
> remote storage, this can be a significant overhead. This is significant from 
> the point of view of Dask, which must have the metadata available in the 
> client before setting up computational graphs.
>  
> Here are some suggestions of what could be done.
>  
>  * some parquet writing frameworks include a `_metadata` file, which contains 
> all the information from the footers of the various files. If this file is 
> present, then this data can be read from one place, with a single file 
> access. For a large number of files, parsing the thrift information may, by 
> itself, be a non-negligible overhead.
>  * the schema (dtypes) can be found in a `_common_metadata`, or from any one 
> of the data-files, then the schema could be assumed (perhaps at the user's 
> option) to be the same for all of the files. However, the information about 
> the directory partitioning would not be available. Although Dask may infer 
> the information from the filenames, it would be preferable to go through the 
> machinery with parquet-cpp, and view the whole data-set as a single object. 
> Note that the files will still need to have the footer read to access the 
> data, for the bytes offsets, but from Dask's point of view, this would be 
> deferred to tasks running in parallel.
> (please forgive that some of this has already been mentioned elsewhere; this 
> is one of the entries in the list at 
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful 
> in fastparquet)
>  
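The `_metadata` idea above can be sketched with a toy model. Real parquet footers are Thrift structures, not JSON, and the field names here are hypothetical; the point is only the access pattern: one consolidated read instead of one (possibly remote) read per data file.

```python
import json

# Per-file footers as a writer would see them (hypothetical fields).
footers = {
    "part-0.parquet": {"num_rows": 1000, "row_group_offsets": [4]},
    "part-1.parquet": {"num_rows": 1000, "row_group_offsets": [4]},
}

# Writer side: consolidate every footer into a single "_metadata" blob.
metadata_blob = json.dumps(footers)

# Reader side: a single access recovers the metadata for all files.
recovered = json.loads(metadata_blob)
```

With N files on remote storage this turns N footer round-trips into one, which is exactly the saving Dask needs before it can build its task graph.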





[jira] [Commented] (ARROW-3055) [C++] Use byte pop counts to bypass bit checking in some cases in BitmapReader classes

2019-02-06 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761806#comment-16761806
 ] 

Antoine Pitrou commented on ARROW-3055:
---

If there is a particular workload we want to optimize, I think experimenting 
with an unrolled version of `ArrayDataVisitor` might be a more promising track.

> [C++] Use byte pop counts to bypass bit checking in some cases in 
> BitmapReader classes
> --
>
> Key: ARROW-3055
> URL: https://issues.apache.org/jira/browse/ARROW-3055
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> When performing a scan of a bitmap, skipping the per-bit checking math could 
> improve performance, but we would need to investigate.
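The proposed fast path can be sketched in plain Python (the real implementation would be C++ over `BitmapReader`; this only illustrates the idea): a byte-level pop count lets all-valid (0xFF) and all-null (0x00) bytes bypass the per-bit checks entirely.

```python
def valid_count(bitmap):
    """Count set validity bits, taking a byte-at-a-time fast path."""
    total = 0
    for byte in bitmap:
        if byte == 0xFF:
            total += 8                      # all eight bits set: no bit checks
        elif byte != 0x00:
            total += bin(byte).count("1")   # mixed byte: count its bits
    return total
```

On mostly-valid or mostly-null bitmaps nearly every byte takes one of the two branch-free paths, which is where the speedup would come from.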





[jira] [Updated] (ARROW-3210) [Python] Creating ParquetDataset creates partitioned ParquetFiles with mismatched Parquet schemas

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3210:

Fix Version/s: 0.13.0

> [Python] Creating ParquetDataset creates partitioned ParquetFiles with 
> mismatched Parquet schemas
> -
>
> Key: ARROW-3210
> URL: https://issues.apache.org/jira/browse/ARROW-3210
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Ubuntu 16.04 LTS, System76 Oryx Pro
>Reporter: Ying Wang
>Priority: Major
>  Labels: parquet
> Fix For: 0.13.0
>
> Attachments: environment.yml, repro.csv, repro.py, repro_2.py
>
>
> STEPS TO REPRODUCE:
> 1. Create a conda environment reflecting [^environment.yml]
> 2. Execute script [^repro.py], replacing various config variables to create a 
> ParquetDataset on S3 given [^repro.csv]
> 3. Create reference of ParquetDataset using script [^repro_2.py], again 
> replacing various config variables.
>  
> EXPECTED:
> Reference is created correctly.
> GOT:
> Mismatched Arrow schemas in validate_schemas() method:
>  
> ```python
> *** ValueError: Schema in partition[Draught=1, Name=1, VesselType=0, x=1, 
> Heading=1] 
> s3://kio-tests-files/_tmp/test_parquet_dataset/Draught=10.3/Name=MSC 
> RAFAELA/VesselType=Cargo/x=130.43158/Heading=270.0/e9e3cea5a5c24c4da587c263ec817c98.parquet
>  was different. 
> Record_ID: int64
> y: double
> TRACKID: string
> MMSI: int64
> IMO: int64
> AgeMinutes: double
> SoG: double
> Width: int64
> Length: int64
> Callsign: string
> Destination: string
> ETA: int64
> Status: string
> ExtraInfo: string
> TIMESTAMP: int64
> __index_level_0__: int64
> metadata
> 
> {b'pandas': b'{"index_columns": ["__index_level_0__"], "column_indexes": 
> [{"na'
>  b'me": null, "field_name": null, "pandas_type": "unicode", "numpy_'
>  b'type": "object", "metadata": \{"encoding": "UTF-8"}}], "columns":'
>  b' [{"name": "Record_ID", "field_name": "Record_ID", "pandas_type"'
>  b': "int64", "numpy_type": "int64", "metadata": null}, {"name": "y'
>  b'", "field_name": "y", "pandas_type": "float64", "numpy_type": "f'
>  b'loat64", "metadata": null}, {"name": "TRACKID", "field_name": "T'
>  b'RACKID", "pandas_type": "unicode", "numpy_type": "object", "meta'
>  b'data": null}, {"name": "MMSI", "field_name": "MMSI", "pandas_typ'
>  b'e": "int64", "numpy_type": "int64", "metadata": null}, {"name": '
>  b'"IMO", "field_name": "IMO", "pandas_type": "int64", "numpy_type"'
>  b': "int64", "metadata": null}, {"name": "AgeMinutes", "field_name'
>  b'": "AgeMinutes", "pandas_type": "float64", "numpy_type": "float6'
>  b'4", "metadata": null}, {"name": "SoG", "field_name": "SoG", "pan'
>  b'das_type": "float64", "numpy_type": "float64", "metadata": null}'
>  b', {"name": "Width", "field_name": "Width", "pandas_type": "int64'
>  b'", "numpy_type": "int64", "metadata": null}, {"name": "Length", '
>  b'"field_name": "Length", "pandas_type": "int64", "numpy_type": "i'
>  b'nt64", "metadata": null}, {"name": "Callsign", "field_name": "Ca'
>  b'llsign", "pandas_type": "unicode", "numpy_type": "object", "meta'
>  b'data": null}, {"name": "Destination", "field_name": "Destination'
>  b'", "pandas_type": "unicode", "numpy_type": "object", "metadata":'
>  b' null}, {"name": "ETA", "field_name": "ETA", "pandas_type": "int'
>  b'64", "numpy_type": "int64", "metadata": null}, {"name": "Status"'
>  b', "field_name": "Status", "pandas_type": "unicode", "numpy_type"'
>  b': "object", "metadata": null}, {"name": "ExtraInfo", "field_name'
>  b'": "ExtraInfo", "pandas_type": "unicode", "numpy_type": "object"'
>  b', "metadata": null}, {"name": "TIMESTAMP", "field_name": "TIMEST'
>  b'AMP", "pandas_type": "int64", "numpy_type": "int64", "metadata":'
>  b' null}, {"name": null, "field_name": "__index_level_0__", "panda'
>  b's_type": "int64", "numpy_type": "int64", "metadata": null}], "pa'
>  b'ndas_version": "0.21.0"}'}
> vs
> Record_ID: int64
> y: double
> TRACKID: string
> MMSI: int64
> IMO: int64
> AgeMinutes: double
> SoG: double
> Width: int64
> Length: int64
> Callsign: string
> Destination: string
> ETA: int64
> Status: string
> ExtraInfo: null
> TIMESTAMP: int64
> __index_level_0__: int64
> metadata
> 
> {b'pandas': b'{"index_columns": ["__index_level_0__"], "column_indexes": 
> [{"na'
>  b'me": null, "field_name": null, "pandas_type": "unicode", "numpy_'
>  b'type": "object", "metadata": \{"encoding": "UTF-8"}}], "columns":'
>  b' [{"name": "Record_ID", "field_name": "Record_ID", "pandas_type"'
>  b': "int64", "numpy_type": "int64", "metadata": null}, {"name": "y'
>  b'", "field_name": "y", "pandas_type": "float64", "numpy_type": "f'
>  b'loat64", "metadata": null}, {"name": "TRACKID", 

[jira] [Updated] (ARROW-3226) [C++][Plasma] Plasma Store will crash using small memory

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3226:

Summary: [C++][Plasma] Plasma Store will crash using small memory  (was: 
Plasma Store will crash using small memory)

> [C++][Plasma] Plasma Store will crash using small memory
> 
>
> Key: ARROW-3226
> URL: https://issues.apache.org/jira/browse/ARROW-3226
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Yuhong Guo
>Assignee: Yuhong Guo
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Plasma Store evicts objects when a memory allocation cannot otherwise be 
> satisfied. When a small store limit is specified, the Plasma Store crashes 
> once that limit is reached.
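The failure mode can be sketched with a toy LRU store in Python (not Plasma's actual code; class and method names are hypothetical): a request that eviction can never satisfy must be rejected cleanly rather than looping or crashing.

```python
from collections import OrderedDict

class TinyStore:
    """Toy LRU object store: create() evicts oldest objects to make room,
    and rejects requests that no amount of eviction can satisfy."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = OrderedDict()   # obj_id -> size, oldest first
        self.used = 0

    def create(self, obj_id, size):
        if size > self.capacity:
            # Without this guard, eviction would empty the store and the
            # allocation would still fail -- the crash described above.
            raise MemoryError("object larger than the whole store")
        while self.used + size > self.capacity:
            _, evicted_size = self.objects.popitem(last=False)  # evict LRU
            self.used -= evicted_size
        self.objects[obj_id] = size
        self.used += size

store = TinyStore(capacity=100)
store.create("a", 60)
store.create("b", 60)   # evicts "a" to make room
```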





[jira] [Updated] (ARROW-3232) [Python] Return an ndarray from Column.to_pandas

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3232:

Fix Version/s: 0.13.0

> [Python] Return an ndarray from Column.to_pandas
> 
>
> Key: ARROW-3232
> URL: https://issues.apache.org/jira/browse/ARROW-3232
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.13.0
>
>
> See discussion: 
> https://github.com/apache/arrow/pull/2535#discussion_r216299243





[jira] [Updated] (ARROW-3220) [Python] Add writeat method to writeable NativeFile

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3220:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Python] Add writeat method to writeable NativeFile
> ---
>
> Key: ARROW-3220
> URL: https://issues.apache.org/jira/browse/ARROW-3220
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Pearu Peterson
>Priority: Major
> Fix For: 0.14.0
>
>
> See https://github.com/apache/arrow/pull/2536#discussion_r216384311





[jira] [Commented] (ARROW-3203) [C++] Build error on Debian Buster

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761813#comment-16761813
 ] 

Wes McKinney commented on ARROW-3203:
-

Is this still a problem?

> [C++] Build error on Debian Buster
> --
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: albertoramon
>Priority: Major
> Attachments: DockerfileRV, flatbuffers_ep-build-err.log
>
>
> There is an error with Debian Buster (it works fine on Debian Stretch).
> You can test it easily by changing the first line of the attached Dockerfile.
>  
> *To reproduce it:*
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}
>  





[jira] [Commented] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2019-02-06 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761808#comment-16761808
 ] 

Wes McKinney commented on ARROW-3191:
-

Would anyone like to take a look at this? [~pravindra]

> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
> Fix For: 0.13.0
>
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {code}
> public abstract class Memory  {
>   protected final int length;
>   protected final long address;
>   protected abstract void release();
> }
> {code}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).





[jira] [Updated] (ARROW-3192) [Java] Implement "ArrowBufReadChannel" abstraction and alternate MessageSerializer that uses this

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3192:

Fix Version/s: (was: 0.13.0)
   0.14.0

> [Java] Implement "ArrowBufReadChannel" abstraction and alternate 
> MessageSerializer that uses this
> -
>
> Key: ARROW-3192
> URL: https://issues.apache.org/jira/browse/ARROW-3192
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> The current MessageSerializer implementation is wasteful when used to read an 
> IPC payload that is already in-memory in an {{ArrowBuf}}. In particular, 
> reads out of a {{ReadChannel}} require memory allocation
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L569
> * 
> https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageSerializer.java#L290
> In C++, we have abstracted memory allocation out of the IPC read path so that 
> zero-copy is possible. I suggest that a similar mechanism can be developed 
> for Java to improve deserialization performance for in-memory messages. The 
> new interface would return {{ArrowBuf}} when performing reads, which could be 
> zero-copy when possible, but when not the current strategy of allocate-copy 
> could be used
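The zero-copy idea can be illustrated with Python builtins (a sketch of the concept only, not the Java API): a `memoryview` slice references the underlying buffer the way an `ArrowBuf` slice returned by the proposed read channel would, while a copying read allocates.

```python
buf = bytearray(b"arrow-ipc-payload")

copied = bytes(buf[6:9])       # copying read: allocates a new buffer
view = memoryview(buf)[6:9]    # zero-copy read: shares buf's memory

buf[6:9] = b"IPC"              # mutate the underlying buffer
# `view` observes the change; `copied` does not.
```

The same distinction is what separates the current allocate-and-copy `ReadChannel` path from a channel that can hand back slices of an in-memory payload.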





[jira] [Updated] (ARROW-3191) [Java] Add support for ArrowBuf to point to arbitrary memory.

2019-02-06 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3191:

Fix Version/s: 0.13.0

> [Java] Add support for ArrowBuf to point to arbitrary memory.
> -
>
> Key: ARROW-3191
> URL: https://issues.apache.org/jira/browse/ARROW-3191
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Jacques Nadeau
>Priority: Major
> Fix For: 0.13.0
>
>
> Right now ArrowBuf can only point to memory managed by an Arrow Allocator. 
> This is because in many cases we want to be able to support hierarchical 
> accounting of memory and the ability to transfer memory ownership between 
> separate allocators within the same hierarchy.
> At the same time, there are definitely times where someone might want to map 
> some amount of arbitrary off-heap memory. In these situations they should 
> still be able to use ArrowBuf.
> I propose we have a new ArrowBuf constructor that takes an input that 
> subclasses an interface similar to:
> {code}
> public abstract class Memory  {
>   protected final int length;
>   protected final long address;
>   protected abstract void release();
> }
> {code}
> We then make it so all the memory transfer semantics and accounting behavior 
> are noops for this type of memory. The target of this work will be to make 
> sure that all the fast paths continue to be efficient but some of the other 
> paths like transfer can include a conditional (either directly or through 
> alternative implementations of things like ledger).




