RE: Arrow sync call Wednesday 17:00 UTC / 12p US-Eastern

2019-02-06 Thread Nick Haddad
Notes from Today's Sync Call:
# Attendees
- Wes McKinney (Ursa Labs)
- Jacques Nadeau (Dremio)
- Ben Kietzman (Ursa Labs)
- Uwe Korn
- Praveen Kumar (Dremio)
- Francois Saint-Jacques
- Hatem Helal (MathWorks)
- Nick Haddad (MathWorks)
# Agenda Items
- Qualify MATLAB code in Arrow Travis

Re: Distributing Arrow in Debian and Fedora

2019-02-06 Thread Kouhei Sutou
Hi, > Kou, would you be interested in taking the task of submitting to Debian and > Fedora the arrow library? I'm sorry but I'm not interested in it. I have only limited time because I'm not a full-time Apache Arrow developer. I want to spend my time on Ruby-related work as much as possible.

[jira] [Created] (ARROW-4495) [C++][Gandiva] TestCastTimestampErrors failed in gandiva-precompiled-time_test in MSVC

2019-02-06 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4495:
Summary: [C++][Gandiva] TestCastTimestampErrors failed in gandiva-precompiled-time_test in MSVC
Key: ARROW-4495
URL: https://issues.apache.org/jira/browse/ARROW-4495

[C++] LLVM / clang bits in Arrow upgraded to LLVM 7

2019-02-06 Thread Wes McKinney
I just merged ARROW-3972 https://github.com/apache/arrow/commit/c81fbaa20ce4d14ae08bbb2f4a959778f1069d83 with Ravindra's and Uwe's help which upgrades the codebase to LLVM 7. You will want to upgrade your clang, LLVM, and clang-format for C++ development. Luckily it seems that the behavior of

Re: Distributing Arrow in Debian and Fedora

2019-02-06 Thread Javier Luraschi
Kou, would you be interested in taking the task of submitting to Debian and Fedora the arrow library? It would be ideal since it seems like you have a clear understanding of what needs to be done. Is there a timeline you would want to follow? Is there anything we can do to help? On Wed, Feb 6,

Re: TensorFlow, PyTorch, and manylinux1

2019-02-06 Thread Philipp Moritz
Would building our manylinux2010 wheels against https://github.com/pypa/manylinux/pull/252 solve the C++11 problems? In that case we should just do that. Otherwise let's propose a minimally modified manylinux2011 that fixes C++11 support so we can move on and don't have to wait 9 more months till

[jira] [Created] (ARROW-4494) [Java] arrow-jdbc JAR is not uploaded on release

2019-02-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4494:
Summary: [Java] arrow-jdbc JAR is not uploaded on release
Key: ARROW-4494
URL: https://issues.apache.org/jira/browse/ARROW-4494
Project: Apache Arrow
Issue

[jira] [Created] (ARROW-4493) [Rust] Clean up match

2019-02-06 Thread Tupshin Harper (JIRA)
Tupshin Harper created ARROW-4493:
Summary: [Rust] Clean up match
Key: ARROW-4493
URL: https://issues.apache.org/jira/browse/ARROW-4493
Project: Apache Arrow
Issue Type: Bug

Re: TensorFlow, PyTorch, and manylinux1

2019-02-06 Thread Philipp Moritz
The problems arose when certain C++11 functionality was used. It led to certain symbols being statically linked into the shared library, which clashed with other shared libraries that had the same symbols in the same address space, linked against a different version of libstdc++ (specifically,

Re: UTF-8 and Binary logical types

2019-02-06 Thread Hatem Helal
Hi Wes, Yes, the UTF8 ConvertedType is what I was after. Thanks for the helpful references. I don't have a good feel for how common this is but the following test file caused my confusion between UTF8 and Binary types in Arrow:

Re: UTF-8 and Binary logical types

2019-02-06 Thread Wes McKinney
hi Hatem, Are you talking about the UTF8 ConvertedType in Parquet? https://github.com/apache/arrow/blob/master/cpp/src/parquet/parquet.thrift#L52 AFAIK we do respect that if it is set, otherwise we do not guess https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/schema.cc#L65 -

Re: TensorFlow, PyTorch, and manylinux1

2019-02-06 Thread Antoine Pitrou
Le 06/02/2019 à 14:27, Manuel Klimek a écrit : > On Wed, Feb 6, 2019 at 12:38 PM Antoine Pitrou > wrote: > > > Le 06/02/2019 à 01:06, Philipp Moritz a écrit : > > Thanks for the meeting! One question concerning a point that is still > > not super clear

Re: UTF-8 and Binary logical types

2019-02-06 Thread Hatem Helal
Thanks Antoine, that makes good sense. We are writing string data using the utf8 data type. This question came up when trying to read this fastparquet project test file into arrow memory: fastparquet/test-data/nation.dict.parquet The name and comment columns result in a binary

Re: UTF-8 and Binary logical types

2019-02-06 Thread Antoine Pitrou
Hi Hatem, It is intended that the convention is application-dependent. From Arrow's point of view, the binary string is an opaque blob of data. Depending on your application, it might be a UTF-16-encoded piece of text, a JPEG image, anything. By the way, if you store ASCII text data, I would
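A minimal sketch of the point made in this message, using only the Python standard library rather than any Arrow API: a binary value is just bytes, and whether it can be read as UTF-8 text depends on what the application put there. The helper name try_decode_utf8 and the sample payloads are hypothetical, chosen only for illustration.

```python
from typing import Optional


def try_decode_utf8(raw: bytes) -> Optional[str]:
    """Return the decoded text if raw is valid UTF-8, otherwise None."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Hypothetical payloads an application might store in a binary column.
text = "naïve"
as_utf8 = text.encode("utf-8")       # text encoded as UTF-8
as_utf16 = text.encode("utf-16-le")  # the same text encoded as UTF-16
jpeg_header = b"\xff\xd8\xff\xe0"    # first bytes of a JPEG file

for label, payload in [("utf8", as_utf8), ("utf16", as_utf16), ("jpeg", jpeg_header)]:
    decoded = try_decode_utf8(payload)
    status = "valid UTF-8" if decoded is not None else "not UTF-8"
    print(label, status)
```

Only the first payload decodes as UTF-8. The reverse check is not conclusive: bytes that happen to decode cleanly are not necessarily text, which is one reason a reader may prefer to rely on declared type metadata instead of guessing.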

Re: TensorFlow, PyTorch, and manylinux1

2019-02-06 Thread Antoine Pitrou
Le 06/02/2019 à 01:06, Philipp Moritz a écrit : > Thanks for the meeting! One question concerning a point that is still > not super clear to me: > > Say we define a new manylinux standard based on gcc >=5 (with stable > c++11 support). There will still be a lot of wheels from the manylinux1 >

UTF-8 and Binary logical types

2019-02-06 Thread Hatem Helal
Hi all, I wanted to make sure I understood the distinction/use cases for choosing between the utf8 and binary logical types. Based on this doc
* Utf8 data is Unicode values with UTF-8 encoding
* Binary is any other variable

[jira] [Created] (ARROW-4492) ValueError: Categorical categories must be unique

2019-02-06 Thread George Sakkis (JIRA)
George Sakkis created ARROW-4492:
Summary: ValueError: Categorical categories must be unique
Key: ARROW-4492
URL: https://issues.apache.org/jira/browse/ARROW-4492
Project: Apache Arrow
Issue

Re: Distributing Arrow in Debian and Fedora

2019-02-06 Thread Jeroen Ooms
On Sat, Feb 2, 2019 at 10:29 PM Kouhei Sutou wrote: > We need to consider how to update package versions. > We'll release Apache Arrow more frequently than Debian and > Fedora. If we don't update packages on Debian and Fedora, > they will ship old Apache Arrow. I think it's initially fine to