RE: Arrow Datasets Functionality for Python

2020-02-18 Thread Matthew Turner
Noted, thanks. Will be in touch. Matthew M. Turner Email: matthew.m.tur...@outlook.com Phone: (908)-868-2786 -Original Message- From: Wes McKinney Sent: Tuesday, February 18, 2020 3:30 AM To: dev Subject: Re: Arrow Datasets Functionality for Python hi Matthew, Thanks -- our contribut

[jira] [Created] (ARROW-7881) [C++] Fix pedantic warnings

2020-02-18 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7881: -- Summary: [C++] Fix pedantic warnings Key: ARROW-7881 URL: https://issues.apache.org/jira/browse/ARROW-7881 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-7880) [CI][R] R sanitizer job is not really working

2020-02-18 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7880: -- Summary: [CI][R] R sanitizer job is not really working Key: ARROW-7880 URL: https://issues.apache.org/jira/browse/ARROW-7880 Project: Apache Arrow Issue

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-18 Thread Wes McKinney
A week has passed, I would say we should move forward with merging patches related to this. Any last words (in the next 12 hours or so)? On Tue, Feb 18, 2020 at 7:48 AM Krisztián Szűcs wrote: > > +1 (binding) > > On Tue, Feb 18, 2020 at 10:47 AM Antoine Pitrou wrote: > > > > > > There has also b

[jira] [Created] (ARROW-7879) [C++][Doc] Add doc for the Device API

2020-02-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7879: - Summary: [C++][Doc] Add doc for the Device API Key: ARROW-7879 URL: https://issues.apache.org/jira/browse/ARROW-7879 Project: Apache Arrow Issue Type: Impr

Re: [Format] Dictionary edge cases (encoding nulls and nested dictionaries)

2020-02-18 Thread Wes McKinney
On Tue, Feb 18, 2020 at 2:01 AM Micah Kornfield wrote: > > >> * evaluating an expression like SUM(ISNULL($field)) is more >> semantically ambiguous (you have to check more things) when $field is >> a dictionary-encoded type and the values of the dictionary could be >> null > > It is this type of t

[jira] [Created] (ARROW-7878) [C++] Implement LogicalPlan and LogicalPlanBuilder

2020-02-18 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7878: - Summary: [C++] Implement LogicalPlan and LogicalPlanBuilder Key: ARROW-7878 URL: https://issues.apache.org/jira/browse/ARROW-7878 Project: Apache Arr

[jira] [Created] (ARROW-7877) [Packaging] Fix crossbow deployment to github artifacts

2020-02-18 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7877: -- Summary: [Packaging] Fix crossbow deployment to github artifacts Key: ARROW-7877 URL: https://issues.apache.org/jira/browse/ARROW-7877 Project: Apache Arrow

Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-16-0

2020-02-18 Thread Krisztián Szűcs
On Mon, Feb 17, 2020 at 5:28 PM Wes McKinney wrote: > > On Mon, Feb 17, 2020 at 10:19 AM Neal Richardson > wrote: > > > > Ok, I made https://issues.apache.org/jira/browse/ARROW-7870 to look into > > this further. > > > > If the flaky nightly builds persist, maybe we should suspend the uploading >

[jira] [Created] (ARROW-7876) [R] Installation fails in the documentation generation image

2020-02-18 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7876: -- Summary: [R] Installation fails in the documentation generation image Key: ARROW-7876 URL: https://issues.apache.org/jira/browse/ARROW-7876 Project: Apache Arrow

[NIGHTLY] Arrow Build Report for Job nightly-2020-02-18-0

2020-02-18 Thread Crossbow
Arrow Build Report for Job nightly-2020-02-18-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-18-0 Failed Tasks: - centos-7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-18-0-azure-centos-7 - centos-8: URL: https://gi

[jira] [Created] (ARROW-7875) Decimal place getting shifted

2020-02-18 Thread Larry Parker (Jira)
Larry Parker created ARROW-7875: --- Summary: Decimal place getting shifted Key: ARROW-7875 URL: https://issues.apache.org/jira/browse/ARROW-7875 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-7874) [Python][Archery] Validate docstrings with numpydoc

2020-02-18 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7874: -- Summary: [Python][Archery] Validate docstrings with numpydoc Key: ARROW-7874 URL: https://issues.apache.org/jira/browse/ARROW-7874 Project: Apache Arrow

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-18 Thread Krisztián Szűcs
+1 (binding) On Tue, Feb 18, 2020 at 10:47 AM Antoine Pitrou wrote: > > > There has also been interest from DuckDB: > https://github.com/cwida/duckdb/issues/151 > > Regards > > Antoine. > > > On Tue, 18 Feb 2020 02:37:43 -0600 > Wes McKinney wrote: > > As I recall TFX developers weighed in that

[jira] [Created] (ARROW-7873) Segfault in pandas version 1.0.1, read_parquet after creating a clickhouse odbc connection

2020-02-18 Thread Matt Calder (Jira)
Matt Calder created ARROW-7873: -- Summary: Segfault in pandas version 1.0.1, read_parquet after creating a clickhouse odbc connection Key: ARROW-7873 URL: https://issues.apache.org/jira/browse/ARROW-7873

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-18 Thread Antoine Pitrou
There has also been interest from DuckDB: https://github.com/cwida/duckdb/issues/151 Regards Antoine. On Tue, 18 Feb 2020 02:37:43 -0600 Wes McKinney wrote: > As I recall TFX developers weighed in that this would be helpful for > TensorFlow-related use cases where there are concerns about C++

[jira] [Created] (ARROW-7872) [Python] Support conversion of list-of-struct in Array/Table.to_pandas

2020-02-18 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7872: --- Summary: [Python] Support conversion of list-of-struct in Array/Table.to_pandas Key: ARROW-7872 URL: https://issues.apache.org/jira/browse/ARROW-7872 Project: Apache Ar

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2020-02-18 Thread Wes McKinney
As I recall TFX developers weighed in that this would be helpful for TensorFlow-related use cases where there are concerns about C++ ABI compatibility. Since this project has been ongoing for about 5 months (see also related discussion around implementation guidelines for third parties [1]) there ha

Re: Arrow Datasets Functionality for Python

2020-02-18 Thread Wes McKinney
hi Matthew, Thanks -- our contribution workflow is roughly to define JIRA tickets and then submit pull requests. Adding documentation or examples is also helpful. When there is uncertainty about the scope of a ticket or the solution approach, feel free to ask questions and we will try to provide f

Re: [Format] Dictionary edge cases (encoding nulls and nested dictionaries)

2020-02-18 Thread Micah Kornfield
> * evaluating an expression like SUM(ISNULL($field)) is more > semantically ambiguous (you have to check more things) when $field is > a dictionary-encoded type and the values of the dictionary could be > null It is this type of thing that I'm worried about (parquet just happens to be where I'm w
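To illustrate the ambiguity raised above, here is a minimal pure-Python sketch of a dictionary-encoded column. It is a toy model, not the Arrow implementation or API: the names `count_nulls`, `indices`, and `dictionary` are hypothetical, and plain lists with `None` stand in for Arrow buffers and validity bitmaps. It shows why something like SUM(ISNULL($field)) has to check two places once the dictionary values themselves may be null.

```python
from typing import Optional, Sequence


def count_nulls(
    indices: Sequence[Optional[int]],
    dictionary: Sequence[Optional[str]],
) -> int:
    """Count logical nulls in a toy dictionary-encoded column.

    Hypothetical helper for illustration only; this is not an Arrow API.
    """
    count = 0
    for idx in indices:
        if idx is None:
            # Null recorded at the index level.
            count += 1
        elif dictionary[idx] is None:
            # The index is valid but refers to a null dictionary value.
            count += 1
    return count


# Toy data: the dictionary itself contains a null entry at position 1.
dictionary = ["a", None, "b"]
indices = [0, 1, None, 2, 1]

# Looking only at the indices misses the nulls stored in the dictionary.
index_level_nulls = sum(idx is None for idx in indices)
total_nulls = count_nulls(indices, dictionary)
```

In this toy data the index-level count and the logical count differ, which is the extra check the quoted message refers to.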