[jira] [Created] (ARROW-7830) [C++] Parquet library version doesn't change with releases

2020-02-10 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7830:
--

 Summary: [C++] Parquet library version doesn't change with releases
 Key: ARROW-7830
 URL: https://issues.apache.org/jira/browse/ARROW-7830
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Neal Richardson


[~jeroenooms] pointed this out to me. 

{code}
$ pkg-config --modversion arrow
0.16.0
$ pkg-config --modversion arrow-dataset
0.16.0
$ pkg-config --modversion parquet
1.5.1-SNAPSHOT
{code}

I get that parquet-cpp is technically not part of Apache Arrow, but if we're 
releasing a libparquet with libarrow at our release time, wouldn't it make 
sense to at least bump the parquet version at the same time, even if the 
version numbers aren't the same?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7829) [R] Test R bindings on clang

2020-02-10 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7829:
--

 Summary: [R] Test R bindings on clang
 Key: ARROW-7829
 URL: https://issues.apache.org/jira/browse/ARROW-7829
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Follow-up to ARROW-7817. We're generating warnings that would fail a CRAN 
submission, but because of how our Travis jobs are configured we aren't catching them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7828) [Release] Remove SSH keys for internal use

2020-02-10 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-7828:
---

 Summary: [Release] Remove SSH keys for internal use
 Key: ARROW-7828
 URL: https://issues.apache.org/jira/browse/ARROW-7828
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


The {{dev/release/binary/id_rsa*}} SSH keys are only used to log in to a local 
Docker container when releasing binary artifacts. They aren't used over the 
network, so keeping them in this public repository is safe.

But people keep reporting them to us as a potential security risk. We should
remove them from this repository and generate them locally, instead of
explaining to each reporter why they aren't dangerous.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7827) conda-forge pyarrow package does not have s3 enabled

2020-02-10 Thread Catherine (Jira)
Catherine created ARROW-7827:


 Summary: conda-forge pyarrow package does not have s3 enabled
 Key: ARROW-7827
 URL: https://issues.apache.org/jira/browse/ARROW-7827
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Packaging
Affects Versions: 0.16.0
Reporter: Catherine


[conda-recipes/pyarrow|https://github.com/apache/arrow/tree/master/dev/tasks/conda-recipes/pyarrow] 
has S3 and some other features enabled, but 
[conda-forge pyarrow-feedstock|https://github.com/conda-forge/pyarrow-feedstock/tree/master/recipe] 
does not. It seems like {{bld.bat}} and {{build.sh}} in the conda-forge 
feedstock are outdated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Format] Dictionary edge cases (encoding nulls and nested dictionaries)

2020-02-10 Thread Micah Kornfield
Hi Wes and Brian,

Thanks for the feedback.  I raised these issues because they make the spec
harder to work with/implement (i.e. we have existing bugs, etc.).  I'm
wondering if we should take the opportunity to simplify before
things are set in stone. If we think things are already set, then I'm OK
with that as well.

Thanks,
Micah

On Mon, Feb 10, 2020 at 12:40 PM Wes McKinney  wrote:

>
>
> On Sun, Feb 9, 2020 at 12:53 AM Micah Kornfield 
> wrote:
> >
> > I'd like to understand if anyone is making use of the following features
> > and if we should revisit them before 1.0.
> >
> > 1. Dictionaries can encode null values.
> > - This becomes error-prone for things like Parquet.  We seem to be
> > calculating the definition level solely based on the null bitmap.
> >
> > I might have missed something but it appears that we only check if a
> > dictionary contains nulls on the optimized path [1] but not when
> converting
> > the dictionary array back to dense, so I think the values written could
> get
> > out of sync with the rep/def levels?
> >
> > It seems we should potentially disallow dictionaries from containing null
> > values?
>
> Are you talking about the direct DictionaryArray encoding path?
>
> Since both nulls and duplicates are allowed in Arrow dictionaries, I think
> that Parquet will need to check for the null-in-dictionary case in the
> equivalent of ArrowColumnWriter::Write [1]. You could start by checking and
> erroring out.
>
> [1]:
> https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/writer.cc#L324
>
> >
> > 2.  Dictionaries can contain nested columns which are in turn dictionary-encoded
> > columns.
> >
> > - Again we aren't handling this in Parquet today, and I'm wondering if it's
> > worth the effort.
> > There was a PR merged a while ago [2] to add a "skipped" integration test
> > but it doesn't look like anyone has done follow-up work to enable
> > this/make it pass.
>
> I started looking at this the other day (on the integration tests); I'd
> like to get the C++ side fully working with integration tests.
>
> As far as Parquet goes, it doesn't seem worth the effort to try to implement a
> faithful roundtrip. Since we are serializing the Arrow types into the file
> metadata, we could at least cast back, though perhaps lose the ordering.
>
> >
> > It seems simpler to keep dictionary encoding at the leaves of the schema.
> >
> > Of the two I'm a little more worried that Option #1 will break people if
> we
> > decide to disallow it.
>
> I think the ship has sailed on disallowing this at the format level.
>
> >
> > Thoughts?
> >
> > Thanks,
> > Micah
> >
> >
> > [1]
> >
> https://github.com/apache/arrow/blob/bd38beec033a2fdff192273df9b08f120e635b0c/cpp/src/parquet/encoding.cc#L765
> > [2] https://github.com/apache/arrow/pull/1848
>


Re: AttributeError importing pyarrow 0.16.0

2020-02-10 Thread Tom Augspurger
Thanks for linking to that. The Python version there does seem problematic.
Upgrading to
TravisCI's "bionic" image (with Python 3.7.5 instead of 3.7.1) seems to
have fixed it.

Tom

On Mon, Feb 10, 2020 at 1:34 PM Wes McKinney  wrote:

> hi Tom,
>
> Looks like it could be https://bugs.python.org/issue32973, but I'm not
> sure. I wasn't able to reproduce locally. The Python version 3.7.1
> running in CI is also potentially suspicious.
>
> This class of error seems to have a lot of bug reports based on Google
> searches
>
> Message isn't picklable so we should probably fix that regardless
>
> https://issues.apache.org/jira/browse/ARROW-7826
>
> - Wes
>
> On Mon, Feb 10, 2020 at 12:17 PM Tom Augspurger
>  wrote:
> >
> > Hi all,
> >
> > I'm seeing a strange issue when importing pyarrow on the intake CI. I
> get an
> > exception saying
> >
> > AttributeError: type object 'pyarrow.lib.Message' has no attribute
> > '__reduce_cython__'
> >
> > The full traceback is:
> >
> > __ test_arrow_import
> > ___
> >
> > def test_arrow_import():
> >
> > >   import pyarrow
> >
> > intake/cli/server/tests/test_server.py:32:
> >
> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _
> > _ _
> >
> >
> ../../../virtualenv/python3.7.1/lib/python3.7/site-packages/pyarrow/__init__.py:49:
> > in 
> >
> > from pyarrow.lib import cpu_count, set_cpu_count
> >
> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _
> > _ _
> >
> > >   ???
> >
> > E   AttributeError: type object 'pyarrow.lib.Message' has no attribute
> > '__reduce_cython__'
> >
> > pyarrow/ipc.pxi:21: AttributeError
> >
> > _ TestServerV1Source.test_read_part_compressed
> > _
> >
> >
> > I'm unable to reproduce this locally, and was wondering if anyone else
> has
> > seen something similar.
> > Pyarrow was installed using pip / a wheel (
> > https://travis-ci.org/intake/intake/jobs/648523104#L311).
> >
> > A common cause of this error message is building with too old of a
> Cython.
> > While checking this, I noticed
> > that some of the files are generated with Cython 0.29.8, while others
> were
> > generated with 0.29.14.
> > I have no idea if this is a problem in general or if it's causing this
> > specific issue.
> >
> > ```
> > _hdfs.cpp:1:/* Generated by Cython 0.29.14 */
> > include/arrow/python/pyarrow_lib.h:20:/* Generated by Cython 0.29.8 */
> > include/arrow/python/pyarrow_api.h:21:/* Generated by Cython 0.29.8 */
> > _plasma.cpp:1:/* Generated by Cython 0.29.14 */
> > _fs.cpp:1:/* Generated by Cython 0.29.14 */
> > lib_api.h:1:/* Generated by Cython 0.29.14 */
> > gandiva.cpp:1:/* Generated by Cython 0.29.14 */
> > _json.cpp:1:/* Generated by Cython 0.29.14 */
> > _parquet.cpp:1:/* Generated by Cython 0.29.14 */
> > _csv.cpp:1:/* Generated by Cython 0.29.14 */
> > _compute.cpp:1:/* Generated by Cython 0.29.14 */
> > _dataset.cpp:1:/* Generated by Cython 0.29.14 */
> > _flight.cpp:1:/* Generated by Cython 0.29.14 */
> > lib.cpp:1:/* Generated by Cython 0.29.14 */
> > ```
> >
> > See the https://travis-ci.org/intake/intake/jobs/648523104 for the full
> log.
> >
> >
> > Thanks for any pointers!
>


Re: [Format] Dictionary edge cases (encoding nulls and nested dictionaries)

2020-02-10 Thread Wes McKinney
On Sun, Feb 9, 2020 at 12:53 AM Micah Kornfield 
wrote:
>
> I'd like to understand if anyone is making use of the following features
> and if we should revisit them before 1.0.
>
> 1. Dictionaries can encode null values.
> - This becomes error-prone for things like Parquet.  We seem to be
> calculating the definition level solely based on the null bitmap.
>
> I might have missed something but it appears that we only check if a
> dictionary contains nulls on the optimized path [1] but not when
converting
> the dictionary array back to dense, so I think the values written could
get
> out of sync with the rep/def levels?
>
> It seems we should potentially disallow dictionaries from containing null
> values?

Are you talking about the direct DictionaryArray encoding path?

Since both nulls and duplicates are allowed in Arrow dictionaries, I think
that Parquet will need to check for the null-in-dictionary case in the
equivalent of ArrowColumnWriter::Write [1]. You could start by checking and
erroring out.

[1]:
https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/writer.cc#L324
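
For illustration, a minimal pyarrow sketch of the problematic case (assuming
the DictionaryArray.from_arrays constructor and the pa.table/Parquet write
path; the file name is made up):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# The dictionary itself contains a null, in addition to a null encoded
# through the indices.
indices = pa.array([0, 1, 2, None], type=pa.int32())
dictionary = pa.array(["a", "b", None])
dict_arr = pa.DictionaryArray.from_arrays(indices, dictionary)

# Index 2 points at a null dictionary value, but the array's validity
# bitmap still says "valid" there, so definition levels computed only
# from the bitmap can disagree with the dense values.
table = pa.table({"col": dict_arr})
pq.write_table(table, "dict_nulls.parquet")  # the path that needs the check
```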

>
> 2.  Dictionaries can contain nested columns which are in turn dictionary-encoded
> columns.
>
> - Again we aren't handling this in Parquet today, and I'm wondering if it's
> worth the effort.
> There was a PR merged a while ago [2] to add a "skipped" integration test
> but it doesn't look like anyone has done follow-up work to enable
> this/make it pass.

I started looking at this the other day (on the integration tests); I'd
like to get the C++ side fully working with integration tests.

As far as Parquet goes, it doesn't seem worth the effort to try to implement a
faithful roundtrip. Since we are serializing the Arrow types into the file
metadata, we could at least cast back, though perhaps lose the ordering.

>
> It seems simpler to keep dictionary encoding at the leaves of the schema.
>
> Of the two I'm a little more worried that Option #1 will break people if
we
> decide to disallow it.

I think the ship has sailed on disallowing this at the format level.

>
> Thoughts?
>
> Thanks,
> Micah
>
>
> [1]
>
https://github.com/apache/arrow/blob/bd38beec033a2fdff192273df9b08f120e635b0c/cpp/src/parquet/encoding.cc#L765
> [2] https://github.com/apache/arrow/pull/1848


Re: AttributeError importing pyarrow 0.16.0

2020-02-10 Thread Wes McKinney
hi Tom,

Looks like it could be https://bugs.python.org/issue32973, but I'm not
sure. I wasn't able to reproduce locally. The Python version 3.7.1
running in CI is also potentially suspicious.

This class of error seems to have a lot of bug reports based on Google searches

Message isn't picklable so we should probably fix that regardless

https://issues.apache.org/jira/browse/ARROW-7826
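
A rough sketch of what the fix could look like (assuming Message.serialize()
and pyarrow.ipc.read_message() round-trip a message; the reduce helper below
is hypothetical):

```python
import pyarrow as pa
from pyarrow import ipc

# Build a Message to experiment with.
batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], ["x"])
msg = ipc.read_message(batch.serialize())  # a pyarrow.lib.Message

# A __reduce__ along these lines would make Message picklable:
# serialize to a buffer, then rebuild via read_message on unpickle.
def _reduce_message(m):
    return ipc.read_message, (m.serialize(),)

# The round trip the reducer relies on:
rebuilt = ipc.read_message(msg.serialize())
assert rebuilt.equals(msg)
```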

- Wes

On Mon, Feb 10, 2020 at 12:17 PM Tom Augspurger
 wrote:
>
> Hi all,
>
> I'm seeing a strange issue when importing pyarrow on the intake CI. I get an
> exception saying
>
> AttributeError: type object 'pyarrow.lib.Message' has no attribute
> '__reduce_cython__'
>
> The full traceback is:
>
> __ test_arrow_import
> ___
>
> def test_arrow_import():
>
> >   import pyarrow
>
> intake/cli/server/tests/test_server.py:32:
>
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _
>
> ../../../virtualenv/python3.7.1/lib/python3.7/site-packages/pyarrow/__init__.py:49:
> in 
>
> from pyarrow.lib import cpu_count, set_cpu_count
>
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> _ _
>
> >   ???
>
> E   AttributeError: type object 'pyarrow.lib.Message' has no attribute
> '__reduce_cython__'
>
> pyarrow/ipc.pxi:21: AttributeError
>
> _ TestServerV1Source.test_read_part_compressed
> _
>
>
> I'm unable to reproduce this locally, and was wondering if anyone else has
> seen something similar.
> Pyarrow was installed using pip / a wheel (
> https://travis-ci.org/intake/intake/jobs/648523104#L311).
>
> A common cause of this error message is building with too old of a Cython.
> While checking this, I noticed
> that some of the files are generated with Cython 0.29.8, while others were
> generated with 0.29.14.
> I have no idea if this is a problem in general or if it's causing this
> specific issue.
>
> ```
> _hdfs.cpp:1:/* Generated by Cython 0.29.14 */
> include/arrow/python/pyarrow_lib.h:20:/* Generated by Cython 0.29.8 */
> include/arrow/python/pyarrow_api.h:21:/* Generated by Cython 0.29.8 */
> _plasma.cpp:1:/* Generated by Cython 0.29.14 */
> _fs.cpp:1:/* Generated by Cython 0.29.14 */
> lib_api.h:1:/* Generated by Cython 0.29.14 */
> gandiva.cpp:1:/* Generated by Cython 0.29.14 */
> _json.cpp:1:/* Generated by Cython 0.29.14 */
> _parquet.cpp:1:/* Generated by Cython 0.29.14 */
> _csv.cpp:1:/* Generated by Cython 0.29.14 */
> _compute.cpp:1:/* Generated by Cython 0.29.14 */
> _dataset.cpp:1:/* Generated by Cython 0.29.14 */
> _flight.cpp:1:/* Generated by Cython 0.29.14 */
> lib.cpp:1:/* Generated by Cython 0.29.14 */
> ```
>
> See the https://travis-ci.org/intake/intake/jobs/648523104 for the full log.
>
>
> Thanks for any pointers!


[jira] [Created] (ARROW-7826) [Python] Implement __reduce__ for pyarrow.lib.Message

2020-02-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7826:
---

 Summary: [Python] Implement __reduce__ for pyarrow.lib.Message
 Key: ARROW-7826
 URL: https://issues.apache.org/jira/browse/ARROW-7826
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 1.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


AttributeError importing pyarrow 0.16.0

2020-02-10 Thread Tom Augspurger
Hi all,

I'm seeing a strange issue when importing pyarrow on the intake CI. I get an
exception saying

AttributeError: type object 'pyarrow.lib.Message' has no attribute
'__reduce_cython__'

The full traceback is:

__ test_arrow_import
___

def test_arrow_import():

>   import pyarrow

intake/cli/server/tests/test_server.py:32:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _

../../../virtualenv/python3.7.1/lib/python3.7/site-packages/pyarrow/__init__.py:49:
in 

from pyarrow.lib import cpu_count, set_cpu_count

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _

>   ???

E   AttributeError: type object 'pyarrow.lib.Message' has no attribute
'__reduce_cython__'

pyarrow/ipc.pxi:21: AttributeError

_ TestServerV1Source.test_read_part_compressed
_


I'm unable to reproduce this locally, and was wondering if anyone else has
seen something similar.
Pyarrow was installed using pip / a wheel (
https://travis-ci.org/intake/intake/jobs/648523104#L311).

A common cause of this error message is building with too old of a Cython.
While checking this, I noticed
that some of the files are generated with Cython 0.29.8, while others were
generated with 0.29.14.
I have no idea if this is a problem in general or if it's causing this
specific issue.

```
_hdfs.cpp:1:/* Generated by Cython 0.29.14 */
include/arrow/python/pyarrow_lib.h:20:/* Generated by Cython 0.29.8 */
include/arrow/python/pyarrow_api.h:21:/* Generated by Cython 0.29.8 */
_plasma.cpp:1:/* Generated by Cython 0.29.14 */
_fs.cpp:1:/* Generated by Cython 0.29.14 */
lib_api.h:1:/* Generated by Cython 0.29.14 */
gandiva.cpp:1:/* Generated by Cython 0.29.14 */
_json.cpp:1:/* Generated by Cython 0.29.14 */
_parquet.cpp:1:/* Generated by Cython 0.29.14 */
_csv.cpp:1:/* Generated by Cython 0.29.14 */
_compute.cpp:1:/* Generated by Cython 0.29.14 */
_dataset.cpp:1:/* Generated by Cython 0.29.14 */
_flight.cpp:1:/* Generated by Cython 0.29.14 */
lib.cpp:1:/* Generated by Cython 0.29.14 */
```

See the https://travis-ci.org/intake/intake/jobs/648523104 for the full log.


Thanks for any pointers!


[jira] [Created] (ARROW-7825) Have arrow::read_parquet respect options(stringsAsFactors = FALSE)

2020-02-10 Thread Keith Hughitt (Jira)
Keith Hughitt created ARROW-7825:


 Summary: Have arrow::read_parquet respect options(stringsAsFactors 
= FALSE)
 Key: ARROW-7825
 URL: https://issues.apache.org/jira/browse/ARROW-7825
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Affects Versions: 0.16.0
 Environment: Linux 64-bit 5.4.15
Reporter: Keith Hughitt


Same issue as reported for feather::read_feather 
(https://issues.apache.org/jira/browse/ARROW-7823).

 

For the R arrow package, the "read_parquet()" function currently does not 
respect "options(stringsAsFactors = FALSE)", leading to unexpected/inconsistent 
behavior.

 

*Example:*

 

 
{code:java}
library(arrow)
library(readr)
options(stringsAsFactors = FALSE)
write_tsv(head(iris), 'test.tsv')
write_parquet(head(iris), 'test.parquet')
head(read.delim('test.tsv', sep='\t')$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
head(read_tsv('test.tsv', col_types = cols())$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
head(read_parquet('test.parquet')$Species)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica
{code}
 

 

*Versions:*

- R 3.6.2

- arrow_0.15.1.9000



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7824) [C++][Dataset] Provide Dataset writing to IPC format

2020-02-10 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7824:
---

 Summary: [C++][Dataset] Provide Dataset writing to IPC format
 Key: ARROW-7824
 URL: https://issues.apache.org/jira/browse/ARROW-7824
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, C++ - Dataset
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 1.0.0


Begin with writing to the IPC format, since it is simpler than Parquet, and to 
efficiently support the "locally cached extract" workflow.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7823) Have feather::read_feather respect options(stringsAsFactors = FALSE)

2020-02-10 Thread Keith Hughitt (Jira)
Keith Hughitt created ARROW-7823:


 Summary: Have feather::read_feather respect 
options(stringsAsFactors = FALSE)
 Key: ARROW-7823
 URL: https://issues.apache.org/jira/browse/ARROW-7823
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Affects Versions: 0.16.0
 Environment: Linux 64-bit 5.4.15
Reporter: Keith Hughitt


For consistency, it seems like it would be useful to have read_feather() behave 
in a similar manner to read.delim(), read_tsv(), etc...

*Ex:*

 
{code:java}
library(feather)
library(tidyverse)
options(stringsAsFactors = FALSE)

write_tsv(head(iris), 'test.tsv')
write_feather(head(iris), 'test.feather')

head(read.delim('test.tsv', sep='\t')$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
head(read_tsv('test.tsv', col_types = cols())$Species)
# [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
head(read_feather('test.feather')$Species)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica
{code}
 

Versions:
 * R 3.6.2
 * feather 0.3.5

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7822) [C++] Allocation free error Status constants

2020-02-10 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7822:
---

 Summary: [C++] Allocation free error Status constants
 Key: ARROW-7822
 URL: https://issues.apache.org/jira/browse/ARROW-7822
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ben Kietzman
Assignee: Ben Kietzman


{{Status::state_}} could be made a tagged pointer without affecting the fast 
path (passing around a non-error status). The extra bit could be used to mark a 
Status' state as heap-allocated or not, allowing error statuses to be 
extremely cheap when their error state is known to be immutable. For example, 
this would allow a cheap default for {{Result<>::status_}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7821) [Gandiva] Add support for literal variables

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7821:
-

 Summary: [Gandiva] Add support for literal variables
 Key: ARROW-7821
 URL: https://issues.apache.org/jira/browse/ARROW-7821
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++ - Gandiva
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0


Gandiva supports static literal constants, but doesn't support runtime literal 
constants (or simply, variables). This means that queries like `x > 1` and `x > 
2` are compiled into separate operators. The goal would be to provide something 
like a prepared statement for very simple expressions, e.g. `x > ?`. This way we 
can pre-generate operators for the most basic comparison filters on every type.

I'm thinking that the variables should be stashed in the context pointer as 
opposed to a new function parameter. This would minimise the implementation 
impact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7820) [C++][Gandiva] Add CMake support for compiling LLVM's IR into a library

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7820:
-

 Summary: [C++][Gandiva] Add CMake support for compiling LLVM's IR 
into a library
 Key: ARROW-7820
 URL: https://issues.apache.org/jira/browse/ARROW-7820
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0


We should be able to inject LLVM IR into libraries, assuming that `llc` is 
found on the platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7819) [C++][Gandiva] Implement gandiva-dump-ir tool to output llvm IR to a file

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7819:
-

 Summary: [C++][Gandiva] Implement gandiva-dump-ir tool to output 
llvm IR to a file
 Key: ARROW-7819
 URL: https://issues.apache.org/jira/browse/ARROW-7819
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++ - Gandiva
Reporter: Francois Saint-Jacques
 Fix For: 1.0.0


The tool should take a protobuf expression from stdin and dump the IR to 
stdout. This might require some thought, as the schema is not always known. It 
could mean a refactor to support plain arrays, especially for the Filter kernel.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7818) [C++][Gandiva] Generate Filter kernels from gandiva code at compile time

2020-02-10 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7818:
-

 Summary: [C++][Gandiva] Generate Filter kernels from gandiva code 
at compile time
 Key: ARROW-7818
 URL: https://issues.apache.org/jira/browse/ARROW-7818
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, C++ - Gandiva
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques
 Fix For: 1.0.0


The goal of this feature is to support generating kernels at compile time (and 
possibly at runtime if gandiva is linked) to avoid rewriting C++ kernels that 
gandiva knows how to compile. The generated kernels would be linked into the 
compute module.

This is an experimental task that will guide future development, notably 
implementing aggregate kernels in gandiva once instead of maintaining both C++ 
and gandiva implementations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-10 Thread Neal Richardson
We also have the release blog post to finish up and publish:
https://github.com/apache/arrow-site/pull/41

On Mon, Feb 10, 2020 at 7:32 AM Krisztián Szűcs 
wrote:

> Updated checklist:
>
> - [DONE] marking the released version as "RELEASED" on JIRA
> - [DONE] rebase the master branch on top of the release branch
> - [DONE] rebase the pull requests
> - [DONE] uploading source release artifacts to SVN
> - [DONE] uploading binary release artifacts to Bintray
> - [DONE] updating the Arrow website
> - [DONE] python source distribution
> - [DONE] ruby gems
> - [DONE] javascript npm packages
> - [DONE] .NET nuget packages
> - [DONE] java maven artifacts
> - [DONE] conda packages
> - [DONE] rust packages
> - [DONE] R packages
> - [DONE] homebrew packages
> - [DONE] announcing release
> - [kszucs] python wheels
>   Still waiting for PyPA's response.
> - [kszucs] updating website with new API documentation
>   In-progress.
>
> On Mon, Feb 10, 2020 at 4:27 PM Krisztián Szűcs
>  wrote:
> >
> > Indeed, it is here
> >
> https://lists.apache.org/thread.html/rf3a0840c152d7eeefb6862c3a54f986595f88b439b0c82780d15f998%40%3Cdev.arrow.apache.org%3E
> >
> > On Mon, Feb 10, 2020 at 4:25 PM Francois Saint-Jacques
> >  wrote:
> > >
> > > I received it.
> > >
> > > On Mon, Feb 10, 2020 at 10:24 AM Krisztián Szűcs
> > >  wrote:
> > > >
> > > > I've already announced the release 4 hours ago with my @apache
> address.
> > > > Although I cannot see it in the archive [1].
> > > >
> > > > I suppose it didn't go through, no clue why.
> > > >
> > > > [1] https://lists.apache.org/list.html?dev@arrow.apache.org
> > > >
> > > >
> > > > On Mon, Feb 10, 2020 at 3:57 PM Wes McKinney 
> wrote:
> > > > >
> > > > > Can we announce the release? Let me know if I can help with
> anything
> > > > >
> > > > > On Sun, Feb 9, 2020 at 6:23 PM Sutou Kouhei 
> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > MSYS2 package is updated:
> > > > > > https://github.com/msys2/MINGW-packages/pull/6175
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > --
> > > > > > kou
> > > > > >
> > > > > > In <
> caocv4hjxyy_yukdzgrayf3bce65txmljjkdodz+f2zpk5op...@mail.gmail.com>
> > > > > >   "Re: [VOTE] Release Apache Arrow 0.16.0 - RC2" on Sun, 9 Feb
> 2020 09:06:39 -0800,
> > > > > >   Neal Richardson  wrote:
> > > > > >
> > > > > > > R package 0.16.0 has been accepted by CRAN; may take a few
> more days for
> > > > > > > CRAN to build Windows and macOS binaries.
> > > > > > >
> > > > > > > Neal
> > > > > > >
> > > > > > > On Fri, Feb 7, 2020 at 4:50 PM Neal Richardson <
> neal.p.richard...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Homebrew PR is up:
> https://github.com/Homebrew/homebrew-core/pull/49908
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> On Fri, Feb 7, 2020 at 3:44 PM Neal Richardson <
> > > > > > >> neal.p.richard...@gmail.com> wrote:
> > > > > > >>
> > > > > > >>> I'm working (with Jeroen) on the R package stuff, and I'll
> put together
> > > > > > >>> the homebrew formula PR now.
> > > > > > >>>
> > > > > > >>> Neal
> > > > > > >>>
> > > > > > >>> On Fri, Feb 7, 2020 at 6:42 AM Andy Grove <
> andygrov...@gmail.com> wrote:
> > > > > > >>>
> > > > > >  Rust crates are published. Filed
> > > > > >  https://issues.apache.org/jira/browse/ARROW-7794 for the
> arrow-flight
> > > > > >  issue.
> > > > > > 
> > > > > >  On Fri, Feb 7, 2020 at 7:05 AM Andy Grove <
> andygrov...@gmail.com> wrote:
> > > > > > 
> > > > > >  > I'll look at the Rust issue now.
> > > > > >  >
> > > > > >  > On Fri, Feb 7, 2020 at 3:27 AM Krisztián Szűcs <
> > > > > >  szucs.kriszt...@gmail.com>
> > > > > >  > wrote:
> > > > > >  >
> > > > > >  >> Status of the post release tasks:
> > > > > >  >>
> > > > > >  >> - [DONE] marking the released version as "RELEASED" on
> JIRA
> > > > > >  >> - [DONE] rebase the master branch on top of the release
> branch
> > > > > >  >>   We have failing builds on the master because of the
> bumped version
> > > > > >  >> numbers, those must be fixed.
> > > > > >  >> - [DONE] rebase the pull requests
> > > > > >  >> - [DONE] uploading source release artifacts to SVN
> > > > > >  >> - [DONE] uploading binary release artifacts to Bintray
> > > > > >  >> - [DONE] updating the Arrow website
> > > > > >  >> - [DONE] python source distribution
> > > > > >  >> - [DONE] ruby gems
> > > > > >  >> - [DONE] javascript npm packages
> > > > > >  >> - [DONE] .NET nuget packages
> > > > > >  >> - [DONE] java maven artifacts
> > > > > >  >>I had to re-stage the maven artifacts
> > > > > >  >> - [kszucs] python wheels
> > > > > >  >>   Partially done, the manylinux2010 and manylinux2014
> wheels have hit
> > > > > >  >> the 60MB limit of pypi, requested to increase it
> > > > > >  >> https://github.com/pypa/pypi-support/issues/186
> > > > > >  >> - [xhochy] conda packages
> > > > > >  >>   

Re: Arrow Datasets Functionality for Python

2020-02-10 Thread Wes McKinney
I will add that I'm interested in being involved with developing high
level Python interfaces to all of this functionality (e.g. using Ibis
[1]). It would be worth prototyping at least a datasets interface
layer for efficient data selection (predicate pushdown + filtering)
and then expanding to support more analytic operations as they are
implemented and available in pyarrow. There's just a lot of work to do
and at the moment not a lot of people to do it. Hopefully more
organizations will sponsor part- or full-time developers to get
involved in Apache Arrow development and help with maintenance and
feature development -- this is a challenging project to contribute to
on nights/weekends.

[1]: https://github.com/ibis-project/ibis
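
As a rough sketch of what that selection layer looks like from Python (assuming
the pyarrow.dataset module's dataset()/field()/to_table() entry points; the
path and column names are made up):

```python
import pyarrow.dataset as ds

# Point a dataset at a directory of Parquet files with hive-style partitions.
dataset = ds.dataset("/data/taxi", format="parquet", partitioning="hive")

# Predicate pushdown + column projection happen in the scan itself, so only
# matching files/row groups and the requested columns are materialized.
table = dataset.to_table(
    columns=["passenger_count", "fare_amount"],
    filter=ds.field("year") == 2019,
)

df = table.to_pandas()  # hand the selected data off to pandas for analytics
```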

On Mon, Feb 10, 2020 at 8:34 AM Francois Saint-Jacques
 wrote:
>
> Hello Matthew,
>
> The dplyr binding is just syntactic sugar on top of the dataset API.
> There are no analytics capabilities yet [1], other than the select and
> the limited projection supported by the dataset API. It looks like it
> is doing analytics due to properly placed `collect()` calls, which
> convert from Arrow's stream of RecordBatch to R internal data frames.
> The analytic work is done by R. The same functionality exists under
> python: you invoke the dataset scan and then pass the result to
> pandas.
>
> In 2020 [2], we are actively working toward an analytic engine, with
> bindings for R *and* Python. Within this engine, we have physical
> operators, or compute kernels, that can be seen as functions that
> takes a stream of RecordBatch and yields a new stream of RecordBatch.
> The dataset API is the Scan physical operator, i.e. it materializes a
> stream of RecordBatch from files or other sources. Gandiva is a
> compiler that generates the Filter and Project physical operators.
> Think of gandiva as a physical operator factory, you give it a
> predicate (or multiple expressions in the case of projection) and it
> gives you back a function pointer that knows how to evaluate this
> predicate (expressions) on a RecordBatch and yields a RecordBatch.
> There still needs to be a coordinator on top of both that "plugs"
> them, i.e. the execution engine.
>
> Hope this helps,
> François
>
> [1] 
> https://github.com/apache/arrow/blob/6600a39ffe149971afd5ad3c78c2b538cdc03cfd/r/R/dplyr.R#L255-L322
> [2] https://ursalabs.org/blog/2020-outlook/
>
>
>
> On Sun, Feb 9, 2020 at 11:24 PM Matthew Turner
>  wrote:
> >
> > Hi Wes / Arrow Dev Team,
> >
> > Following up on our brief twitter 
> > convo on the 
> > Datasets functionality in R / Python.
> >
> > To provide context to others, you had mentioned that the API in python / 
> > pyarrow was more developer centric and intended for users to consume it 
> > through higher level interfaces(i.e. IBIS).  This was in comparison to 
> > dplyr which from your demo had some nice analytic capabilities on top of 
> > Arrow Datasets.
> >
> > Seeing that demonstration made me interested to see similar Arrow Datasets 
> > functionality within Python.  But it doesn't seem that is an intended 
> > capability for pyarrow which I do generally understand.  However, I was 
> > trying to understand how Gandiva ties into the Arrow project as I 
> > understand that to be an analytic engine of sorts (maybe im 
> > misunderstanding).  I saw this 
> > implementation of Gandiva with pandas which was quite interesting and was 
> > wondering if this is the strategic goal - to have Gandiva be a component of 
> > third party tools who use arrow or if Gandiva would eventually be more of a 
> > core analytic component of Arrow.
> >
> > Extending on this, I'm hoping to get the team's view on what they see as the 
> > likely home of dplyr datasets type functionality within the python 
> > ecosystem (i.e. IBIS or something else).
> >
> > Thanks to all for your work on the project!
> >
> > Best,
> >
> > Matthew M. Turner
> > Email: matthew.m.tur...@outlook.com
> > Phone: (908)-868-2786
> >


Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-10 Thread Krisztián Szűcs
Updated checklist:

- [DONE] marking the released version as "RELEASED" on JIRA
- [DONE] rebase the master branch on top of the release branch
- [DONE] rebase the pull requests
- [DONE] uploading source release artifacts to SVN
- [DONE] uploading binary release artifacts to Bintray
- [DONE] updating the Arrow website
- [DONE] python source distribution
- [DONE] ruby gems
- [DONE] javascript npm packages
- [DONE] .NET nuget packages
- [DONE] java maven artifacts
- [DONE] conda packages
- [DONE] rust packages
- [DONE] R packages
- [DONE] homebrew packages
- [DONE] announcing release
- [kszucs] python wheels
  Still waiting for PyPA's response.
- [kszucs] updating website with new API documentation
  In-progress.

On Mon, Feb 10, 2020 at 4:27 PM Krisztián Szűcs
 wrote:
>
> Indeed, it is here
> https://lists.apache.org/thread.html/rf3a0840c152d7eeefb6862c3a54f986595f88b439b0c82780d15f998%40%3Cdev.arrow.apache.org%3E
>
> On Mon, Feb 10, 2020 at 4:25 PM Francois Saint-Jacques
>  wrote:
> >
> > I received it.
> >
> > On Mon, Feb 10, 2020 at 10:24 AM Krisztián Szűcs
> >  wrote:
> > >
> > > I've already announced the release 4 hours ago with my @apache address.
> > > Although I cannot see it in the archive [1].
> > >
> > > I suppose it didn't go through, no clue why.
> > >
> > > [1] https://lists.apache.org/list.html?dev@arrow.apache.org
> > >
> > >
> > > On Mon, Feb 10, 2020 at 3:57 PM Wes McKinney  wrote:
> > > >
> > > > Can we announce the release? Let me know if I can help with anything
> > > >
> > > > On Sun, Feb 9, 2020 at 6:23 PM Sutou Kouhei  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > MSYS2 package is updated:
> > > > > https://github.com/msys2/MINGW-packages/pull/6175
> > > > >
> > > > >
> > > > > Thanks,
> > > > > --
> > > > > kou
> > > > >
> > > > > In 
> > > > > 
> > > > >   "Re: [VOTE] Release Apache Arrow 0.16.0 - RC2" on Sun, 9 Feb 2020 
> > > > > 09:06:39 -0800,
> > > > >   Neal Richardson  wrote:
> > > > >
> > > > > > R package 0.16.0 has been accepted by CRAN; may take a few more 
> > > > > > days for
> > > > > > CRAN to build Windows and macOS binaries.
> > > > > >
> > > > > > Neal
> > > > > >
> > > > > > On Fri, Feb 7, 2020 at 4:50 PM Neal Richardson 
> > > > > > 
> > > > > > wrote:
> > > > > >
> > > > > >> Homebrew PR is up: 
> > > > > >> https://github.com/Homebrew/homebrew-core/pull/49908
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Fri, Feb 7, 2020 at 3:44 PM Neal Richardson <
> > > > > >> neal.p.richard...@gmail.com> wrote:
> > > > > >>
> > > > > >>> I'm working (with Jeroen) on the R package stuff, and I'll put 
> > > > > >>> together
> > > > > >>> the homebrew formula PR now.
> > > > > >>>
> > > > > >>> Neal
> > > > > >>>
> > > > > >>> On Fri, Feb 7, 2020 at 6:42 AM Andy Grove  
> > > > > >>> wrote:
> > > > > >>>
> > > > >  Rust crates are published. Filed
> > > > >  https://issues.apache.org/jira/browse/ARROW-7794 for the 
> > > > >  arrow-flight
> > > > >  issue.
> > > > > 
> > > > >  On Fri, Feb 7, 2020 at 7:05 AM Andy Grove 
> > > > >   wrote:
> > > > > 
> > > > >  > I'll look at the Rust issue now.
> > > > >  >
> > > > >  > On Fri, Feb 7, 2020 at 3:27 AM Krisztián Szűcs <
> > > > >  szucs.kriszt...@gmail.com>
> > > > >  > wrote:
> > > > >  >
> > > > >  >> Status of the post release tasks:
> > > > >  >>
> > > > >  >> - [DONE] marking the released version as "RELEASED" on JIRA
> > > > >  >> - [DONE] rebase the master branch on top of the release branch
> > > > >  >>   We have failing builds on the master because of the bumped 
> > > > >  >> version
> > > > >  >> numbers, those must be fixed.
> > > > >  >> - [DONE] rebase the pull requests
> > > > >  >> - [DONE] uploading source release artifacts to SVN
> > > > >  >> - [DONE] uploading binary release artifacts to Bintray
> > > > >  >> - [DONE] updating the Arrow website
> > > > >  >> - [DONE] python source distribution
> > > > >  >> - [DONE] ruby gems
> > > > >  >> - [DONE] javascript npm packages
> > > > >  >> - [DONE] .NET nuget packages
> > > > >  >> - [DONE] java maven artifacts
> > > > >  >>I had to re-stage the maven artifacts
> > > > >  >> - [kszucs] python wheels
> > > > >  >>   Partially done, the manylinux2010 and manylinux2014 wheels 
> > > > >  >> have hit
> > > > >  >> the 60MB limit of pypi, requested to increase it
> > > > >  >> https://github.com/pypa/pypi-support/issues/186
> > > > >  >> - [xhochy] conda packages
> > > > >  >>   Uwe works on it, arrow-cpp is already merged.
> > > > >  >> - [andygrove] rust packages
> > > > >  >>Arrow-flight misses the format directory, didn't dig 
> > > > >  >> deeper. Andy
> > > > >  >> could you take a look?
> > > > >  >> - [nealrichardson] R packages
> > > > >  >>   Neal should handle this.
> > > > >  >> - [ ] homebrew packages
> > > > >  >> - [ ] updating website with new API 

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-10 Thread Krisztián Szűcs
Indeed, it is here
https://lists.apache.org/thread.html/rf3a0840c152d7eeefb6862c3a54f986595f88b439b0c82780d15f998%40%3Cdev.arrow.apache.org%3E

On Mon, Feb 10, 2020 at 4:25 PM Francois Saint-Jacques
 wrote:
>
> I received it.
>
> On Mon, Feb 10, 2020 at 10:24 AM Krisztián Szűcs
>  wrote:
> >
> > I've already announced the release 4 hours ago with my @apache address.
> > Although I cannot see it in the archive [1].
> >
> > I suppose it didn't go through, no clue why.
> >
> > [1] https://lists.apache.org/list.html?dev@arrow.apache.org
> >
> >
> > On Mon, Feb 10, 2020 at 3:57 PM Wes McKinney  wrote:
> > >
> > > Can we announce the release? Let me know if I can help with anything
> > >
> > > On Sun, Feb 9, 2020 at 6:23 PM Sutou Kouhei  wrote:
> > > >
> > > > Hi,
> > > >
> > > > MSYS2 package is updated:
> > > > https://github.com/msys2/MINGW-packages/pull/6175
> > > >
> > > >
> > > > Thanks,
> > > > --
> > > > kou
> > > >
> > > > In 
> > > >   "Re: [VOTE] Release Apache Arrow 0.16.0 - RC2" on Sun, 9 Feb 2020 
> > > > 09:06:39 -0800,
> > > >   Neal Richardson  wrote:
> > > >
> > > > > R package 0.16.0 has been accepted by CRAN; may take a few more days 
> > > > > for
> > > > > CRAN to build Windows and macOS binaries.
> > > > >
> > > > > Neal
> > > > >
> > > > > On Fri, Feb 7, 2020 at 4:50 PM Neal Richardson 
> > > > > 
> > > > > wrote:
> > > > >
> > > > >> Homebrew PR is up: 
> > > > >> https://github.com/Homebrew/homebrew-core/pull/49908
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Fri, Feb 7, 2020 at 3:44 PM Neal Richardson <
> > > > >> neal.p.richard...@gmail.com> wrote:
> > > > >>
> > > > >>> I'm working (with Jeroen) on the R package stuff, and I'll put 
> > > > >>> together
> > > > >>> the homebrew formula PR now.
> > > > >>>
> > > > >>> Neal
> > > > >>>
> > > > >>> On Fri, Feb 7, 2020 at 6:42 AM Andy Grove  
> > > > >>> wrote:
> > > > >>>
> > > >  Rust crates are published. Filed
> > > >  https://issues.apache.org/jira/browse/ARROW-7794 for the 
> > > >  arrow-flight
> > > >  issue.
> > > > 
> > > >  On Fri, Feb 7, 2020 at 7:05 AM Andy Grove  
> > > >  wrote:
> > > > 
> > > >  > I'll look at the Rust issue now.
> > > >  >
> > > >  > On Fri, Feb 7, 2020 at 3:27 AM Krisztián Szűcs <
> > > >  szucs.kriszt...@gmail.com>
> > > >  > wrote:
> > > >  >
> > > >  >> Status of the post release tasks:
> > > >  >>
> > > >  >> - [DONE] marking the released version as "RELEASED" on JIRA
> > > >  >> - [DONE] rebase the master branch on top of the release branch
> > > >  >>   We have failing builds on the master because of the bumped 
> > > >  >> version
> > > >  >> numbers, those must be fixed.
> > > >  >> - [DONE] rebase the pull requests
> > > >  >> - [DONE] uploading source release artifacts to SVN
> > > >  >> - [DONE] uploading binary release artifacts to Bintray
> > > >  >> - [DONE] updating the Arrow website
> > > >  >> - [DONE] python source distribution
> > > >  >> - [DONE] ruby gems
> > > >  >> - [DONE] javascript npm packages
> > > >  >> - [DONE] .NET nuget packages
> > > >  >> - [DONE] java maven artifacts
> > > >  >>I had to re-stage the maven artifacts
> > > >  >> - [kszucs] python wheels
> > > >  >>   Partially done, the manylinux2010 and manylinux2014 wheels 
> > > >  >> have hit
> > > >  >> the 60MB limit of pypi, requested to increase it
> > > >  >> https://github.com/pypa/pypi-support/issues/186
> > > >  >> - [xhochy] conda packages
> > > >  >>   Uwe works on it, arrow-cpp is already merged.
> > > >  >> - [andygrove] rust packages
> > > >  >>Arrow-flight misses the format directory, didn't dig deeper. 
> > > >  >> Andy
> > > >  >> could you take a look?
> > > >  >> - [nealrichardson] R packages
> > > >  >>   Neal should handle this.
> > > >  >> - [ ] homebrew packages
> > > >  >> - [ ] updating website with new API documentation
> > > >  >> - [ ] announcing release
> > > >  >>
> > > >  >> On Fri, Feb 7, 2020 at 8:18 AM Krisztián Szűcs
> > > >  >>  wrote:
> > > >  >> >
> > > >  >> > The VOTE carries with 4 binding +1 votes, 3 non-binding +1 
> > > >  >> > votes and
> > > >  >> > one binding +0 vote.
> > > >  >> >
> > > >  >> > I'm starting the post-release tasks, if anyone wants to help 
> > > >  >> > please
> > > >  let
> > > >  >> me know.
> > > >  >> >
> > > >  >> > On Fri, Feb 7, 2020 at 12:25 AM Krisztián Szűcs
> > > >  >> >  wrote:
> > > >  >> > >
> > > >  >> > > So far we have the following votes:
> > > >  >> > >
> > > >  >> > > +0 (binding)
> > > >  >> > > +1 (binding)
> > > >  >> > > +1 (non-binding)
> > > >  >> > > +1 (binding)
> > > >  >> > > +1 (non-binding)
> > > >  >> > > +1 (binding)
> > > >  >> > > +1 (non-binding)
> > > >  >> > > +1 (binding)
> > > >  >> > > 
> > > >  

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-10 Thread Francois Saint-Jacques
I received it.

On Mon, Feb 10, 2020 at 10:24 AM Krisztián Szűcs
 wrote:
>
> I've already announced the release 4 hours ago with my @apache address.
> Although I cannot see it in the archive [1].
>
> I suppose it didn't go through, no clue why.
>
> [1] https://lists.apache.org/list.html?dev@arrow.apache.org
>
>
> On Mon, Feb 10, 2020 at 3:57 PM Wes McKinney  wrote:
> >
> > Can we announce the release? Let me know if I can help with anything
> >
> > On Sun, Feb 9, 2020 at 6:23 PM Sutou Kouhei  wrote:
> > >
> > > Hi,
> > >
> > > MSYS2 package is updated:
> > > https://github.com/msys2/MINGW-packages/pull/6175
> > >
> > >
> > > Thanks,
> > > --
> > > kou
> > >
> > > In 
> > >   "Re: [VOTE] Release Apache Arrow 0.16.0 - RC2" on Sun, 9 Feb 2020 
> > > 09:06:39 -0800,
> > >   Neal Richardson  wrote:
> > >
> > > > R package 0.16.0 has been accepted by CRAN; may take a few more days for
> > > > CRAN to build Windows and macOS binaries.
> > > >
> > > > Neal
> > > >
> > > > On Fri, Feb 7, 2020 at 4:50 PM Neal Richardson 
> > > > 
> > > > wrote:
> > > >
> > > >> Homebrew PR is up: https://github.com/Homebrew/homebrew-core/pull/49908
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Feb 7, 2020 at 3:44 PM Neal Richardson <
> > > >> neal.p.richard...@gmail.com> wrote:
> > > >>
> > > >>> I'm working (with Jeroen) on the R package stuff, and I'll put 
> > > >>> together
> > > >>> the homebrew formula PR now.
> > > >>>
> > > >>> Neal
> > > >>>
> > > >>> On Fri, Feb 7, 2020 at 6:42 AM Andy Grove  
> > > >>> wrote:
> > > >>>
> > >  Rust crates are published. Filed
> > >  https://issues.apache.org/jira/browse/ARROW-7794 for the arrow-flight
> > >  issue.
> > > 
> > >  On Fri, Feb 7, 2020 at 7:05 AM Andy Grove  
> > >  wrote:
> > > 
> > >  > I'll look at the Rust issue now.
> > >  >
> > >  > On Fri, Feb 7, 2020 at 3:27 AM Krisztián Szűcs <
> > >  szucs.kriszt...@gmail.com>
> > >  > wrote:
> > >  >
> > >  >> Status of the post release tasks:
> > >  >>
> > >  >> - [DONE] marking the released version as "RELEASED" on JIRA
> > >  >> - [DONE] rebase the master branch on top of the release branch
> > >  >>   We have failing builds on the master because of the bumped 
> > >  >> version
> > >  >> numbers, those must be fixed.
> > >  >> - [DONE] rebase the pull requests
> > >  >> - [DONE] uploading source release artifacts to SVN
> > >  >> - [DONE] uploading binary release artifacts to Bintray
> > >  >> - [DONE] updating the Arrow website
> > >  >> - [DONE] python source distribution
> > >  >> - [DONE] ruby gems
> > >  >> - [DONE] javascript npm packages
> > >  >> - [DONE] .NET nuget packages
> > >  >> - [DONE] java maven artifacts
> > >  >>I had to re-stage the maven artifacts
> > >  >> - [kszucs] python wheels
> > >  >>   Partially done, the manylinux2010 and manylinux2014 wheels have 
> > >  >> hit
> > >  >> the 60MB limit of pypi, requested to increase it
> > >  >> https://github.com/pypa/pypi-support/issues/186
> > >  >> - [xhochy] conda packages
> > >  >>   Uwe works on it, arrow-cpp is already merged.
> > >  >> - [andygrove] rust packages
> > >  >>Arrow-flight misses the format directory, didn't dig deeper. 
> > >  >> Andy
> > >  >> could you take a look?
> > >  >> - [nealrichardson] R packages
> > >  >>   Neal should handle this.
> > >  >> - [ ] homebrew packages
> > >  >> - [ ] updating website with new API documentation
> > >  >> - [ ] announcing release
> > >  >>
> > >  >> On Fri, Feb 7, 2020 at 8:18 AM Krisztián Szűcs
> > >  >>  wrote:
> > >  >> >
> > >  >> > The VOTE carries with 4 binding +1 votes, 3 non-binding +1 
> > >  >> > votes and
> > >  >> > one binding +0 vote.
> > >  >> >
> > >  >> > I'm starting the post-release tasks, if anyone wants to help 
> > >  >> > please
> > >  let
> > >  >> me know.
> > >  >> >
> > >  >> > On Fri, Feb 7, 2020 at 12:25 AM Krisztián Szűcs
> > >  >> >  wrote:
> > >  >> > >
> > >  >> > > So far we have the following votes:
> > >  >> > >
> > >  >> > > +0 (binding)
> > >  >> > > +1 (binding)
> > >  >> > > +1 (non-binding)
> > >  >> > > +1 (binding)
> > >  >> > > +1 (non-binding)
> > >  >> > > +1 (binding)
> > >  >> > > +1 (non-binding)
> > >  >> > > +1 (binding)
> > >  >> > > 
> > >  >> > > 4 +1 (binding)
> > >  >> > > 3 +1 (non-binding)
> > >  >> > >
> > >  >> > > I'm waiting for votes until tomorrow morning (UTC), then I'm
> > >  closing
> > >  >> the VOTE.
> > >  >> > >
> > >  >> > > Thanks everyone!
> > >  >> > >
> > >  >> > > - Krisztian
> > >  >> > >
> > >  >> > > On Fri, Feb 7, 2020 at 12:06 AM Krisztián Szűcs
> > >  >> > >  wrote:
> > >  >> > > >
> > >  >> > > > Testing on macOS Catalina
> > >  >> > > >
> > >  >> > 

Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-10 Thread Krisztián Szűcs
I've already announced the release 4 hours ago with my @apache address.
Although I cannot see it in the archive [1].

I suppose it didn't go through, no clue why.

[1] https://lists.apache.org/list.html?dev@arrow.apache.org


On Mon, Feb 10, 2020 at 3:57 PM Wes McKinney  wrote:
>
> Can we announce the release? Let me know if I can help with anything
>
> On Sun, Feb 9, 2020 at 6:23 PM Sutou Kouhei  wrote:
> >
> > Hi,
> >
> > MSYS2 package is updated:
> > https://github.com/msys2/MINGW-packages/pull/6175
> >
> >
> > Thanks,
> > --
> > kou
> >
> > In 
> >   "Re: [VOTE] Release Apache Arrow 0.16.0 - RC2" on Sun, 9 Feb 2020 
> > 09:06:39 -0800,
> >   Neal Richardson  wrote:
> >
> > > R package 0.16.0 has been accepted by CRAN; may take a few more days for
> > > CRAN to build Windows and macOS binaries.
> > >
> > > Neal
> > >
> > > On Fri, Feb 7, 2020 at 4:50 PM Neal Richardson 
> > > 
> > > wrote:
> > >
> > >> Homebrew PR is up: https://github.com/Homebrew/homebrew-core/pull/49908
> > >>
> > >>
> > >>
> > >> On Fri, Feb 7, 2020 at 3:44 PM Neal Richardson <
> > >> neal.p.richard...@gmail.com> wrote:
> > >>
> > >>> I'm working (with Jeroen) on the R package stuff, and I'll put together
> > >>> the homebrew formula PR now.
> > >>>
> > >>> Neal
> > >>>
> > >>> On Fri, Feb 7, 2020 at 6:42 AM Andy Grove  wrote:
> > >>>
> >  Rust crates are published. Filed
> >  https://issues.apache.org/jira/browse/ARROW-7794 for the arrow-flight
> >  issue.
> > 
> >  On Fri, Feb 7, 2020 at 7:05 AM Andy Grove  
> >  wrote:
> > 
> >  > I'll look at the Rust issue now.
> >  >
> >  > On Fri, Feb 7, 2020 at 3:27 AM Krisztián Szűcs <
> >  szucs.kriszt...@gmail.com>
> >  > wrote:
> >  >
> >  >> Status of the post release tasks:
> >  >>
> >  >> - [DONE] marking the released version as "RELEASED" on JIRA
> >  >> - [DONE] rebase the master branch on top of the release branch
> >  >>   We have failing builds on the master because of the bumped version
> >  >> numbers, those must be fixed.
> >  >> - [DONE] rebase the pull requests
> >  >> - [DONE] uploading source release artifacts to SVN
> >  >> - [DONE] uploading binary release artifacts to Bintray
> >  >> - [DONE] updating the Arrow website
> >  >> - [DONE] python source distribution
> >  >> - [DONE] ruby gems
> >  >> - [DONE] javascript npm packages
> >  >> - [DONE] .NET nuget packages
> >  >> - [DONE] java maven artifacts
> >  >>I had to re-stage the maven artifacts
> >  >> - [kszucs] python wheels
> >  >>   Partially done, the manylinux2010 and manylinux2014 wheels have 
> >  >> hit
> >  >> the 60MB limit of pypi, requested to increase it
> >  >> https://github.com/pypa/pypi-support/issues/186
> >  >> - [xhochy] conda packages
> >  >>   Uwe works on it, arrow-cpp is already merged.
> >  >> - [andygrove] rust packages
> >  >>Arrow-flight misses the format directory, didn't dig deeper. Andy
> >  >> could you take a look?
> >  >> - [nealrichardson] R packages
> >  >>   Neal should handle this.
> >  >> - [ ] homebrew packages
> >  >> - [ ] updating website with new API documentation
> >  >> - [ ] announcing release
> >  >>
> >  >> On Fri, Feb 7, 2020 at 8:18 AM Krisztián Szűcs
> >  >>  wrote:
> >  >> >
> >  >> > The VOTE carries with 4 binding +1 votes, 3 non-binding +1 votes 
> >  >> > and
> >  >> > one binding +0 vote.
> >  >> >
> >  >> > I'm starting the post-release tasks, if anyone wants to help 
> >  >> > please
> >  let
> >  >> me know.
> >  >> >
> >  >> > On Fri, Feb 7, 2020 at 12:25 AM Krisztián Szűcs
> >  >> >  wrote:
> >  >> > >
> >  >> > > So far we have the following votes:
> >  >> > >
> >  >> > > +0 (binding)
> >  >> > > +1 (binding)
> >  >> > > +1 (non-binding)
> >  >> > > +1 (binding)
> >  >> > > +1 (non-binding)
> >  >> > > +1 (binding)
> >  >> > > +1 (non-binding)
> >  >> > > +1 (binding)
> >  >> > > 
> >  >> > > 4 +1 (binding)
> >  >> > > 3 +1 (non-binding)
> >  >> > >
> >  >> > > I'm waiting for votes until tomorrow morning (UTC), then I'm
> >  closing
> >  >> the VOTE.
> >  >> > >
> >  >> > > Thanks everyone!
> >  >> > >
> >  >> > > - Krisztian
> >  >> > >
> >  >> > > On Fri, Feb 7, 2020 at 12:06 AM Krisztián Szűcs
> >  >> > >  wrote:
> >  >> > > >
> >  >> > > > Testing on macOS Catalina
> >  >> > > >
> >  >> > > > Binaries: OK
> >  >> > > >
> >  >> > > > Wheels: OK
> >  >> > > > Verified on macOS and on Linux.
> >  >> > > > On linux the verification script has failed for python 3.5 and
> >  >> manylinux2010
> >  >> > > > and manylinux2014 with unsupported platform tag. I've manually
> >  >> checked
> >  >> > > > these wheels in the python:3.5 docker image, and the wheels 
> 

[jira] [Created] (ARROW-7817) [Integration] macOS R autobrew nightly fails

2020-02-10 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7817:
--

 Summary: [Integration] macOS R autobrew nightly fails
 Key: ARROW-7817
 URL: https://issues.apache.org/jira/browse/ARROW-7817
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Krisztian Szucs


Failing build: https://travis-ci.org/ursa-labs/crossbow/builds/648308138
Probably an OSX SDK issue.

cc [~npr]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-02-10-0

2020-02-10 Thread Krisztián Szűcs
On Mon, Feb 10, 2020 at 2:47 PM Crossbow  wrote:
>
>
> Arrow Build Report for Job nightly-2020-02-10-0
>
> All tasks: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0
>
> Failed Tasks:
> - macos-r-autobrew:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-macos-r-autobrew
Valid, created jira https://issues.apache.org/jira/browse/ARROW-7817
> - test-conda-python-3.7-turbodbc-latest:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-turbodbc-latest
> - test-conda-python-3.7-turbodbc-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-turbodbc-master
Valid, turbodbc fails to compile. Created jira
https://issues.apache.org/jira/browse/ARROW-7816
> - wheel-manylinux1-cp38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-wheel-manylinux1-cp38
Caused by a connection timeout, restarted.
> - wheel-osx-cp27m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-wheel-osx-cp27m
The build halted without any error message; it seems like Travis
flakiness, so I restarted it.
>
> Succeeded Tasks:
> - centos-6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-centos-6
> - centos-7:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-centos-7
> - centos-8:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-centos-8
> - conda-linux-gcc-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-linux-gcc-py27
> - conda-linux-gcc-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-linux-gcc-py37
> - conda-linux-gcc-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-linux-gcc-py38
> - conda-osx-clang-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-osx-clang-py27
> - conda-osx-clang-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-osx-clang-py37
> - conda-osx-clang-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-osx-clang-py38
> - conda-win-vs2015-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-win-vs2015-py37
> - conda-win-vs2015-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-win-vs2015-py38
> - debian-buster:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-debian-buster
> - debian-stretch:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-debian-stretch
> - gandiva-jar-osx:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-gandiva-jar-osx
> - gandiva-jar-trusty:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-gandiva-jar-trusty
> - homebrew-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-homebrew-cpp
> - test-conda-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-cpp
> - test-conda-python-2.7-pandas-latest:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-2.7-pandas-latest
> - test-conda-python-2.7:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-2.7
> - test-conda-python-3.6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.6
> - test-conda-python-3.7-dask-latest:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-dask-latest
> - test-conda-python-3.7-hdfs-2.9.2:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-hdfs-2.9.2
> - test-conda-python-3.7-pandas-latest:
>   URL: 
> 

[jira] [Created] (ARROW-7816) [Integration] Turbodbc fails to compile in the nightly tests

2020-02-10 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7816:
--

 Summary: [Integration] Turbodbc fails to compile in the nightly 
tests
 Key: ARROW-7816
 URL: https://issues.apache.org/jira/browse/ARROW-7816
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Krisztian Szucs


Failing builds:
- https://circleci.com/gh/ursa-labs/crossbow/8035
- https://circleci.com/gh/ursa-labs/crossbow/8035



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7815) [C++] Fix crashes on corrupt IPC input (OSS-Fuzz)

2020-02-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7815:
-

 Summary: [C++] Fix crashes on corrupt IPC input (OSS-Fuzz)
 Key: ARROW-7815
 URL: https://issues.apache.org/jira/browse/ARROW-7815
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 1.0.0


More issues have been discovered with OSS-Fuzz; we need to enhance input
validation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release Apache Arrow 0.16.0 - RC2

2020-02-10 Thread Wes McKinney
Can we announce the release? Let me know if I can help with anything

On Sun, Feb 9, 2020 at 6:23 PM Sutou Kouhei  wrote:
>
> Hi,
>
> MSYS2 package is updated:
> https://github.com/msys2/MINGW-packages/pull/6175
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [VOTE] Release Apache Arrow 0.16.0 - RC2" on Sun, 9 Feb 2020 09:06:39 
> -0800,
>   Neal Richardson  wrote:
>
> > R package 0.16.0 has been accepted by CRAN; may take a few more days for
> > CRAN to build Windows and macOS binaries.
> >
> > Neal
> >
> > On Fri, Feb 7, 2020 at 4:50 PM Neal Richardson 
> > wrote:
> >
> >> Homebrew PR is up: https://github.com/Homebrew/homebrew-core/pull/49908
> >>
> >>
> >>
> >> On Fri, Feb 7, 2020 at 3:44 PM Neal Richardson <
> >> neal.p.richard...@gmail.com> wrote:
> >>
> >>> I'm working (with Jeroen) on the R package stuff, and I'll put together
> >>> the homebrew formula PR now.
> >>>
> >>> Neal
> >>>
> >>> On Fri, Feb 7, 2020 at 6:42 AM Andy Grove  wrote:
> >>>
>  Rust crates are published. Filed
>  https://issues.apache.org/jira/browse/ARROW-7794 for the arrow-flight
>  issue.
> 
>  On Fri, Feb 7, 2020 at 7:05 AM Andy Grove  wrote:
> 
>  > I'll look at the Rust issue now.
>  >
>  > On Fri, Feb 7, 2020 at 3:27 AM Krisztián Szűcs <
>  szucs.kriszt...@gmail.com>
>  > wrote:
>  >
>  >> Status of the post release tasks:
>  >>
>  >> - [DONE] marking the released version as "RELEASED" on JIRA
>  >> - [DONE] rebase the master branch on top of the release branch
>  >>   We have failing builds on master because of the bumped version
>  >> numbers; those must be fixed.
>  >> - [DONE] rebase the pull requests
>  >> - [DONE] uploading source release artifacts to SVN
>  >> - [DONE] uploading binary release artifacts to Bintray
>  >> - [DONE] updating the Arrow website
>  >> - [DONE] python source distribution
>  >> - [DONE] ruby gems
>  >> - [DONE] javascript npm packages
>  >> - [DONE] .NET nuget packages
>  >> - [DONE] java maven artifacts
>  >>I had to re-stage the maven artifacts
>  >> - [kszucs] python wheels
>  >>   Partially done; the manylinux2010 and manylinux2014 wheels have hit
>  >> the 60 MB limit of PyPI, so I requested an increase:
>  >> https://github.com/pypa/pypi-support/issues/186
>  >> - [xhochy] conda packages
>  >>   Uwe works on it, arrow-cpp is already merged.
>  >> - [andygrove] rust packages
>  >>    Arrow-flight is missing the format directory; I didn't dig deeper. Andy,
>  >> could you take a look?
>  >> - [nealrichardson] R packages
>  >>   Neal should handle this.
>  >> - [ ] homebrew packages
>  >> - [ ] updating website with new API documentation
>  >> - [ ] announcing release
>  >>
>  >> On Fri, Feb 7, 2020 at 8:18 AM Krisztián Szűcs
>  >>  wrote:
>  >> >
>  >> > The VOTE carries with 4 binding +1 votes, 3 non-binding +1 votes and
>  >> > one binding +0 vote.
>  >> >
>  >> > I'm starting the post-release tasks, if anyone wants to help please
>  let
>  >> me know.
>  >> >
>  >> > On Fri, Feb 7, 2020 at 12:25 AM Krisztián Szűcs
>  >> >  wrote:
>  >> > >
>  >> > > So far we have the following votes:
>  >> > >
>  >> > > +0 (binding)
>  >> > > +1 (binding)
>  >> > > +1 (non-binding)
>  >> > > +1 (binding)
>  >> > > +1 (non-binding)
>  >> > > +1 (binding)
>  >> > > +1 (non-binding)
>  >> > > +1 (binding)
>  >> > > 
>  >> > > 4 +1 (binding)
>  >> > > 3 +1 (non-binding)
>  >> > >
>  >> > > I'm waiting for votes until tomorrow morning (UTC), then I'm
>  closing
>  >> the VOTE.
>  >> > >
>  >> > > Thanks everyone!
>  >> > >
>  >> > > - Krisztian
>  >> > >
>  >> > > On Fri, Feb 7, 2020 at 12:06 AM Krisztián Szűcs
>  >> > >  wrote:
>  >> > > >
>  >> > > > Testing on macOS Catalina
>  >> > > >
>  >> > > > Binaries: OK
>  >> > > >
>  >> > > > Wheels: OK
>  >> > > > Verified on macOS and on Linux.
>  >> > > > On linux the verification script has failed for python 3.5 and
>  >> manylinux2010
>  >> > > > and manylinux2014 with unsupported platform tag. I've manually
>  >> checked
>  >> > > > these wheels in the python:3.5 docker image, and the wheels were
>  >> good
>  >> > > > (this is automatically checked by crossbow too [1]). All other
>  >> wheels were
>  >> > > > passing using the verification script.
>  >> > > >
>  >> > > > Source: OK
>  >> > > > I had to revert the nvm path [2] to pass the js and integration
>  >> tests and
>  >> > > > force the glib test to use my system python instead of the conda
>  >> one.
>  >> > > >
>  >> > > > I vote with +1 (binding)
>  >> > > >
>  >> > > > [1]:
>  >> https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml#L568
>  >> > > > [2]:

Re: Arrow Datasets Functionality for Python

2020-02-10 Thread Francois Saint-Jacques
Hello Matthew,

The dplyr binding is just syntactic sugar on top of the dataset API.
There are no analytics capabilities yet [1], other than the select and
the limited projection supported by the dataset API. It looks like it
is doing analytics due to properly placed `collect()` calls, which
convert Arrow's stream of RecordBatch into R internal data frames.
The analytic work is done by R. The same functionality exists under
Python: you invoke the dataset scan and then pass the result to
pandas, as in the sketch below.
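
Concretely, a minimal Python sketch of that flow could look like this.
It is illustrative only: it assumes a pyarrow build with the (still
experimental) pyarrow.dataset module enabled, the directory path and
column names are made up, and exact method names were still in flux
around 0.16.

import pyarrow.dataset as ds

# Point a dataset at a directory of Parquet files (hypothetical path).
dataset = ds.dataset("/data/nyc-taxi/", format="parquet")

# Scanning materializes a stream of RecordBatch; to_table() collects it
# into an Arrow Table, and to_pandas() hands the data off to pandas,
# where the actual analytics happen.
table = dataset.to_table(columns=["passenger_count", "fare_amount"])
df = table.to_pandas()
print(df.groupby("passenger_count").fare_amount.mean())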

In 2020 [2], we are actively working toward an analytic engine, with
bindings for R *and* Python. Within this engine, we have physical
operators, or compute kernels, that can be seen as functions that
take a stream of RecordBatch and yield a new stream of RecordBatch.
The dataset API is the Scan physical operator, i.e. it materializes a
stream of RecordBatch from files or other sources. Gandiva is a
compiler that generates the Filter and Project physical operators.
Think of Gandiva as a physical operator factory: you give it a
predicate (or multiple expressions in the case of projection) and it
gives you back a function pointer that knows how to evaluate this
predicate (or those expressions) on a RecordBatch and yield a new
RecordBatch. There still needs to be a coordinator on top of both that
"plugs" them together, i.e. the execution engine.
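
To make the "operator factory" idea concrete, here is a rough sketch of
compiling and evaluating a filter predicate through the pyarrow.gandiva
bindings. Treat it as illustrative only: it assumes a pyarrow build with
Gandiva enabled, and the data and threshold are invented.

import pyarrow as pa
import pyarrow.gandiva as gandiva

batch = pa.RecordBatch.from_arrays([pa.array([1.0, 3.0, 5.0])], names=["a"])

# Build the expression tree for the predicate a > 2.0.
builder = gandiva.TreeExprBuilder()
node_a = builder.make_field(pa.field("a", pa.float64()))
two = builder.make_literal(2.0, pa.float64())
condition = builder.make_condition(
    builder.make_function("greater_than", [node_a, two], pa.bool_()))

# make_filter() compiles the predicate; evaluating it against a
# RecordBatch returns a selection vector holding the indices of the
# rows that pass.
filt = gandiva.make_filter(batch.schema, condition)
selection = filt.evaluate(batch, pa.default_memory_pool())
print(selection.to_array())  # indices of rows where a > 2.0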

Hope this helps,
François

[1] 
https://github.com/apache/arrow/blob/6600a39ffe149971afd5ad3c78c2b538cdc03cfd/r/R/dplyr.R#L255-L322
[2] https://ursalabs.org/blog/2020-outlook/



On Sun, Feb 9, 2020 at 11:24 PM Matthew Turner
 wrote:
>
> Hi Wes / Arrow Dev Team,
>
> Following up on our brief twitter convo on the Datasets functionality in R / Python.
>
> To provide context to others, you had mentioned that the API in python /
> pyarrow was more developer-centric and intended for users to consume it
> through higher-level interfaces (i.e. IBIS).  This was in comparison to dplyr,
> which from your demo had some nice analytic capabilities on top of Arrow
> Datasets.
>
> Seeing that demonstration made me interested to see similar Arrow Datasets
> functionality within Python.  But it doesn't seem that is an intended
> capability for pyarrow, which I do generally understand.  However, I was
> trying to understand how Gandiva ties into the Arrow project, as I understand
> that to be an analytic engine of sorts (maybe I'm misunderstanding).  I saw
> this implementation of Gandiva with pandas, which was quite interesting, and
> was wondering whether this is the strategic goal: to have Gandiva be a
> component of third-party tools that use Arrow, or whether Gandiva would
> eventually be more of a core analytic component of Arrow.
>
> Extending on this, I'm hoping to get the team's view on what they see as the
> likely home of dplyr-style Datasets functionality within the Python ecosystem
> (i.e. IBIS or something else).
>
> Thanks to all for your work on the project!
>
> Best,
>
> Matthew M. Turner
> Email: matthew.m.tur...@outlook.com
> Phone: (908)-868-2786
>


[NIGHTLY] Arrow Build Report for Job nightly-2020-02-10-0

2020-02-10 Thread Crossbow


Arrow Build Report for Job nightly-2020-02-10-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0

Failed Tasks:
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-macos-r-autobrew
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-turbodbc-master
- wheel-manylinux1-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-wheel-manylinux1-cp38
- wheel-osx-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-wheel-osx-cp27m

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-azure-debian-stretch
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-gandiva-jar-osx
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-gandiva-jar-trusty
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-travis-homebrew-cpp
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-10-0-circle-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: 

[jira] [Created] (ARROW-7814) [C++] Simplify build-support/run-test.sh

2020-02-10 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7814:
--

 Summary: [C++] Simplify build-support/run-test.sh 
 Key: ARROW-7814
 URL: https://issues.apache.org/jira/browse/ARROW-7814
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Krisztian Szucs


run-test.sh contains unused code; see
https://github.com/apache/arrow/pull/6202#pullrequestreview-354477439



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[ANNOUNCE] Apache Arrow 0.16.0 released

2020-02-10 Thread Krisztián Szűcs
The Apache Arrow community is pleased to announce the 0.16.0 release.
The release includes 735 resolved issues ([1]) since the 0.15.0 release.

The release is available now from our website, [2] and [3]:
https://arrow.apache.org/install/

Release notes are available at:
https://arrow.apache.org/release/0.16.0.html

What is Apache Arrow?
-

Apache Arrow is a cross-language development platform for in-memory data. It
specifies a standardized language-independent columnar memory format for flat
and hierarchical data, organized for efficient analytic operations on modern
hardware. It also provides computational libraries and zero-copy streaming
messaging and interprocess communication. Languages currently supported include
C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.

Please report any feedback to the mailing lists ([4])

Regards,
The Apache Arrow community

[1]: https://issues.apache.org/jira/projects/ARROW/versions/12340948
[2]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.16.0/
[3]: https://bintray.com/apache/arrow
[4]: https://lists.apache.org/list.html?dev@arrow.apache.org


[jira] [Created] (ARROW-7813) Fix undefined behaviour and remove unsafe

2020-02-10 Thread Markus Westerlind (Jira)
Markus Westerlind created ARROW-7813:


 Summary: Fix undefined behaviour and remove unsafe
 Key: ARROW-7813
 URL: https://issues.apache.org/jira/browse/ARROW-7813
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Markus Westerlind


The Rust implementation contains a lot more unsafe than is necessary, and in
many of these instances it is possible to invoke undefined behaviour from safe
code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)