[jira] [Created] (ARROW-9033) [Python] Add tests to verify that one can build a C++ extension against the manylinux1 wheels
Wes McKinney created ARROW-9033: --- Summary: [Python] Add tests to verify that one can build a C++ extension against the manylinux1 wheels Key: ARROW-9033 URL: https://issues.apache.org/jira/browse/ARROW-9033 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney Some projects want to be able to use the Python wheels to build other Python packages with C++ extensions that need to link against libarrow.so. It would be great if someone would add automated tests to ensure that our wheel builds can be used successfully in this fashion. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9032) [C++] Split arrow/util/bit_util.h into multiple header files
Wes McKinney created ARROW-9032: --- Summary: [C++] Split arrow/util/bit_util.h into multiple header files Key: ARROW-9032 URL: https://issues.apache.org/jira/browse/ARROW-9032 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 This header has grown quite large and any given compilation unit's use of it is likely limited to only a couple of functions or classes. I suspect it would improve compilation time to split up this header into a few headers organized by frequency of code use.
[jira] [Created] (ARROW-9031) [R] Implement conversion from Type::UINT64 to R vector
Wes McKinney created ARROW-9031: --- Summary: [R] Implement conversion from Type::UINT64 to R vector Key: ARROW-9031 URL: https://issues.apache.org/jira/browse/ARROW-9031 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Wes McKinney Fix For: 1.0.0 This case is not handled in array_to_vector.cpp
[jira] [Created] (ARROW-9030) [Python] Clean up some usages of pyarrow.compat, move some common functions/symbols to lib.pyx
Wes McKinney created ARROW-9030: --- Summary: [Python] Clean up some usages of pyarrow.compat, move some common functions/symbols to lib.pyx Key: ARROW-9030 URL: https://issues.apache.org/jira/browse/ARROW-9030 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney I started doing this while looking into ARROW-4633
Re: [DISCUSS] Add kernel integer overflow handling
On Wed, Jun 3, 2020 at 11:25 AM Krisztián Szűcs wrote: > > On Wed, Jun 3, 2020 at 6:16 PM Krisztián Szűcs > wrote: > > > > On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney wrote: > > > > > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs > > > wrote: > > > > > > > > From the user perspective I find the following pretty confusing: > > > > > > > > In [1]: np.array([-128, 127], dtype=np.int8()) * 2 > > > > Out[1]: array([ 0, -2], dtype=int8) > > > > > > > > In [2]: np.array([-128, 127], dtype=np.int16()) * 2 > > > > Out[2]: array([-256, 254], dtype=int16) > > > > > > > > In my opinion somewhere (on a higher level maybe) we should provide > > > > the correct results promoted to a wider type implicitly. > > > > > > Yes, I agree with you, but I agree that the best place to address this > > > is at a higher level rather than having this logic implemented at the > > > lowest level (kernels) -- I think database systems handle this during > > > logical->physical planning. > > > > It raises another question: where to incorporate the implicit promotions? > > // correct me if I'm wrong but these implicit promotions are operation > > // dependent and distinct from kernel dispatching issue [1] > > > > The numpy example above can be roughly translated to: > > >>> a = pa.array([-128, 127]) > > >>> pa.compute.add(a, a) > > array([ 0, -2]) > > > > Which is rather surprising from the user's perspective. > > Would it be enough to document the exact behavior and advise the user > to place casts until we have logical -> physical machinery? I think it's enough to clearly document the behavior and assume that the "user" will act according to what semantics are desired for their use cases. Per my comments in my last e-mail I don't think the users of these functions need to be handled with "kid gloves". > I'm updating my PR as discussed. > > > > [1] https://issues.apache.org/jira/browse/ARROW-8919 > > > > > > > Clickhouse for example does the type promotion.
> > > > > > > > On Wed, Jun 3, 2020 at 5:29 PM Antoine Pitrou > > > > wrote: > > > > > > > > > > On Wed, 3 Jun 2020 10:47:38 -0400 > > > > > Ben Kietzman wrote: > > > > > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > > > > > > > > > > > How should arithmetic kernels handle integer overflow? > > > > > > > > > > > > The approach currently taken in the linked PR is to promote such > > > > > > that > > > > > > overflow will not occur, for example `(int8, int8)->int16` and > > > > > > `(uint16, > > > > > > uint16)->uint32`. > > > > > > > > > > > > I'm not sure that's desirable. For one thing this leads to > > > > > > inconsistent > > > > > > handling of 64 bit integer types, which are currently allowed to > > > > > > overflow > > > > > > since we cannot promote further (NB: that means this kernel includes > > > > > > undefined behavior for int64). > > > > > > > > > > I agree with you. I would strongly advise against implicit promotion > > > > > accross arithmetic operations. We initially did that in Numba and it > > > > > quickly became a can of worms. > > > > > > > > > > The most desirable behaviour IMHO is to keep the original type, so: > > > > > - (int8, int8) -> int8 > > > > > - (uint16, uint16) -> uint16 > > > > > > > > > > Then the question is what happens when the actual overflow occurs. I > > > > > think this should be directed by a kernel option. By default an error > > > > > should probably be raised (letting errors pass and silently produce > > > > > erroneous data is wrong), but we might want to allow people to bypass > > > > > overflow checks for speed. > > > > > > > > > > Note that even if overflow detection is enabled, it *should* be > > > > > possible > > > > > to enable vectorization, e.g. by making overflow detection a separate > > > > > pass (itself vectorizable). > > > > > > > > > > Regards > > > > > > > > > > Antoine. > > > > > > > > > >
Re: [DISCUSS] Add kernel integer overflow handling
On Wed, Jun 3, 2020 at 11:16 AM Krisztián Szűcs wrote: > > On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney wrote: > > > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs > > wrote: > > > > > > From the user perspective I find the following pretty confusing: > > > > > > In [1]: np.array([-128, 127], dtype=np.int8()) * 2 > > > Out[1]: array([ 0, -2], dtype=int8) > > > > > > In [2]: np.array([-128, 127], dtype=np.int16()) * 2 > > > Out[2]: array([-256, 254], dtype=int16) > > > > > > In my opinion somewhere (on a higher level maybe) we should provide > > > the correct results promoted to a wider type implicitly. > > > > Yes, I agree with you, but I agree that the best place to address this > > is at a higher level rather than having this logic implemented at the > > lowest level (kernels) -- I think database systems handle this during > > logical->physical planning. > > It raises another question: where to incorporate the implicit promotions? > // correct me if I'm wrong but these implicit promotions are operation > // dependent and distinct from kernel dispatching issue [1] Implicit promotions would be handled when translating a logical expression (like "Add(x, y)") to a physical bound expression (something like "add_kernel[int16, int16](cast(x, int16), cast(y, int16))", where x and y are int8). > > The numpy example above can be roughly translated to: > >>> a = pa.array([-128, 127]) > >>> pa.compute.add(a, a) > array([ 0, -2] > > Which is rather surprising from the user's perspective. I guess we have to qualify what is a "user". The direct "user" of the kernels is not the same as a user who would write a SQL query or use a data frame library. I believe the contents of pyarrow.compute (and most of pyarrow, FWIW) are intended for system developers not end users (e.g. "people who use pandas"). > > [1] https://issues.apache.org/jira/browse/ARROW-8919 > > > > > Clickhouse for example does the type promotion. 
> > > > > > On Wed, Jun 3, 2020 at 5:29 PM Antoine Pitrou wrote: > > > > > > > > On Wed, 3 Jun 2020 10:47:38 -0400 > > > > Ben Kietzman wrote: > > > > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > > > > > > > > > How should arithmetic kernels handle integer overflow? > > > > > > > > > > The approach currently taken in the linked PR is to promote such that > > > > > overflow will not occur, for example `(int8, int8)->int16` and > > > > > `(uint16, > > > > > uint16)->uint32`. > > > > > > > > > > I'm not sure that's desirable. For one thing this leads to > > > > > inconsistent > > > > > handling of 64 bit integer types, which are currently allowed to > > > > > overflow > > > > > since we cannot promote further (NB: that means this kernel includes > > > > > undefined behavior for int64). > > > > > > > > I agree with you. I would strongly advise against implicit promotion > > > > accross arithmetic operations. We initially did that in Numba and it > > > > quickly became a can of worms. > > > > > > > > The most desirable behaviour IMHO is to keep the original type, so: > > > > - (int8, int8) -> int8 > > > > - (uint16, uint16) -> uint16 > > > > > > > > Then the question is what happens when the actual overflow occurs. I > > > > think this should be directed by a kernel option. By default an error > > > > should probably be raised (letting errors pass and silently produce > > > > erroneous data is wrong), but we might want to allow people to bypass > > > > overflow checks for speed. > > > > > > > > Note that even if overflow detection is enabled, it *should* be possible > > > > to enable vectorization, e.g. by making overflow detection a separate > > > > pass (itself vectorizable). > > > > > > > > Regards > > > > > > > > Antoine. > > > > > > > >
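[Editor's note] The logical-to-physical binding Wes describes above can be sketched in a few lines of Python. This is purely illustrative: the names `promote`, `bind_add`, and `WIDER` are hypothetical, not Arrow APIs; the sketch only shows where a planner would insert the implicit casts before dispatching to a concrete kernel.

```python
# Widening order for signed integer types (illustrative subset).
WIDER = {"int8": "int16", "int16": "int32", "int32": "int64"}

def promote(lhs: str, rhs: str) -> str:
    """Pick an output type one step wider than the widest input, so the
    addition cannot overflow (int64 cannot widen further)."""
    widest = max(lhs, rhs, key=lambda t: int(t.replace("int", "")))
    return WIDER.get(widest, widest)

def bind_add(lhs: str, rhs: str) -> str:
    """Translate logical Add(x, y) into a bound physical expression,
    with the casts inserted by the planner rather than the kernel."""
    out = promote(lhs, rhs)
    return f"add_kernel[{out}, {out}](cast(x, {out}), cast(y, {out}))"

print(bind_add("int8", "int8"))
# add_kernel[int16, int16](cast(x, int16), cast(y, int16))
```

With this split, the kernels themselves stay type-preserving and the promotion policy lives entirely in the planning layer, which is the position both Wes and Antoine take in this thread.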
Re: [DISCUSS] Add kernel integer overflow handling
On Wed, Jun 3, 2020 at 6:16 PM Krisztián Szűcs wrote: > > On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney wrote: > > > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs > > wrote: > > > > > > From the user perspective I find the following pretty confusing: > > > > > > In [1]: np.array([-128, 127], dtype=np.int8()) * 2 > > > Out[1]: array([ 0, -2], dtype=int8) > > > > > > In [2]: np.array([-128, 127], dtype=np.int16()) * 2 > > > Out[2]: array([-256, 254], dtype=int16) > > > > > > In my opinion somewhere (on a higher level maybe) we should provide > > > the correct results promoted to a wider type implicitly. > > > > Yes, I agree with you, but I agree that the best place to address this > > is at a higher level rather than having this logic implemented at the > > lowest level (kernels) -- I think database systems handle this during > > logical->physical planning. > > It raises another question: where to incorporate the implicit promotions? > // correct me if I'm wrong but these implicit promotions are operation > // dependent and distinct from kernel dispatching issue [1] > > The numpy example above can be roughly translated to: > >>> a = pa.array([-128, 127]) > >>> pa.compute.add(a, a) > array([ 0, -2]) > > Which is rather surprising from the user's perspective. Would it be enough to document the exact behavior and advise the user to place casts until we have logical -> physical machinery? I'm updating my PR as discussed. > > [1] https://issues.apache.org/jira/browse/ARROW-8919 > > > > > Clickhouse for example does the type promotion. > > > > > > On Wed, Jun 3, 2020 at 5:29 PM Antoine Pitrou wrote: > > > > > > > > On Wed, 3 Jun 2020 10:47:38 -0400 > > > > Ben Kietzman wrote: > > > > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > > > > > > > > > How should arithmetic kernels handle integer overflow?
> > > > > > > > > > The approach currently taken in the linked PR is to promote such that > > > > > overflow will not occur, for example `(int8, int8)->int16` and > > > > > `(uint16, > > > > > uint16)->uint32`. > > > > > > > > > > I'm not sure that's desirable. For one thing this leads to > > > > > inconsistent > > > > > handling of 64 bit integer types, which are currently allowed to > > > > > overflow > > > > > since we cannot promote further (NB: that means this kernel includes > > > > > undefined behavior for int64). > > > > > > > > I agree with you. I would strongly advise against implicit promotion > > > > accross arithmetic operations. We initially did that in Numba and it > > > > quickly became a can of worms. > > > > > > > > The most desirable behaviour IMHO is to keep the original type, so: > > > > - (int8, int8) -> int8 > > > > - (uint16, uint16) -> uint16 > > > > > > > > Then the question is what happens when the actual overflow occurs. I > > > > think this should be directed by a kernel option. By default an error > > > > should probably be raised (letting errors pass and silently produce > > > > erroneous data is wrong), but we might want to allow people to bypass > > > > overflow checks for speed. > > > > > > > > Note that even if overflow detection is enabled, it *should* be possible > > > > to enable vectorization, e.g. by making overflow detection a separate > > > > pass (itself vectorizable). > > > > > > > > Regards > > > > > > > > Antoine. > > > > > > > >
Re: [DISCUSS] Add kernel integer overflow handling
On Wed, Jun 3, 2020 at 5:52 PM Wes McKinney wrote: > > On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs > wrote: > > > > From the user perspective I find the following pretty confusing: > > > > In [1]: np.array([-128, 127], dtype=np.int8()) * 2 > > Out[1]: array([ 0, -2], dtype=int8) > > > > In [2]: np.array([-128, 127], dtype=np.int16()) * 2 > > Out[2]: array([-256, 254], dtype=int16) > > > > In my opinion somewhere (on a higher level maybe) we should provide > > the correct results promoted to a wider type implicitly. > > Yes, I agree with you, but I agree that the best place to address this > is at a higher level rather than having this logic implemented at the > lowest level (kernels) -- I think database systems handle this during > logical->physical planning. It raises another question: where to incorporate the implicit promotions? // correct me if I'm wrong but these implicit promotions are operation // dependent and distinct from kernel dispatching issue [1] The numpy example above can be roughly translated to: >>> a = pa.array([-128, 127]) >>> pa.compute.add(a, a) array([ 0, -2] Which is rather surprising from the user's perspective. [1] https://issues.apache.org/jira/browse/ARROW-8919 > > > Clickhouse for example does the type promotion. > > > > On Wed, Jun 3, 2020 at 5:29 PM Antoine Pitrou wrote: > > > > > > On Wed, 3 Jun 2020 10:47:38 -0400 > > > Ben Kietzman wrote: > > > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > > > > > > > How should arithmetic kernels handle integer overflow? > > > > > > > > The approach currently taken in the linked PR is to promote such that > > > > overflow will not occur, for example `(int8, int8)->int16` and `(uint16, > > > > uint16)->uint32`. > > > > > > > > I'm not sure that's desirable. 
For one thing this leads to inconsistent > > > > handling of 64 bit integer types, which are currently allowed to > > > > overflow > > > > since we cannot promote further (NB: that means this kernel includes > > > > undefined behavior for int64). > > > > > > I agree with you. I would strongly advise against implicit promotion > > > accross arithmetic operations. We initially did that in Numba and it > > > quickly became a can of worms. > > > > > > The most desirable behaviour IMHO is to keep the original type, so: > > > - (int8, int8) -> int8 > > > - (uint16, uint16) -> uint16 > > > > > > Then the question is what happens when the actual overflow occurs. I > > > think this should be directed by a kernel option. By default an error > > > should probably be raised (letting errors pass and silently produce > > > erroneous data is wrong), but we might want to allow people to bypass > > > overflow checks for speed. > > > > > > Note that even if overflow detection is enabled, it *should* be possible > > > to enable vectorization, e.g. by making overflow detection a separate > > > pass (itself vectorizable). > > > > > > Regards > > > > > > Antoine. > > > > > >
[jira] [Created] (ARROW-9029) [C++] Implement BitmapScanner interface to accelerate processing of mostly-not-null data
Wes McKinney created ARROW-9029: --- Summary: [C++] Implement BitmapScanner interface to accelerate processing of mostly-not-null data Key: ARROW-9029 URL: https://issues.apache.org/jira/browse/ARROW-9029 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Assignee: Wes McKinney Fix For: 1.0.0 In analytics, it is common for data to be all not-null or mostly not-null. Data with > 50% nulls tends to be more exceptional. In this light, our {{BitmapReader}} class, which allows iteration over each bit in a bitmap, can be wasteful for mostly-set validity bitmaps. I propose instead a new interface for use in kernel implementations, for lack of a better term {{BitmapScanner}}. This works as follows: * Uses popcount to accumulate consecutive 64-bit words from a bitmap where all values are set, up to some limit (e.g. anywhere from 8 to 128 words -- we can use benchmarks to determine what is a good limit). The length of this "all-on" run is returned to the caller in a single function call, so that this "run" of data can be processed without any bit-by-bit bitmap checking * If a word containing unset bits is encountered, the scanner will similarly accumulate non-full words until the next full word is encountered or a limit is hit. The length of this "has nulls" run is returned to the caller, which then proceeds bit-by-bit to process the data For data with a lot of nulls, this may degrade performance somewhat, but probably not that much empirically. However, data that is mostly not-null should benefit from this. This BitmapScanner utility can probably also be used to accelerate the implementation of Filter for mostly-not-null data
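[Editor's note] The two kinds of runs the issue describes can be modeled in a few lines of Python to make the proposal concrete. This is only an illustrative sketch (the real implementation would be C++, use popcount, and cap run lengths at a tuned limit); `scan_runs` is a hypothetical name, and `word == ALL_SET` stands in for `popcount(word) == 64`.

```python
ALL_SET = (1 << 64) - 1  # a 64-bit word whose bits are all set

def scan_runs(words):
    """Yield (all_valid, run_length_in_words) pairs over a validity
    bitmap viewed as 64-bit words. "all_valid" runs can be processed
    without per-bit checks; "has nulls" runs fall back to bit-by-bit."""
    i = 0
    n = len(words)
    while i < n:
        full = words[i] == ALL_SET
        j = i
        while j < n and (words[j] == ALL_SET) == full:
            j += 1
        yield full, j - i
        i = j

words = [ALL_SET, ALL_SET, 0x0FFF, ALL_SET]
print(list(scan_runs(words)))
# [(True, 2), (False, 1), (True, 1)]
```

For a bitmap that is entirely valid, the caller gets a single run covering all words, which is exactly the mostly-not-null fast path the issue targets.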
Re: [DISCUSS] Add kernel integer overflow handling
On Wed, Jun 3, 2020 at 10:49 AM Krisztián Szűcs wrote: > > From the user perspective I find the following pretty confusing: > > In [1]: np.array([-128, 127], dtype=np.int8()) * 2 > Out[1]: array([ 0, -2], dtype=int8) > > In [2]: np.array([-128, 127], dtype=np.int16()) * 2 > Out[2]: array([-256, 254], dtype=int16) > > In my opinion somewhere (on a higher level maybe) we should provide > the correct results promoted to a wider type implicitly. Yes, I agree with you, but I agree that the best place to address this is at a higher level rather than having this logic implemented at the lowest level (kernels) -- I think database systems handle this during logical->physical planning. > Clickhouse for example does the type promotion. > > On Wed, Jun 3, 2020 at 5:29 PM Antoine Pitrou wrote: > > > > On Wed, 3 Jun 2020 10:47:38 -0400 > > Ben Kietzman wrote: > > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > > > > > How should arithmetic kernels handle integer overflow? > > > > > > The approach currently taken in the linked PR is to promote such that > > > overflow will not occur, for example `(int8, int8)->int16` and `(uint16, > > > uint16)->uint32`. > > > > > > I'm not sure that's desirable. For one thing this leads to inconsistent > > > handling of 64 bit integer types, which are currently allowed to overflow > > > since we cannot promote further (NB: that means this kernel includes > > > undefined behavior for int64). > > > > I agree with you. I would strongly advise against implicit promotion > > accross arithmetic operations. We initially did that in Numba and it > > quickly became a can of worms. > > > > The most desirable behaviour IMHO is to keep the original type, so: > > - (int8, int8) -> int8 > > - (uint16, uint16) -> uint16 > > > > Then the question is what happens when the actual overflow occurs. I > > think this should be directed by a kernel option. 
By default an error > > should probably be raised (letting errors pass and silently produce > > erroneous data is wrong), but we might want to allow people to bypass > > overflow checks for speed. > > > > Note that even if overflow detection is enabled, it *should* be possible > > to enable vectorization, e.g. by making overflow detection a separate > > pass (itself vectorizable). > > > > Regards > > > > Antoine. > > > >
Re: [DISCUSS] Add kernel integer overflow handling
From the user perspective I find the following pretty confusing: In [1]: np.array([-128, 127], dtype=np.int8()) * 2 Out[1]: array([ 0, -2], dtype=int8) In [2]: np.array([-128, 127], dtype=np.int16()) * 2 Out[2]: array([-256, 254], dtype=int16) In my opinion somewhere (on a higher level maybe) we should provide the correct results promoted to a wider type implicitly. Clickhouse for example does the type promotion. On Wed, Jun 3, 2020 at 5:29 PM Antoine Pitrou wrote: > > On Wed, 3 Jun 2020 10:47:38 -0400 > Ben Kietzman wrote: > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > > > How should arithmetic kernels handle integer overflow? > > > > The approach currently taken in the linked PR is to promote such that > > overflow will not occur, for example `(int8, int8)->int16` and `(uint16, > > uint16)->uint32`. > > > > I'm not sure that's desirable. For one thing this leads to inconsistent > > handling of 64 bit integer types, which are currently allowed to overflow > > since we cannot promote further (NB: that means this kernel includes > > undefined behavior for int64). > > I agree with you. I would strongly advise against implicit promotion > across arithmetic operations. We initially did that in Numba and it > quickly became a can of worms. > > The most desirable behaviour IMHO is to keep the original type, so: > - (int8, int8) -> int8 > - (uint16, uint16) -> uint16 > > Then the question is what happens when the actual overflow occurs. I > think this should be directed by a kernel option. By default an error > should probably be raised (letting errors pass and silently produce > erroneous data is wrong), but we might want to allow people to bypass > overflow checks for speed. > > Note that even if overflow detection is enabled, it *should* be possible > to enable vectorization, e.g. by making overflow detection a separate > pass (itself vectorizable). > > Regards > > Antoine. > >
Re: [DISCUSS] Add kernel integer overflow handling
On Wed, Jun 3, 2020 at 10:44 AM Wes McKinney wrote: > > > By default an error should probably be raised > > I would very strongly recommend keeping the behavior consistent with > that of analytic DBMSes. I don't think that most analytic DBMS error > on overflows because it's too computationally expensive to check. > NumPy doesn't error (by default at least) when you have overflows > either: > > >>> arr = np.array([128, 128, 128], dtype='i1') > >>> arr + arr > array([0, 0, 0], dtype=int8) *facepalm* >>> arr = np.array([127, 127, 127], dtype='i1') >>> arr + arr array([-2, -2, -2], dtype=int8) >>> > > On Wed, Jun 3, 2020 at 10:29 AM Antoine Pitrou wrote: > > > > On Wed, 3 Jun 2020 10:47:38 -0400 > > Ben Kietzman wrote: > > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > > > > > How should arithmetic kernels handle integer overflow? > > > > > > The approach currently taken in the linked PR is to promote such that > > > overflow will not occur, for example `(int8, int8)->int16` and `(uint16, > > > uint16)->uint32`. > > > > > > I'm not sure that's desirable. For one thing this leads to inconsistent > > > handling of 64 bit integer types, which are currently allowed to overflow > > > since we cannot promote further (NB: that means this kernel includes > > > undefined behavior for int64). > > > > I agree with you. I would strongly advise against implicit promotion > > accross arithmetic operations. We initially did that in Numba and it > > quickly became a can of worms. > > > > The most desirable behaviour IMHO is to keep the original type, so: > > - (int8, int8) -> int8 > > - (uint16, uint16) -> uint16 > > > > Then the question is what happens when the actual overflow occurs. I > > think this should be directed by a kernel option. By default an error > > should probably be raised (letting errors pass and silently produce > > erroneous data is wrong), but we might want to allow people to bypass > > overflow checks for speed. 
> > > > Note that even if overflow detection is enabled, it *should* be possible > > to enable vectorization, e.g. by making overflow detection a separate > > pass (itself vectorizable). > > > > Regards > > > > Antoine. > > > >
Re: [DISCUSS] Add kernel integer overflow handling
> By default an error should probably be raised I would very strongly recommend keeping the behavior consistent with that of analytic DBMSes. I don't think that most analytic DBMSes error on overflows because it's too computationally expensive to check. NumPy doesn't error (by default at least) when you have overflows either: >>> arr = np.array([128, 128, 128], dtype='i1') >>> arr + arr array([0, 0, 0], dtype=int8) On Wed, Jun 3, 2020 at 10:29 AM Antoine Pitrou wrote: > > On Wed, 3 Jun 2020 10:47:38 -0400 > Ben Kietzman wrote: > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > > > How should arithmetic kernels handle integer overflow? > > > > The approach currently taken in the linked PR is to promote such that > > overflow will not occur, for example `(int8, int8)->int16` and `(uint16, > > uint16)->uint32`. > > > > I'm not sure that's desirable. For one thing this leads to inconsistent > > handling of 64 bit integer types, which are currently allowed to overflow > > since we cannot promote further (NB: that means this kernel includes > > undefined behavior for int64). > > I agree with you. I would strongly advise against implicit promotion > across arithmetic operations. We initially did that in Numba and it > quickly became a can of worms. > > The most desirable behaviour IMHO is to keep the original type, so: > - (int8, int8) -> int8 > - (uint16, uint16) -> uint16 > > Then the question is what happens when the actual overflow occurs. I > think this should be directed by a kernel option. By default an error > should probably be raised (letting errors pass and silently produce > erroneous data is wrong), but we might want to allow people to bypass > overflow checks for speed. > > Note that even if overflow detection is enabled, it *should* be possible > to enable vectorization, e.g. by making overflow detection a separate > pass (itself vectorizable). > > Regards > > Antoine. > >
Re: [DISCUSS] Add kernel integer overflow handling
On Wed, 3 Jun 2020 10:47:38 -0400 Ben Kietzman wrote: > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > How should arithmetic kernels handle integer overflow? > > The approach currently taken in the linked PR is to promote such that > overflow will not occur, for example `(int8, int8)->int16` and `(uint16, > uint16)->uint32`. > > I'm not sure that's desirable. For one thing this leads to inconsistent > handling of 64 bit integer types, which are currently allowed to overflow > since we cannot promote further (NB: that means this kernel includes > undefined behavior for int64). I agree with you. I would strongly advise against implicit promotion across arithmetic operations. We initially did that in Numba and it quickly became a can of worms. The most desirable behaviour IMHO is to keep the original type, so: - (int8, int8) -> int8 - (uint16, uint16) -> uint16 Then the question is what happens when the actual overflow occurs. I think this should be directed by a kernel option. By default an error should probably be raised (letting errors pass and silently produce erroneous data is wrong), but we might want to allow people to bypass overflow checks for speed. Note that even if overflow detection is enabled, it *should* be possible to enable vectorization, e.g. by making overflow detection a separate pass (itself vectorizable). Regards Antoine.
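[Editor's note] Antoine's proposal, keep the input type and make overflow handling a kernel option that defaults to raising, might look roughly like the following Python sketch. The names `add_int8` and `check_overflow` are illustrative, not Arrow's API.

```python
INT8_MIN, INT8_MAX = -128, 127

def wrap_int8(x):
    """Truncate to int8 with two's-complement wraparound."""
    return (x + 128) % 256 - 128

def add_int8(a, b, check_overflow=True):
    """Elementwise int8 + int8 -> int8. With check_overflow=True an
    out-of-range sum raises; with False it silently wraps around."""
    out = []
    for x, y in zip(a, b):
        s = x + y
        if check_overflow and not (INT8_MIN <= s <= INT8_MAX):
            raise OverflowError(f"int8 overflow: {x} + {y}")
        out.append(wrap_int8(s))
    return out

print(add_int8([1, 2], [3, 4]))                       # [4, 6]
print(add_int8([127], [127], check_overflow=False))   # [-2]
```

As Antoine notes, the check need not defeat vectorization: a real implementation could run a separate (itself vectorizable) overflow-detection pass and only then compute the unchecked sums.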
Re: [DISCUSS] Add kernel integer overflow handling
What do open source analytic database systems do? I don't think we should deviate from the behavior of these systems. For example, you can see that Apache Impala uses unsigned arithmetic on signed integers https://github.com/apache/impala/blob/5c69e7ba583297dc886652ac5952816882b928af/be/src/exprs/operators-ir.cc#L38 On Wed, Jun 3, 2020 at 9:47 AM Ben Kietzman wrote: > > https://github.com/apache/arrow/pull/7341#issuecomment-638241193 > > How should arithmetic kernels handle integer overflow? > > The approach currently taken in the linked PR is to promote such that > overflow will not occur, for example `(int8, int8)->int16` and `(uint16, > uint16)->uint32`. > > I'm not sure that's desirable. For one thing this leads to inconsistent > handling of 64 bit integer types, which are currently allowed to overflow > since we cannot promote further (NB: that means this kernel includes > undefined behavior for int64). > > There are a few other approaches we could take (ordered by personal > preference): > >- define explicit overflow behavior for signed integer operands (for >example if we declared that add(i8(a), i8(b)) will always be equivalent >to i8(i16(a) + i16(b)) then we could instantiate only unsigned addition >kernels) >- raise an error on signed overflow >- provide ArithmeticOptions::overflow_behavior and allow users to choose >between these >- require users to pass arguments which will not overflow
Re: Writing Parquet datasets using pyarrow.parquet.ParquetWriter
Hi Palak, The ParquetWriter class is meant to write a single parquet file (so the fact that you see only a single parquet file being written with the shown code is expected). If you want to write multiple files, you can either manually create multiple ParquetWriter instances (each with a different parquet file name), or you can use the `pq.write_to_dataset()` function, which can automatically partition your data into multiple files based on a column. But this function requires the full dataset in memory as a pandas dataframe or pyarrow table (so it is not compatible with the chunked csv reading). If you want to do it in chunks, it might be easier to use a higher-level package such as dask. Dask can read a csv file in chunks and write to parquet using pyarrow automatically (see e.g. https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_csv). It would look like:

import dask.dataframe as dd
df = dd.read_csv(..)
df.to_parquet(.., partition_on=['col'], engine="pyarrow")

Dask can also write a (common) metadata file for you. If you want to do this manually using pyarrow, you can take a look at the `parquet.write_metadata` function ( https://github.com/apache/arrow/blob/494e7a9c5714f3ed9e5590aeef8362114d5a3a46/python/pyarrow/parquet.py#L1748-L1783). This needs to be better documented (covered by https://issues.apache.org/jira/browse/ARROW-3154). Best, Joris On Sat, 30 May 2020 at 16:52, Palak Harwani wrote: > Hi, > I had a few questions regarding pyarrow.parquet. I want to write a Parquet > dataset which is partitioned according to one column.
I have a large csv file and I'm reading it in chunks using the following code:
>
> # csv_to_parquet.py
>
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> csv_file = '/path/to/my.tsv'
> parquet_file = '/path/to/my.parquet'
> chunksize = 100_000
>
> csv_stream = pd.read_csv(csv_file, sep='\t', chunksize=chunksize,
>                          low_memory=False)
> for i, chunk in enumerate(csv_stream):
>     print("Chunk", i)
>     if i == 0:
>         # Guess the schema of the CSV file from the first chunk
>         parquet_schema = pa.Table.from_pandas(df=chunk).schema
>         # Open a Parquet file for writing
>         parquet_writer = pq.ParquetWriter(parquet_file, parquet_schema,
>                                           compression='snappy')
>     # Write CSV chunk to the parquet file
>     table = pa.Table.from_pandas(chunk, schema=parquet_schema)
>     parquet_writer.write_table(table)
>
> parquet_writer.close()
>
> But this code writes a single parquet file and I don't see any method in
> ParquetWriter to write to a dataset; it just has the write_table method.
> Is there a way to do this?
>
> Also how do I write the metadata file in the example mentioned above and
> the common metadata file, as well as the metadata files in case of a
> partitioned dataset?
>
> Thanks in advance.
>
> --
> *Regards,*
> *Palak Harwani*
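[Editor's note] The "multiple ParquetWriter instances" route Joris mentions amounts to keeping one lazily created writer per partition value while streaming chunks. Below is a stdlib-only sketch of that routing pattern, with csv writers and in-memory buffers standing in for `pq.ParquetWriter` and parquet files so it runs without pyarrow; `write_partitioned` is a hypothetical helper, not a pyarrow function.

```python
import csv
import io

def write_partitioned(chunks, partition_col):
    """Stream chunks of row dicts, routing each row to a writer that is
    created lazily the first time its partition value is seen. In real
    code the buffers would be parquet files and the writers would be
    pq.ParquetWriter instances sharing one schema."""
    writers = {}  # partition value -> csv.writer
    buffers = {}  # partition value -> in-memory "file"
    for chunk in chunks:
        for row in chunk:
            key = row[partition_col]
            if key not in writers:
                buffers[key] = io.StringIO()
                writers[key] = csv.writer(buffers[key])
            writers[key].writerow([v for k, v in sorted(row.items())])
    return {k: b.getvalue() for k, b in buffers.items()}

chunks = [
    [{"col": "a", "x": 1}, {"col": "b", "x": 2}],
    [{"col": "a", "x": 3}],
]
out = write_partitioned(chunks, "col")
print(sorted(out))  # ['a', 'b']
```

The key property is that no chunk is ever held beyond the current iteration, so memory use stays bounded by the chunk size rather than the dataset size, which is exactly what the chunked csv reading in the question needs.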
[DISCUSS] Add kernel integer overflow handling
https://github.com/apache/arrow/pull/7341#issuecomment-638241193

How should arithmetic kernels handle integer overflow? The approach currently taken in the linked PR is to promote such that overflow will not occur, for example `(int8, int8)->int16` and `(uint16, uint16)->uint32`. I'm not sure that's desirable. For one thing, this leads to inconsistent handling of the 64-bit integer types, which are currently allowed to overflow since we cannot promote further (NB: that means this kernel includes undefined behavior for int64). There are a few other approaches we could take (ordered by personal preference):

- define explicit overflow behavior for signed integer operands (for example, if we declared that add(i8(a), i8(b)) will always be equivalent to i8(i16(a) + i16(b)), then we could instantiate only unsigned addition kernels)
- raise an error on signed overflow
- provide ArithmeticOptions::overflow_behavior and allow users to choose between these
- require users to pass arguments which will not overflow
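The first option above, defining add(i8(a), i8(b)) as i8(i16(a) + i16(b)), amounts to two's-complement wraparound. A minimal sketch of that semantics in plain Python (the helper name `wrap_i8` is made up for illustration):

```python
def wrap_i8(x):
    """Wrap an arbitrary integer into int8 range, two's-complement style."""
    return (x + 128) % 256 - 128

# Compute in a wider type (plain Python ints never overflow), then wrap:
a, b = 120, 20
wide = a + b           # 140: representable in int16, but not in int8
print(wrap_i8(wide))   # wraps around to a negative int8 value
```

This gives every input pair a well-defined result, at the cost of silently wrapping sums that exceed the int8 range.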
Re: [DISCUSS] Adding "byteWidth" field to Decimal Flatbuffers type for forward compatibility
Sounds good to me. Regards Antoine. On Mon, 1 Jun 2020 17:47:38 -0500 Wes McKinney wrote: > I mentioned this on the recent sync call and opened > > https://issues.apache.org/jira/browse/ARROW-8985 > > I believe at some point that Arrow may need to be used to transport > decimal widths different from 128 bits. For example, systems like > Apache Kudu have 32-bit and 64-bit decimals. Computational code may > immediately promote small decimals, but it's valuable to be able to > transfer and represent the data as is rather than forcing an > up-promotion even for low-precision decimal data. > > In order to allow for this work to possibly happen in the future > without requiring a new value be added to the "Type" Flatbuffers > union, I propose to add a "byteWidth" field with default value 16 to > the existing Decimal type. Here is a patch with this change: > > https://github.com/apache/arrow/pull/7321 > > To make the forward compatibility issue clear: if this field is not > added now, then current library versions would not be able to perceive the > absence of the field, thus making it unsafe for future library > versions to annotate anything other than 16-byte decimals with this > metadata. > > As part of adopting this change, we would want to add assertions to > the existing libraries to check that the byteWidth is indeed 16, and > either throw an exception or pass through the data as > FixedSizeBinary otherwise. > > Thanks, > Wes >
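A hedged sketch of the forward-compatibility assertion described in the email above, in Python (the function name and error message are illustrative, not actual Arrow API):

```python
def check_decimal_byte_width(byte_width):
    """Accept only the 16-byte decimals that current readers support.

    An implementation could instead choose the pass-through behavior
    mentioned above, treating the values as FixedSizeBinary of this width.
    """
    if byte_width != 16:
        raise ValueError(
            "unsupported Decimal byteWidth: %d (only 16 is supported)" % byte_width
        )
    return byte_width

check_decimal_byte_width(16)  # the proposed default value is accepted
```

With the default of 16 baked into the schema, old data (which omits the field) and new 16-byte data read identically; only wider or narrower decimals trip the check.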
[jira] [Created] (ARROW-9028) [R] Should be able to convert an empty table
Francois Saint-Jacques created ARROW-9028: - Summary: [R] Should be able to convert an empty table Key: ARROW-9028 URL: https://issues.apache.org/jira/browse/ARROW-9028 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Francois Saint-Jacques -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9027) [Python] Split in multiple files + clean-up pyarrow.parquet tests
Joris Van den Bossche created ARROW-9027: Summary: [Python] Split in multiple files + clean-up pyarrow.parquet tests Key: ARROW-9027 URL: https://issues.apache.org/jira/browse/ARROW-9027 Project: Apache Arrow Issue Type: Test Components: Python Reporter: Joris Van den Bossche The current {{test_parquet.py}} file is already above 4000 lines of code, and it is becoming a bit unwieldy to work with. Better structuring it, and maybe splitting it in multiple files, would help (separate test files could cover tests for basic reading/writing, tests for metadata/statistics objects, tests for multi-file datasets) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9026) [C++/Python] Force package removal from arrow-nightlies conda repository
Uwe Korn created ARROW-9026: --- Summary: [C++/Python] Force package removal from arrow-nightlies conda repository Key: ARROW-9026 URL: https://issues.apache.org/jira/browse/ARROW-9026 Project: Apache Arrow Issue Type: Bug Components: Packaging Reporter: Uwe Korn Assignee: Uwe Korn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9025) Apache arrow fails to build with recent version of protobuf?
Keith Hughitt created ARROW-9025: Summary: Apache arrow fails to build with recent version of protobuf? Key: ARROW-9025 URL: https://issues.apache.org/jira/browse/ARROW-9025 Project: Apache Arrow Issue Type: Bug Affects Versions: 0.17.0, 0.2.0 Environment: - Arch Linux 5.6.15 (64-bit) - Protobuf 3.12.0 - cmake 3.17.3 Reporter: Keith Hughitt Greetings! I am running into errors when attempting to compile recent versions of Arrow (0.17.0 and nightly for 2020-06-02). I believe the issue may be related to a recent update to the protobuf library (version 3.12.0). Error: ``` {{... Scanning dependencies of target plasma-serialization-tests [ 41%] Building CXX object src/plasma/CMakeFiles/plasma-serialization-tests.dir/test/serialization_tests.cc.o In file included from /usr/include/string.h:495, from /home/keith/software/arrow-git/src/build/googletest_ep-prefix/include/gtest/internal/gtest-port.h:270, from /home/keith/software/arrow-git/src/build/googletest_ep-prefix/include/gtest/internal/gtest-internal.h:40, from /home/keith/software/arrow-git/src/build/googletest_ep-prefix/include/gtest/gtest.h:59, from /home/keith/software/arrow-git/src/arrow/cpp/src/plasma/test/serialization_tests.cc:23: In function ‘char* strncpy(char*, const char*, size_t)’, inlined from ‘int plasma::TestPlasmaSerialization::CreateTemporaryFile()’ at /home/keith/software/arrow-git/src/arrow/cpp/src/plasma/test/serialization_tests.cc:85:12: /usr/include/bits/string_fortified.h:106:34: warning: ‘char* __builtin_strncpy(char*, const char*, long unsigned int)’ specified bound 1024 equals destination size [-Wstringop-truncation] 106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest)); | ^~ [ 41%] Linking CXX executable ../../release/plasma-serialization-tests /usr/bin/ld: ../../release/libarrow.so.18.0.0: undefined reference to `google::protobuf::internal::ComputeUnknownFieldsSize(google::protobuf::internal::InternalMetadataWithArena const&, unsigned long, 
google::protobuf::internal::CachedSize*)' /usr/bin/ld: ../../release/libarrow.so.18.0.0: undefined reference to `google::protobuf::internal::UnknownFieldParse(unsigned int, google::protobuf::internal::InternalMetadataWithArena*, char const*, google::protobuf::internal::ParseContext*)' /usr/bin/ld: ../../release/libarrow.so.18.0.0: undefined reference to `google::protobuf::internal::AssignDescriptors(google::protobuf::internal::DescriptorTable const*)' collect2: error: ld returned 1 exit status make[2]: *** [src/plasma/CMakeFiles/plasma-serialization-tests.dir/build.make:123: release/plasma-serialization-tests] Error 1 make[1]: *** [CMakeFiles/Makefile2:1516: src/plasma/CMakeFiles/plasma-serialization-tests.dir/all] Error 2 make: *** [Makefile:161: all] Error 2}} ``` The result is similar for the most recent nightly version of Arrow (2020-06-02). Just let me know if there is anything you would like me to try / any other information I can help provide. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9024) [C++/Python] Install anaconda-client in conda-clean job
Uwe Korn created ARROW-9024: --- Summary: [C++/Python] Install anaconda-client in conda-clean job Key: ARROW-9024 URL: https://issues.apache.org/jira/browse/ARROW-9024 Project: Apache Arrow Issue Type: Bug Components: Packaging Reporter: Uwe Korn Assignee: Uwe Korn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9023) [C++] Use mimalloc conda package
Uwe Korn created ARROW-9023: --- Summary: [C++] Use mimalloc conda package Key: ARROW-9023 URL: https://issues.apache.org/jira/browse/ARROW-9023 Project: Apache Arrow Issue Type: Improvement Components: C++, Packaging Reporter: Uwe Korn Assignee: Uwe Korn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9022) [C++][Compute] Make Add function safe for numeric limits
Krisztian Szucs created ARROW-9022: -- Summary: [C++][Compute] Make Add function safe for numeric limits Key: ARROW-9022 URL: https://issues.apache.org/jira/browse/ARROW-9022 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Krisztian Szucs Assignee: Krisztian Szucs Fix For: 1.0.0 Currently the output type of the Add function is identical to the argument types, which makes it unsafe to add values near the numeric limits, so instead of using the {{(int8, int8) -> int8}} signature we should use {{(int8, int8) -> int16}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
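The hazard this ticket describes can be demonstrated with a small Python sketch, using the stdlib ctypes module to emulate fixed-width integer storage (the values are illustrative):

```python
import ctypes

a, b = 120, 20        # both fit in int8, but their sum (140) does not

# Storing the sum back into int8 wraps around (two's complement):
wrapped = ctypes.c_int8(a + b).value

# Widening the output type to int16, as the ticket proposes, keeps the
# true sum representable:
widened = ctypes.c_int16(a + b).value

print(wrapped, widened)  # the int8 result is negative; the int16 one is 140
```

The widened signature trades extra memory in the output array for results that are always exact near the type's limits.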
[NIGHTLY] Arrow Build Report for Job nightly-2020-06-03-0
Arrow Build Report for Job nightly-2020-06-03-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0 Failed Tasks: - centos-8-aarch64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-centos-8-aarch64 - conda-clean: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-clean - conda-linux-gcc-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-linux-gcc-py36 - conda-linux-gcc-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-linux-gcc-py37 - conda-linux-gcc-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-linux-gcc-py38 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-osx-clang-py36 - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-osx-clang-py37 - conda-osx-clang-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-osx-clang-py38 - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-win-vs2015-py36 - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-win-vs2015-py37 - conda-win-vs2015-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-azure-conda-win-vs2015-py38 - homebrew-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-homebrew-cpp - homebrew-r-autobrew: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-homebrew-r-autobrew - test-conda-cpp-valgrind: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda-cpp-valgrind 
- test-conda-python-3.7-dask-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda-python-3.7-dask-latest - test-conda-python-3.7-spark-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda-python-3.7-spark-master - test-conda-python-3.8-dask-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda-python-3.8-dask-master - test-conda-python-3.8-jpype: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda-python-3.8-jpype - wheel-osx-cp38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-wheel-osx-cp38 Succeeded Tasks: - centos-6-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-centos-6-amd64 - centos-7-aarch64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-centos-7-aarch64 - centos-7-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-centos-7-amd64 - centos-8-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-centos-8-amd64 - debian-buster-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-debian-buster-amd64 - debian-buster-arm64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-debian-buster-arm64 - debian-stretch-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-debian-stretch-amd64 - debian-stretch-arm64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-debian-stretch-arm64 - gandiva-jar-osx: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-gandiva-jar-osx - gandiva-jar-xenial: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-travis-gandiva-jar-xenial - nuget: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-nuget - test-conda-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda-cpp - test-conda-python-3.6-pandas-0.23: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda-python-3.6-pandas-0.23 - test-conda-python-3.6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda-python-3.6 - test-conda-python-3.7-hdfs-2.9.2: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda-python-3.7-hdfs-2.9.2 - test-conda-python-3.7-kartothek-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-03-0-github-test-conda
[jira] [Created] (ARROW-9021) [Python] The filesystem keyword in parquet.read_table is not documented
Joris Van den Bossche created ARROW-9021: Summary: [Python] The filesystem keyword in parquet.read_table is not documented Key: ARROW-9021 URL: https://issues.apache.org/jira/browse/ARROW-9021 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Joris Van den Bossche -- This message was sent by Atlassian Jira (v8.3.4#803005)