Re: Using Plasma with xnd

2018-03-06 Thread Wes McKinney
hi Saul -- I think the easiest solution here is the buffer/memoryview
protocol. You won't have to touch the Cython or C++ API from pyarrow
if you do this.

You can interact with a Buffer object like any other Python object
implementing the buffer protocol. See numpy.frombuffer as an example
of a function that interacts with such objects. I would suggest adding
a method to xnd for this.
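
A minimal sketch of the buffer-protocol route, using only the standard library. A `bytearray` stands in for a `PlasmaBuffer` here (an assumption for illustration; any object exporting the buffer protocol should behave the same way), and `view_as_int32` is a hypothetical helper name, not part of pyarrow or xnd.

```python
import struct


def view_as_int32(buf):
    """Return a zero-copy view of a buffer-protocol object as native int32 values."""
    # memoryview() accepts any object exporting the buffer protocol and
    # does not copy the underlying memory.
    return memoryview(buf).cast("B").cast("i")


# Stand-in for a PlasmaBuffer: four native-endian int32 values.
backing = bytearray(struct.pack("4i", 1, 2, 3, 4))
view = view_as_int32(backing)
print(view.tolist())

# Writes through the view are visible in the backing memory, which shows
# that no copy was made.
view[0] = 42
print(struct.unpack_from("i", backing, 0)[0])
```

`numpy.frombuffer(backing, dtype=...)` gives the same kind of zero-copy view as an ndarray, which is the example mentioned above.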

If you need to interact with Plasma from C then things will be more
complicated -- Robert or Philipp should be able to advise in this
case.

- Wes

On Tue, Mar 6, 2018 at 4:55 PM, Saul Shanabrook  wrote:
> Hey Wes,
>
> I don't have much experience doing C + Python + Cython development, so I am
> probably missing something obvious, but reading the Cython docs, it
> seems like I can only access types marked as public from C code. When I
> compile arrow locally, I do get some C++ headers for the plasma code, but I
> don't think I can use them from C code either.
>
> Best,
> Saul
>
>
>
> On Tue, Mar 6, 2018 at 3:12 PM Wes McKinney  wrote:
>
>> hi Saul,
>>
>> Are you able to use the buffer/memoryview protocol? Instances of
>> pyarrow.Buffer, like PlasmaBuffer, support this
>>
>> https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.pyx#L182
>>
>> - Wes
>>
>> On Tue, Mar 6, 2018 at 3:09 PM, Saul Shanabrook 
>> wrote:
>> > I am trying to use the Plasma store to back xnd objects. Xnd (
>> > https://xnd.readthedocs.io/en/latest/xnd/index.html) is a container library
>> > in C that has Python bindings. I would like to get a pointer to the
>> > allocated memory after creating or getting an object in Plasma. I see that this
>> > is supported in the C++ API (
>> > https://arrow.apache.org/docs/cpp/classplasma_1_1_plasma_client.html#ac18ab9cc792c620a97a3dcb165e0ecd7)
>> > but not in the Python API (as far as I can tell). Is it possible to use the
>> > C++ Plasma API from a C project? If not, would it make sense to expose
>> > pointer access on the Python API using capsules
>> > https://docs.python.org/3.6/c-api/capsule.html
>> > ?
>>


RE: Parquet to arrow java converter

2018-03-06 Thread Wenbo Zhao
Thanks Julien and Wes. There is an ongoing PR 
https://github.com/apache/parquet-mr/pull/443 (update Arrow version to 0.8.0) 
which I may be depending on. Should I wait for this? 

Wenbo 

-Original Message-
From: Julien Le Dem [mailto:julien.le...@gmail.com] 
Sent: Tuesday, March 6, 2018 5:27 PM
To: dev@arrow.apache.org
Subject: Re: Parquet to arrow java converter

I would put it in the parquet-mr codebase. I have contributed the schema 
conversion code there. I’m happy to provide feedback on PRs in this area. 

Julien

> On Mar 6, 2018, at 12:18, Wes McKinney  wrote:
> 
> When it had been discussed in the past, the thinking had been to 
> implement it in the Parquet Java codebase. I'd be interested in 
> others' opinions about this (since I'm not an expert on Java matters)
> 
> - Wes
> 
>> On Tue, Mar 6, 2018 at 2:27 PM, Wenbo Zhao  wrote:
>> Hi,
>> 
>> Sorry if someone has asked the same question before. We are 
>> interested in providing a Java converter from Parquet to Arrow. Should I 
>> implement this converter in Parquet-mr/Parquet-arrow or under the Arrow 
>> project? I have the feeling that putting the implementation in 
>> Parquet-mr/Parquet-arrow would be preferable 
>> https://www.mail-archive.com/dev@arrow.apache.org/msg02606.html?
>> 
>> Thanks,
>> 
>> Wenbo


[jira] [Created] (ARROW-2283) [C++] Support Arrow C++ installed in /usr detection by pkg-config

2018-03-06 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-2283:
---

 Summary: [C++] Support Arrow C++ installed in /usr detection by 
pkg-config
 Key: ARROW-2283
 URL: https://issues.apache.org/jira/browse/ARROW-2283
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.8.0
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
 Fix For: 0.9.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] Accept donation of Arrow Go implementation

2018-03-06 Thread Kouhei Sutou
+1

In 
  "Re: [VOTE] Accept donation of Arrow Go implementation" on Tue, 6 Mar 2018 
15:46:31 -0500,
  Li Jin  wrote:

> +1
> 
> On Tue, Mar 6, 2018 at 3:31 PM, Uwe L. Korn  wrote:
> 
>> +1
>>
>> On Tue, Mar 6, 2018, at 9:28 PM, Jacques Nadeau wrote:
>> > +1
>> >
>> > On Tue, Mar 6, 2018 at 10:57 AM, Wes McKinney 
>> wrote:
>> >
>> > > Dear all,
>> > >
>> > > The Arrow PMC has been in contact with the developers of
>> > >
>> > > https://github.com/influxdata/arrow
>> > >
>> > > which is a native Go implementation of Apache Arrow. We are proposing
>> > > to accept this codebase into the Apache project. If the vote passes,
>> > > the PMC and the authors of the code will work together to complete the
>> > > ASF IP Clearance process (http://incubator.apache.org/ip-clearance/)
>> > > and import the Go implementation for inclusion in a future release:
>> > >
>> > > [ ] +1 : Accept contribution of Go implementation
>> > > [ ]  0 : No opinion
>> > > [ ] -1 : Reject contribution because...
>> > >
>> > > Here is my vote: +1
>> > >
>> > > The vote will be open for at least 72 hours.
>> > >
>> > > Thanks,
>> > > Wes
>> > >
>>


Re: Parquet to arrow java converter

2018-03-06 Thread Julien Le Dem
I would put it in the parquet-mr codebase. I have contributed the schema 
conversion code there. I’m happy to provide feedback on PRs in this area. 

Julien

> On Mar 6, 2018, at 12:18, Wes McKinney  wrote:
> 
> When it had been discussed in the past, the thinking had been to
> implement it in the Parquet Java codebase. I'd be interested in
> others' opinions about this (since I'm not an expert on Java matters)
> 
> - Wes
> 
>> On Tue, Mar 6, 2018 at 2:27 PM, Wenbo Zhao  wrote:
>> Hi,
>> 
>> Sorry if someone has asked the same question before. We are 
>> interested in providing a Java converter from Parquet to Arrow. Should I 
>> implement this converter in Parquet-mr/Parquet-arrow or under the Arrow 
>> project? I have the feeling that putting the implementation in 
>> Parquet-mr/Parquet-arrow would be preferable 
>> https://www.mail-archive.com/dev@arrow.apache.org/msg02606.html?
>> 
>> Thanks,
>> 
>> Wenbo


Re: Using Plasma with xnd

2018-03-06 Thread Saul Shanabrook
Hey Wes,

I don't have much experience doing C + Python + Cython development, so I am
probably missing something obvious, but reading the Cython docs, it
seems like I can only access types marked as public from C code. When I
compile arrow locally, I do get some C++ headers for the plasma code, but I
don't think I can use them from C code either.

Best,
Saul



On Tue, Mar 6, 2018 at 3:12 PM Wes McKinney  wrote:

> hi Saul,
>
> Are you able to use the buffer/memoryview protocol? Instances of
> pyarrow.Buffer, like PlasmaBuffer, support this
>
> https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.pyx#L182
>
> - Wes
>
> On Tue, Mar 6, 2018 at 3:09 PM, Saul Shanabrook 
> wrote:
> > I am trying to use the Plasma store to back xnd objects. Xnd (
> > https://xnd.readthedocs.io/en/latest/xnd/index.html) is a container library
> > in C that has Python bindings. I would like to get a pointer to the
> > allocated memory after creating or getting an object in Plasma. I see that this
> > is supported in the C++ API (
> > https://arrow.apache.org/docs/cpp/classplasma_1_1_plasma_client.html#ac18ab9cc792c620a97a3dcb165e0ecd7)
> > but not in the Python API (as far as I can tell). Is it possible to use the
> > C++ Plasma API from a C project? If not, would it make sense to expose
> > pointer access on the Python API using capsules
> > https://docs.python.org/3.6/c-api/capsule.html
> > ?
>


[jira] [Created] (ARROW-2282) [Python] Create StringArray from buffers

2018-03-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2282:
--

 Summary: [Python] Create StringArray from buffers
 Key: ARROW-2282
 URL: https://issues.apache.org/jira/browse/ARROW-2282
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 0.9.0


While we will add more general-purpose functionality in 
https://issues.apache.org/jira/browse/ARROW-2281, that interface is more 
complicated than the constructor that explicitly states all arguments:  
{{StringArray(int64_t length, const std::shared_ptr<Buffer>& value_offsets, …}}

Thus I will also expose this explicit constructor.
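
To illustrate what the explicit arguments describe, here is a pure-Python sketch of how a string array can be read back from a length, an offsets buffer and a data buffer. It assumes the Arrow variable-length layout (length + 1 little-endian int32 offsets into a contiguous byte buffer); `decode_string_array` is a hypothetical helper, not a pyarrow function, and null handling is left out.

```python
import struct


def decode_string_array(length, value_offsets, data):
    """Decode Arrow-style string buffers into a list of Python strings.

    value_offsets holds length + 1 little-endian int32 offsets into data.
    The validity bitmap (nulls) is ignored in this sketch.
    """
    offsets = struct.unpack_from("<%di" % (length + 1), value_offsets)
    return [
        bytes(data[offsets[i]:offsets[i + 1]]).decode("utf-8")
        for i in range(length)
    ]


# Four strings stored back to back; the third one is empty.
data = b"foobarbaz!"
value_offsets = struct.pack("<5i", 0, 3, 6, 6, 10)
strings = decode_string_array(4, value_offsets, data)
print(strings)
```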





[jira] [Created] (ARROW-2281) [Python] Expose MakeArray to construct arrays from buffers

2018-03-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2281:
--

 Summary: [Python] Expose MakeArray to construct arrays from buffers
 Key: ARROW-2281
 URL: https://issues.apache.org/jira/browse/ARROW-2281
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn
 Fix For: 0.10.0


To create new arrays from existing buffers in Python, we would need to call 
into the C++ {{MakeArray}} method. This would then construct the Array and we 
would only wrap it in Python to have construction support for all Array types.





Re: [Documentation] Incomplete Documentation

2018-03-06 Thread Wes McKinney
hi Alberto,

We are volunteer developers developing new codebases with a quite
large feature surface area. Please feel free to create JIRA issues
pointing out missing API documentation so that members of the
community can submit patches improving it.

There is already a JIRA about concat_tables, for example
(https://issues.apache.org/jira/browse/ARROW-2181). Patches would be
welcome

Thank you,
Wes

On Tue, Mar 6, 2018 at 4:04 PM, ALBERTO Bocchinfuso
 wrote:
> Hi everyone,
>
> I am noting more and more that the API documentation is missing some 
> functions or some fields. I can testify about the python APIs, which are the 
> ones that I am using.
> For example,
>  Batch.num_rows
>  Batch.num_columns
>  Batch.schema
>  pyarrow.concat_tables()
>
> are missing from the API documentation, and I do think this can be a problem, 
> since I think that some developers (I can speak for myself, at least) are not 
> reading the Arrow source code, but rely essentially on the documentation.
> Even though there are traces of the existence of these functions in various 
> other sections of the documentation, I think it is important to keep the 
> references here up to date.
> I’d like to pinpoint this since I think that docs are really important to 
> help the developers and boost the diffusion of software.
>
> Thank you,
> Alberto
>


[Documentation] Incomplete Documentation

2018-03-06 Thread ALBERTO Bocchinfuso
Hi everyone,

I am noting more and more that the API documentation is missing some functions 
or some fields. I can testify about the python APIs, which are the ones that I 
am using.
For example,
 Batch.num_rows
 Batch.num_columns
 Batch.schema
 pyarrow.concat_tables()

are missing from the API documentation, and I do think this can be a problem, 
since I think that some developers (I can speak for myself, at least) are not 
reading the Arrow source code, but rely essentially on the documentation.
Even though there are traces of the existence of these functions in various 
other sections of the documentation, I think it is important to keep the 
references here up to date.
I’d like to pinpoint this since I think that docs are really important to help 
the developers and boost the diffusion of software.

Thank you,
Alberto



Re: [VOTE] Accept donation of Arrow Go implementation

2018-03-06 Thread Li Jin
+1

On Tue, Mar 6, 2018 at 3:31 PM, Uwe L. Korn  wrote:

> +1
>
> On Tue, Mar 6, 2018, at 9:28 PM, Jacques Nadeau wrote:
> > +1
> >
> > On Tue, Mar 6, 2018 at 10:57 AM, Wes McKinney 
> wrote:
> >
> > > Dear all,
> > >
> > > The Arrow PMC has been in contact with the developers of
> > >
> > > https://github.com/influxdata/arrow
> > >
> > > which is a native Go implementation of Apache Arrow. We are proposing
> > > to accept this codebase into the Apache project. If the vote passes,
> > > the PMC and the authors of the code will work together to complete the
> > > ASF IP Clearance process (http://incubator.apache.org/ip-clearance/)
> > > and import the Go implementation for inclusion in a future release:
> > >
> > > [ ] +1 : Accept contribution of Go implementation
> > > [ ]  0 : No opinion
> > > [ ] -1 : Reject contribution because...
> > >
> > > Here is my vote: +1
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > Thanks,
> > > Wes
> > >
>


Re: [VOTE] Accept donation of Arrow Go implementation

2018-03-06 Thread Uwe L. Korn
+1

On Tue, Mar 6, 2018, at 9:28 PM, Jacques Nadeau wrote:
> +1
> 
> On Tue, Mar 6, 2018 at 10:57 AM, Wes McKinney  wrote:
> 
> > Dear all,
> >
> > The Arrow PMC has been in contact with the developers of
> >
> > https://github.com/influxdata/arrow
> >
> > which is a native Go implementation of Apache Arrow. We are proposing
> > to accept this codebase into the Apache project. If the vote passes,
> > the PMC and the authors of the code will work together to complete the
> > ASF IP Clearance process (http://incubator.apache.org/ip-clearance/)
> > and import the Go implementation for inclusion in a future release:
> >
> > [ ] +1 : Accept contribution of Go implementation
> > [ ]  0 : No opinion
> > [ ] -1 : Reject contribution because...
> >
> > Here is my vote: +1
> >
> > The vote will be open for at least 72 hours.
> >
> > Thanks,
> > Wes
> >


Re: [VOTE] Accept donation of Arrow Go implementation

2018-03-06 Thread Jacques Nadeau
+1

On Tue, Mar 6, 2018 at 10:57 AM, Wes McKinney  wrote:

> Dear all,
>
> The Arrow PMC has been in contact with the developers of
>
> https://github.com/influxdata/arrow
>
> which is a native Go implementation of Apache Arrow. We are proposing
> to accept this codebase into the Apache project. If the vote passes,
> the PMC and the authors of the code will work together to complete the
> ASF IP Clearance process (http://incubator.apache.org/ip-clearance/)
> and import the Go implementation for inclusion in a future release:
>
> [ ] +1 : Accept contribution of Go implementation
> [ ]  0 : No opinion
> [ ] -1 : Reject contribution because...
>
> Here is my vote: +1
>
> The vote will be open for at least 72 hours.
>
> Thanks,
> Wes
>


Re: Parquet to arrow java converter

2018-03-06 Thread Wes McKinney
When it had been discussed in the past, the thinking had been to
implement it in the Parquet Java codebase. I'd be interested in
others' opinions about this (since I'm not an expert on Java matters)

- Wes

On Tue, Mar 6, 2018 at 2:27 PM, Wenbo Zhao  wrote:
> Hi,
>
> Sorry if someone has asked the same question before. We are 
> interested in providing a Java converter from Parquet to Arrow. Should I 
> implement this converter in Parquet-mr/Parquet-arrow or under the Arrow 
> project? I have the feeling that putting the implementation in 
> Parquet-mr/Parquet-arrow would be preferable 
> https://www.mail-archive.com/dev@arrow.apache.org/msg02606.html?
>
> Thanks,
>
> Wenbo


Re: Parquet to arrow java converter

2018-03-06 Thread Li Jin
This definitely sounds like a useful tool. It seems like Julien started
some of the work in Parquet-arrow a while back.

Julien, I am wondering what your thoughts are on whether such code should
live in the parquet-mr or arrow codebase?

On Tue, Mar 6, 2018 at 2:27 PM, Wenbo Zhao  wrote:

> Hi,
>
> Sorry if someone has asked the same question before. We are
> interested in providing a Java converter from Parquet to Arrow. Should I
> implement this converter in Parquet-mr/Parquet-arrow or under the Arrow
> project? I have the feeling that putting the implementation in
> Parquet-mr/Parquet-arrow would be preferable
> https://www.mail-archive.com/dev@arrow.apache.org/msg02606.html?
>
> Thanks,
>
> Wenbo
>


Re: Using Plasma with xnd

2018-03-06 Thread Wes McKinney
hi Saul,

Are you able to use the buffer/memoryview protocol? Instances of
pyarrow.Buffer, like PlasmaBuffer, support this

https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.pyx#L182

- Wes

On Tue, Mar 6, 2018 at 3:09 PM, Saul Shanabrook  wrote:
> I am trying to use the Plasma store to back xnd objects. Xnd (
> https://xnd.readthedocs.io/en/latest/xnd/index.html) is a container library
> in C that has Python bindings. I would like to get a pointer to the
> allocated memory after creating or getting an object in Plasma. I see that this
> is supported in the C++ API (
> https://arrow.apache.org/docs/cpp/classplasma_1_1_plasma_client.html#ac18ab9cc792c620a97a3dcb165e0ecd7)
> but not in the python API (as far as I can tell). Is it possible to use the
> C++ Plasma API from a C project? If not, would it make sense to expose
> pointer access on the Python API using capsules
> https://docs.python.org/3.6/c-api/capsule.html
> ?


Using Plasma with xnd

2018-03-06 Thread Saul Shanabrook
I am trying to use the Plasma store to back xnd objects. Xnd (
https://xnd.readthedocs.io/en/latest/xnd/index.html) is a container library
in C that has Python bindings. I would like to get a pointer to the
allocated memory after creating or getting an object in Plasma. I see that this
is supported in the C++ API (
https://arrow.apache.org/docs/cpp/classplasma_1_1_plasma_client.html#ac18ab9cc792c620a97a3dcb165e0ecd7)
but not in the python API (as far as I can tell). Is it possible to use the
C++ Plasma API from a C project? If not, would it make sense to expose
pointer access on the Python API using capsules
https://docs.python.org/3.6/c-api/capsule.html
?


[jira] [Created] (ARROW-2280) [Python] pyarrow.Array.buffers should also include the offsets

2018-03-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2280:
--

 Summary: [Python] pyarrow.Array.buffers should also include the 
offsets
 Key: ARROW-2280
 URL: https://issues.apache.org/jira/browse/ARROW-2280
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 0.9.0


Currently we only return the buffers, but they cannot be interpreted without 
their offsets. In particular, the validity bitmap will have a non-zero offset 
in most cases where the array was sliced.
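
A small pure-Python sketch of why the offset matters. It assumes the Arrow validity bitmap convention (least-significant-bit numbering, one bit per element); `is_valid` is a hypothetical helper and not part of pyarrow.

```python
def is_valid(bitmap, offset, i):
    """Return True if logical element i is non-null.

    bitmap uses least-significant-bit numbering, as in the Arrow format;
    offset is the array's logical offset into its buffers.
    """
    pos = offset + i
    return bool((bitmap[pos // 8] >> (pos % 8)) & 1)


# Parent array of 6 values; the element at index 1 is null.
bitmap = bytes([0b00111101])

# A slice starting at parent index 1 shares the same bitmap, with offset 1.
slice_offset, slice_length = 1, 3
with_offset = [is_valid(bitmap, slice_offset, i) for i in range(slice_length)]
without_offset = [is_valid(bitmap, 0, i) for i in range(slice_length)]
print(with_offset, without_offset)
```

Reading the shared bitmap without the offset reports the wrong element as null, which is the problem described in this issue.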





[jira] [Created] (ARROW-2279) [Python] Better error message if lib cannot be found

2018-03-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-2279:
--

 Summary: [Python] Better error message if lib cannot be found
 Key: ARROW-2279
 URL: https://issues.apache.org/jira/browse/ARROW-2279
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Uwe L. Korn
 Fix For: 0.9.0








[VOTE] Accept donation of Arrow Go implementation

2018-03-06 Thread Wes McKinney
Dear all,

The Arrow PMC has been in contact with the developers of

https://github.com/influxdata/arrow

which is a native Go implementation of Apache Arrow. We are proposing
to accept this codebase into the Apache project. If the vote passes,
the PMC and the authors of the code will work together to complete the
ASF IP Clearance process (http://incubator.apache.org/ip-clearance/)
and import the Go implementation for inclusion in a future release:

[ ] +1 : Accept contribution of Go implementation
[ ]  0 : No opinion
[ ] -1 : Reject contribution because...

Here is my vote: +1

The vote will be open for at least 72 hours.

Thanks,
Wes


[jira] [Created] (ARROW-2278) [Python] deserializing Numpy struct arrays raises

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2278:
-

 Summary: [Python] deserializing Numpy struct arrays raises
 Key: ARROW-2278
 URL: https://issues.apache.org/jira/browse/ARROW-2278
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> import numpy as np
>>> import pyarrow as pa
>>> dt = np.dtype([('x', np.int8), ('y', np.float32)])
>>> arr = np.arange(5*10, dtype=np.int8).view(dt)
>>> pa.deserialize(pa.serialize(arr).to_buffer())
Traceback (most recent call last):
  File "", line 1, in 
pa.deserialize(pa.serialize(arr).to_buffer())
  File "serialization.pxi", line 441, in pyarrow.lib.deserialize
  File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
  File "serialization.pxi", line 257, in 
pyarrow.lib.SerializedPyObject.deserialize
  File "serialization.pxi", line 174, in 
pyarrow.lib.SerializationContext._deserialize_callback
  File "/home/antoine/arrow/python/pyarrow/serialization.py", line 44, in 
_deserialize_numpy_array_list
return np.array(data[0], dtype=np.dtype(data[1]))
TypeError: a bytes-like object is required, not 'int'
{code}





[jira] [Created] (ARROW-2277) [Python] Tensor.from_numpy doesn't support struct arrays

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2277:
-

 Summary: [Python] Tensor.from_numpy doesn't support struct arrays
 Key: ARROW-2277
 URL: https://issues.apache.org/jira/browse/ARROW-2277
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> import numpy as np
>>> import pyarrow as pa
>>> dt = np.dtype([('x', np.int8), ('y', np.float32)])
>>> dt.itemsize
5
>>> arr = np.arange(5*10, dtype=np.int8).view(dt)
>>> pa.Tensor.from_numpy(arr)
Traceback (most recent call last):
  File "", line 1, in 
pa.Tensor.from_numpy(arr)
  File "array.pxi", line 523, in pyarrow.lib.Tensor.from_numpy
  File "error.pxi", line 85, in pyarrow.lib.check_status
ArrowNotImplementedError: 
/home/antoine/arrow/cpp/src/arrow/python/numpy_convert.cc:250 code: 
GetTensorType(reinterpret_cast(PyArray_DESCR(ndarray)), )
Unsupported numpy type 20

{code}
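
The itemsize of 5 in the session above follows from NumPy packing struct fields without padding by default. The sketch below reproduces the arithmetic with the standard `struct` module only; it is an illustration of the record size, not of how Arrow would map such a type.

```python
import struct

# "=" selects standard sizes with no alignment padding: 1 + 4 bytes.
packed_size = struct.calcsize("=bf")
# "@" selects native sizes and alignment, which typically pads the int8 field.
native_size = struct.calcsize("@bf")
print(packed_size, native_size)

# 50 int8 values reinterpreted as 5-byte records give 10 records.
record_count = (5 * 10) // packed_size
print(record_count)
```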





[jira] [Created] (ARROW-2276) [Python] Tensor could implement the buffer protocol

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2276:
-

 Summary: [Python] Tensor could implement the buffer protocol
 Key: ARROW-2276
 URL: https://issues.apache.org/jira/browse/ARROW-2276
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


Tensors have an underlying buffer, a data type, shape and strides. It seems 
like they could implement the Python buffer protocol.
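
As a rough illustration of what a buffer-protocol view of a tensor would expose, the sketch below builds a 2x3 float32 view over raw bytes with the standard `memoryview` type. The bytes stand in for a tensor's underlying buffer; nothing here uses or implies a pyarrow API.

```python
import struct

# Six float32 values laid out contiguously, standing in for a tensor's buffer.
raw = struct.pack("6f", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)

# memoryview.cast attaches a format and a C-contiguous shape to the bytes.
tensor_view = memoryview(raw).cast("B").cast("f", shape=[2, 3])

# The buffer protocol carries exactly the metadata listed above:
# data type (format), shape and strides.
print(tensor_view.format, tensor_view.shape, tensor_view.strides)
print(tensor_view.tolist())
```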





[jira] [Created] (ARROW-2275) [C++] Buffer::mutable_data_ member uninitialized

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2275:
-

 Summary: [C++] Buffer::mutable_data_ member uninitialized
 Key: ARROW-2275
 URL: https://issues.apache.org/jira/browse/ARROW-2275
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


For immutable buffers (i.e. most of them), the {{mutable_data_}} member is 
uninitialized. If the user calls {{mutable_data()}} by mistake on such a 
buffer, they will get a bogus pointer back.

This is exacerbated by the Tensor API whose const and non-const {{raw_data()}} 
methods return different things...

(also an idea: add a DCHECK for mutability before returning from 
{{mutable_data()}}?)
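
The mutability check idea can be modelled in a few lines of Python. This is a toy model only: the `Buffer` class below is hypothetical and merely mirrors the method names from this issue, and unlike a C++ DCHECK the check here is always active rather than debug-only.

```python
class Buffer:
    """Toy model of a buffer that may or may not be mutable."""

    def __init__(self, data, mutable=False):
        self._data = data
        self._mutable = mutable

    def data(self):
        # Read-only access is always allowed.
        return bytes(self._data)

    def mutable_data(self):
        # The check plays the role of the proposed DCHECK: fail loudly
        # instead of handing back a meaningless pointer.
        if not self._mutable:
            raise ValueError("mutable_data() called on an immutable buffer")
        return self._data


immutable = Buffer(b"abc")
try:
    immutable.mutable_data()
    outcome = "no error"
except ValueError as exc:
    outcome = str(exc)
print(outcome)
```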





[jira] [Created] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-03-06 Thread Mitar (JIRA)
Mitar created ARROW-2273:


 Summary: Cannot deserialize pandas SparseDataFrame
 Key: ARROW-2273
 URL: https://issues.apache.org/jira/browse/ARROW-2273
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Mitar


>>> import pyarrow
>>> import pandas
>>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
Traceback (most recent call last):
  File "", line 1, in 
  File "serialization.pxi", line 441, in pyarrow.lib.deserialize
  File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
  File "serialization.pxi", line 257, in 
pyarrow.lib.SerializedPyObject.deserialize
  File "serialization.pxi", line 174, in 
pyarrow.lib.SerializationContext._deserialize_callback
  File 
".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", 
line 77, in _deserialize_pandas_dataframe
return pdcompat.serialized_dict_to_dataframe(data)
  File 
".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
line 450, in serialized_dict_to_dataframe
for block in data['blocks']]
  File 
".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
line 450, in 
for block in data['blocks']]
  File 
".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
line 478, in _reconstruct_block
block = _int.make_block(block_arr, placement=placement)
  File 
".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
line 2957, in make_block
return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File 
".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", 
line 120, in __init__
len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 1






[jira] [Created] (ARROW-2272) [Python] test_plasma spams /tmp

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2272:
-

 Summary: [Python] test_plasma spams /tmp
 Key: ARROW-2272
 URL: https://issues.apache.org/jira/browse/ARROW-2272
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{{test_plasma}} creates a new socket in {{/tmp}} for each test and never cleans 
up.
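
One possible way to avoid the leftovers, sketched with the standard library only: create each socket inside a private temporary directory that is removed when the test ends. `temporary_socket_path` is a hypothetical helper, not code from `test_plasma`, and the example assumes a POSIX system with `AF_UNIX` sockets.

```python
import contextlib
import os
import socket
import tempfile


@contextlib.contextmanager
def temporary_socket_path(name="plasma"):
    """Yield a socket path inside a private directory that is removed on exit."""
    with tempfile.TemporaryDirectory(prefix="test_plasma_") as tmpdir:
        yield os.path.join(tmpdir, name)


with temporary_socket_path() as path:
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(path)  # creates the socket file on disk
        existed_during_test = os.path.exists(path)
    finally:
        server.close()

# Leaving the context removed the directory together with the socket file.
removed_after_test = not os.path.exists(path)
print(existed_during_test, removed_after_test)
```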





[jira] [Created] (ARROW-2271) [Python] test_plasma could make errors more diagnosable

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2271:
-

 Summary: [Python] test_plasma could make errors more diagnosable
 Key: ARROW-2271
 URL: https://issues.apache.org/jira/browse/ARROW-2271
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Antoine Pitrou


Currently, when {{plasma_store}} fails for one reason or another, you get poorly 
readable errors from {{test_plasma.py}}. Displaying the child process' stderr 
would help.
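
A minimal sketch of the suggestion, assuming the child can be run to completion: capture stderr and attach it to the raised error. `run_store` is a hypothetical helper, and a short Python one-liner stands in for a failing `plasma_store`; the real store is long-running, so an actual test would more likely read stderr after the process exits unexpectedly.

```python
import subprocess
import sys


def run_store(command):
    """Run a child process and raise with its stderr attached if it fails."""
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        stderr_text = result.stderr.decode("utf-8", errors="replace")
        raise RuntimeError(
            "command exited with status %d; stderr:\n%s"
            % (result.returncode, stderr_text)
        )
    return result


# Stand-in for a failing plasma_store invocation.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('cannot bind socket'); sys.exit(2)"]
try:
    run_store(failing)
    message = ""
except RuntimeError as exc:
    message = str(exc)
print(message)
```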





[jira] [Created] (ARROW-2270) [Python] ForeignBuffer doesn't tie Python object lifetime to C++ buffer lifetime

2018-03-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2270:
-

 Summary: [Python] ForeignBuffer doesn't tie Python object lifetime 
to C++ buffer lifetime
 Key: ARROW-2270
 URL: https://issues.apache.org/jira/browse/ARROW-2270
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


{{ForeignBuffer}} keeps the reference to the Python base object in the Python 
wrapper class, not in the C++ buffer instance, meaning if the C++ buffer gets 
passed around but the Python wrapper gets destroyed, the reference to the 
original Python base object will be released.
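
The lifetime problem can be modelled in pure Python. All three classes below are hypothetical stand-ins (they are not pyarrow types), and the result relies on CPython releasing objects once the last reference is gone.

```python
import gc
import weakref


class Base:
    """Stand-in for the Python object that owns the memory."""


class InnerBuffer:
    """Stand-in for the C++ buffer; 'owner' is only set in the second design."""

    def __init__(self, owner=None):
        self.owner = owner


class Wrapper:
    """Stand-in for the Python wrapper class."""

    def __init__(self, inner, base=None):
        self.inner = inner
        self.base = base


def base_survives(reference_in_inner):
    base = Base()
    probe = weakref.ref(base)
    if reference_in_inner:
        inner = InnerBuffer(owner=base)
        wrapper = Wrapper(inner)
    else:
        inner = InnerBuffer()
        wrapper = Wrapper(inner, base=base)
    # The wrapper goes away while the inner buffer is still held.
    del base, wrapper
    gc.collect()
    return probe() is not None


print(base_survives(reference_in_inner=False))
print(base_survives(reference_in_inner=True))
```

Holding the reference in the inner buffer keeps the base object alive for as long as the buffer itself is alive, which is the behaviour this issue asks for.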





[jira] [Created] (ARROW-2269) Cannot build bdist_wheel for Python

2018-03-06 Thread Mitar (JIRA)
Mitar created ARROW-2269:


 Summary: Cannot build bdist_wheel for Python
 Key: ARROW-2269
 URL: https://issues.apache.org/jira/browse/ARROW-2269
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Affects Versions: 0.9.0
Reporter: Mitar


I am trying current master.

I ran:

{{python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
--with-plasma --bundle-arrow-cpp bdist_wheel }}

Output:

{{running build_ext creating build creating build/temp.linux-x86_64-3.6 -- 
Runnning cmake for pyarrow cmake 
-DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python 
-DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on 
-DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON 
-DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python -- The C compiler 
identification is GNU 7.2.0 -- The CXX compiler identification is GNU 7.2.0 -- 
Check for working C compiler: /usr/bin/cc -- Check for working C compiler: 
/usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler 
ABI info - done -- Detecting C compile features -- Detecting C compile features 
- done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX 
compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting 
CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX 
compile features - done INFOCompiler command: /usr/bin/c++ INFOCompiler 
version: Using built-in specs. COLLECT_GCC=/usr/bin/c++ 
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper 
OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: 
x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs 
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr 
--with-gcc-major-version-only --program-suffix=-7 
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie 
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto 
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
--enable-offload-targets=nvptx-none --without-cuda-driver 
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
--target=x86_64-linux-gnu Thread model: posix gcc version 7.2.0 (Ubuntu 
7.2.0-8ubuntu3.2) INFOCompiler id: GNU Selected compiler gcc 7.2.0 -- 
Performing Test CXX_SUPPORTS_SSE3 -- Performing Test CXX_SUPPORTS_SSE3 - 
Success -- Performing Test CXX_SUPPORTS_ALTIVEC -- Performing Test 
CXX_SUPPORTS_ALTIVEC - Failed Configured for RELEASE build (set with cmake 
-DCMAKE_BUILD_TYPE=\{release,debug,...}) -- Build Type: RELEASE -- Build output 
directory: .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/ -- 
Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version "3.6.3") 
-- Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PythonLibs: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- 
Found NumPy: version "1.14.1" 
.../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include -- 
Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- Checking for module 
'arrow' -- Found arrow, version 0.9.0-SNAPSHOT -- Arrow ABI version: 0.0.0 -- 
Arrow SO version: 0 -- Found the Arrow core library: 
.../Temp/arrow/dist/lib/libarrow.so -- Found the Arrow Python library: 
.../Temp/arrow/dist/lib/libarrow_python.so -- Boost version: 1.63.0 -- Found 
the following Boost libraries: -- system -- filesystem -- regex Added shared 
library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so Added shared 
library dependency arrow_python: .../Temp/arrow/dist/lib/libarrow_python.so -- 
Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so Added shared 
library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so -- Checking 
for module 'plasma' -- Found plasma, version -- Plasma ABI version: 0.0.0 -- 
Plasma SO version: 0 -- Found the Plasma core library: 
.../Temp/arrow/dist/lib/libplasma.so -- Found