[jira] [Created] (ARROW-2155) [Python] pa.frombuffer(bytearray) returns immutable Buffer

2018-02-14 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2155:
-

 Summary: [Python] pa.frombuffer(bytearray) returns immutable Buffer
 Key: ARROW-2155
 URL: https://issues.apache.org/jira/browse/ARROW-2155
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


I'd expect it to return a mutable buffer:
{code:python}
>>> pa.frombuffer(bytearray(10)).is_mutable
False
{code}
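
As a point of comparison, Python's own buffer protocol already reports whether an exporter is writable. The sketch below uses only the standard library; `is_writable` is a hypothetical helper name, and nothing here claims to show how pyarrow builds its Buffer objects.

```python
def is_writable(obj):
    """Report whether obj exports a writable buffer via the buffer protocol."""
    # memoryview exposes the exporter's read-only flag; release it right away.
    with memoryview(obj) as view:
        return not view.readonly


print(is_writable(bytearray(10)))  # True: bytearray exports a writable buffer
print(is_writable(bytes(10)))      # False: bytes is immutable
```

A wrapper that consults this flag could report a bytearray-backed buffer as mutable and a bytes-backed one as immutable.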



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Merge multiple record batches

2018-02-14 Thread ALBERTO Bocchinfuso
Hi,
I’m not sure I fully understood your point, but I’ll try to give the answer 
that looks simplest to me.
Your code doesn’t perform any operation on table 1 and table 2 separately; it 
looks like you just want to merge all those RecordBatches.
I think that:

  1.  you can use the to_batches() method documented in the API for Table, but 
I have never tried it myself. This way you create two tables, create batches 
from those tables, and put the batches together.
  2.  I would rather store ALL the BATCHES from the two streams in the SAME 
Python LIST, and then create a single table using from_batches() as you 
suggested. That’s because your code creates two tables even though you 
don’t seem to care about them.

I haven’t tried either, but I think you can go both ways and then tell us 
whether the result is the same and whether one of the two is faster than the 
other.
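
A minimal sketch of the second option, under a few assumptions: each reader is iterable and yields RecordBatch objects, all batches share one schema, and the helper names `all_batches` and `merge` are made up for illustration. pyarrow is imported lazily so the flattening helper can be tried on its own.

```python
import itertools


def all_batches(readers):
    """Flatten the batches yielded by several readers into one list, in order."""
    return list(itertools.chain.from_iterable(readers))


def merge(readers):
    """Build a single Table from every batch of every reader."""
    # Imported here so all_batches() stays usable without pyarrow installed.
    import pyarrow as pa

    # Table.from_batches expects a list of batches that share one schema.
    return pa.Table.from_batches(all_batches(readers))


# Intended use, mirroring the question (newer pyarrow releases expose the
# stream reader as pa.ipc.open_stream rather than pa.open_stream):
#   table_all = merge([pa.open_stream('foo'), pa.open_stream('bar')])
```

This skips the intermediate table1 and table2 entirely, which is the point of the second option.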

Alberto

From: Rares Vernica
Sent: Wednesday, February 14, 2018 05:13
To: dev@arrow.apache.org
Subject: Merge multiple record batches

Hi,

If I have multiple RecordBatchStreamReader inputs, what is the recommended
way to get all the RecordBatches from all the inputs together, maybe in a
Table? They all have the same schema. The sources for the readers are
different files.

So, I do something like:

reader1 = pa.open_stream('foo')
table1 = reader1.read_all()

reader2 = pa.open_stream('bar')
table2 = reader2.read_all()

# table_all = ???
# OR maybe I don't need to create table1 and table2
# table_all = pa.Table.from_batches( ??? )

Thanks!
Rares



[jira] [Created] (ARROW-2154) [Python] __eq__ unimplemented on Buffer

2018-02-14 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2154:
-

 Summary: [Python] __eq__ unimplemented on Buffer
 Key: ARROW-2154
 URL: https://issues.apache.org/jira/browse/ARROW-2154
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


Having to call {{equals()}} is un-Pythonic:
{code:python}
>>> pa.frombuffer(b'foo') == pa.frombuffer(b'foo')
False
>>> pa.frombuffer(b'foo').equals(pa.frombuffer(b'foo'))
True
{code}

Same for many other pyarrow types, incidentally.
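
For illustration, the usual Python pattern is to have `__eq__` delegate to the existing `equals()` method. The class below is a hypothetical stand-in written for this sketch, not pyarrow's Buffer.

```python
class Buf:
    """Hypothetical stand-in for a buffer type; not pyarrow's implementation."""

    def __init__(self, data):
        self._data = bytes(data)

    def equals(self, other):
        """Content comparison, as the existing equals() method provides."""
        return self._data == other._data

    def __eq__(self, other):
        # Let Python try the reflected comparison for unrelated types.
        if not isinstance(other, Buf):
            return NotImplemented
        return self.equals(other)

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore a consistent one.
        return hash(self._data)


print(Buf(b'foo') == Buf(b'foo'))  # True
print(Buf(b'foo') == Buf(b'bar'))  # False
```

Returning `NotImplemented` for foreign types keeps comparisons against unrelated objects from raising, and `!=` follows from `__eq__` automatically in Python 3.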





Re: Arrow for MATLAB?

2018-02-14 Thread Joris Peeters
Yeah, pinging Mathworks might be worth it. It feels like this is something
that could be of great value to them. Aside from HDF5 & mat files (both on
disk), it can be really tedious to efficiently share data from anywhere
with MATLAB, so it's becoming increasingly isolated.
I'm going to play around for a few days with Arrow & the Matlab C++ API,
getting a bit more familiar with it and maybe hacking a small prototype
together. Currently it's just out-of-hours, though, so don't expect major
magic. :)

Thanks,
-J

On Tue, Feb 13, 2018 at 2:33 PM, Phillip Cloud  wrote:

> The MathWorks is in the process of starting to contribute. I spoke with
> them a couple weeks ago about this and they were excited about it. I can
> ping them to see if they are still interested.
>
> On Tue, Feb 13, 2018, 09:24 Uwe L. Korn  wrote:
>
> > Hello Joris,
> >
> > this is only due to lack of someone doing it and probably due to lack of
> > people that have the experience to do that. I had a short look at Matlab's
> > C++ API and the interfaces seem to be promising enough
> > https://de.mathworks.com/help/matlab/matlab-data-array.html that once
> > someone attempts it, it should not be hard to build.
> >
> > If you want to try to take a shot, we are happy to help if there are
> > problems with the Arrow side of things.
> >
> > Uwe
> >
> > On Tue, Feb 13, 2018, at 2:41 PM, Joris Peeters wrote:
> > > Hello,
> > >
> > > Is anyone aware of plans (or concrete projects) to add MATLAB bindings for
> > > Arrow? I'm interested in exchanging data between Java, Python, ..., and
> > > MATLAB - and Arrow sounds like a great solution.
> > >
> > > I couldn't find any pre-existing effort, though, so curious if that is due
> > > to a lack of interest or because there might be underlying reasons that
> > > would make this very hard to achieve.
> > >
> > > Best,
> > > -Joris.
> >
>


[ANNOUNCE] New Arrow committers

2018-02-14 Thread Wes McKinney
On behalf of the Arrow PMC, I'm pleased to announce that Brian Hulette
(@TheNeuralBit) and Robert Nishihara (@robertnishihara) are now Arrow
committers. Thank you for all your contributions!

Welcome, and congrats!

- Wes


[jira] [Created] (ARROW-2156) [CI] Isolate Sphinx dependencies

2018-02-14 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2156:
-

 Summary: [CI] Isolate Sphinx dependencies
 Key: ARROW-2156
 URL: https://issues.apache.org/jira/browse/ARROW-2156
 Project: Apache Arrow
  Issue Type: Task
  Components: Continuous Integration
Affects Versions: 0.8.0
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


In the Travis Python test script, we always install the documentation 
dependencies. We should only install them when building the docs, since they 
are non-trivial and can take a while to fetch.





[jira] [Created] (ARROW-2157) Decimal arrays cannot be constructed from Python lists

2018-02-14 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-2157:


 Summary: Decimal arrays cannot be constructed from Python lists
 Key: ARROW-2157
 URL: https://issues.apache.org/jira/browse/ARROW-2157
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Phillip Cloud
Assignee: Phillip Cloud
 Fix For: 0.9.0


{code}
In [14]: pa.array([Decimal('1')])
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
 in ()
----> 1 pa.array([Decimal('1')])

array.pxi in pyarrow.lib.array()

array.pxi in pyarrow.lib._sequence_to_array()

error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Error inferring Arrow data type for collection of Python objects. 
Got Python object of type Decimal but can only handle these types: bool, float, 
integer, date, datetime, bytes, unicode
{code}





Re: [ANNOUNCE] New Arrow committers

2018-02-14 Thread Philipp Moritz
Congrats to the new committers!

On Wed, Feb 14, 2018 at 9:07 AM, Robert Nishihara  wrote:

> Thanks a lot Wes!
> On Wed, Feb 14, 2018 at 7:28 AM Wes McKinney  wrote:
>
> > On behalf of the Arrow PMC, I'm pleased to announce that Brian Hulette
> > (@TheNeuralBit) and Robert Nishihara (@robertnishihara) are now Arrow
> > committers. Thank you for all your contributions!
> >
> > Welcome, and congrats!
> >
> > - Wes
> >
>


Re: [ANNOUNCE] New Arrow committers

2018-02-14 Thread Li Jin
Congrats!

On Wed, Feb 14, 2018 at 12:14 PM, Philipp Moritz  wrote:

> Congrats to the new committers!
>
> On Wed, Feb 14, 2018 at 9:07 AM, Robert Nishihara <
> robertnishih...@gmail.com
> > wrote:
>
> > Thanks a lot Wes!
> > On Wed, Feb 14, 2018 at 7:28 AM Wes McKinney 
> wrote:
> >
> > > On behalf of the Arrow PMC, I'm pleased to announce that Brian Hulette
> > > (@TheNeuralBit) and Robert Nishihara (@robertnishihara) are now Arrow
> > > committers. Thank you for all your contributions!
> > >
> > > Welcome, and congrats!
> > >
> > > - Wes
> > >
> >
>


Re: [ANNOUNCE] New Arrow committers

2018-02-14 Thread Robert Nishihara
Thanks a lot Wes!
On Wed, Feb 14, 2018 at 7:28 AM Wes McKinney  wrote:

> On behalf of the Arrow PMC, I'm pleased to announce that Brian Hulette
> (@TheNeuralBit) and Robert Nishihara (@robertnishihara) are now Arrow
> committers. Thank you for all your contributions!
>
> Welcome, and congrats!
>
> - Wes
>


[jira] [Created] (ARROW-2160) decimal precision inference

2018-02-14 Thread Antony Mayi (JIRA)
Antony Mayi created ARROW-2160:
--

 Summary: decimal precision inference
 Key: ARROW-2160
 URL: https://issues.apache.org/jira/browse/ARROW-2160
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Affects Versions: 0.8.0
Reporter: Antony Mayi


{code}
import pyarrow as pa
import pandas as pd
import decimal

df = pd.DataFrame({'a': [decimal.Decimal('0.1'), decimal.Decimal('0.01')]})
pa.Table.from_pandas(df)
{code}

raises:
{code}
pyarrow.lib.ArrowInvalid: Decimal type with precision 2 does not fit into 
precision inferred from first array element: 1
{code}

It looks like Arrow infers the highest precision for a given column from the 
first cell and expects the rest to fit. I understand this is by design, but 
from the point of view of pandas-arrow compatibility this is quite painful, as 
pandas is more flexible (as demonstrated).

This means that a user trying to pass a pandas {{DataFrame}} with 
{{Decimal}} column(s) to an Arrow {{Table}} would always have to first:
# Find the highest precision used in (each of) those column(s)
# Adjust the first cell of (each of) those column(s) so it has the highest 
precision of that column
# Only then pass the {{DataFrame}} to {{Table.from_pandas()}}

So, given this unavoidable procedure (and assuming Arrow needs to be strict 
about the highest precision for a column), shouldn't this logic be part of 
{{Table.from_pandas()}} directly, to make it transparent?
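
The first step of that procedure, finding the precision a whole column needs, can be sketched with the standard library alone. This assumes precision means the total number of digits and scale the number of digits after the decimal point; `column_precision_scale` is a hypothetical helper name, and the sketch does not reproduce Arrow's own inference code.

```python
from decimal import Decimal


def column_precision_scale(values):
    """Return a (precision, scale) pair wide enough for every finite Decimal."""
    max_int_digits = 0
    max_scale = 0
    for value in values:
        if value is None or not value.is_finite():
            continue  # nulls, NaN and Infinity carry no digits
        _, digits, exponent = value.as_tuple()
        # Digits after the decimal point (none when the exponent is >= 0).
        max_scale = max(max_scale, -exponent, 0)
        # Digits before the decimal point (none for values like 0.01).
        max_int_digits = max(max_int_digits, len(digits) + exponent, 0)
    return max(max_int_digits + max_scale, 1), max_scale


print(column_precision_scale([Decimal('0.1'), Decimal('0.01')]))  # (2, 2)
```

For the column in the report this gives a precision of 2, the value that the error message says does not fit into the precision of 1 inferred from the first element.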





[jira] [Created] (ARROW-2161) test_cython_api failing for a build_ext --inplace install

2018-02-14 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-2161:


 Summary: test_cython_api failing for a build_ext --inplace install
 Key: ARROW-2161
 URL: https://issues.apache.org/jira/browse/ARROW-2161
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Phillip Cloud


{code}
pytest pyarrow -x --tb=short
============================= test session starts ==============================
platform linux -- Python 3.6.3, pytest-3.3.1, py-1.5.2, pluggy-0.6.0
rootdir: /home/phillip/Documents/code/cpp/arrow/python, inifile: setup.cfg
collected 580 items

pyarrow/tests/test_array.py ... [ 10%]
pyarrow/tests/test_convert_builtin.py [ 24%]
pyarrow/tests/test_convert_pandas.py ...x...s.. [ 38%]
. [ 41%]
pyarrow/tests/test_cython.py F

=================================== FAILURES ===================================
_______________________________ test_cython_api ________________________________
pyarrow/tests/test_cython.py:88: in test_cython_api
    'build_ext', '--inplace'])
/home/phillip/miniconda3/envs/pyarrow36/lib/python3.6/subprocess.py:291: in check_call
    raise CalledProcessError(retcode, cmd)
E   subprocess.CalledProcessError: Command '['/home/phillip/miniconda3/envs/pyarrow36/bin/python', 'setup.py', 'build_ext', '--inplace']' returned non-zero exit status 1.
{code}





Re: [ANNOUNCE] New Arrow committers

2018-02-14 Thread Phillip Cloud
Congratulations to everyone, welcome!

On Wed, Feb 14, 2018 at 12:23 PM Li Jin  wrote:

> Congrats!
>
> On Wed, Feb 14, 2018 at 12:14 PM, Philipp Moritz 
> wrote:
>
> > Congrats to the new committers!
> >
> > On Wed, Feb 14, 2018 at 9:07 AM, Robert Nishihara <
> > robertnishih...@gmail.com
> > > wrote:
> >
> > > Thanks a lot Wes!
> > > On Wed, Feb 14, 2018 at 7:28 AM Wes McKinney 
> > wrote:
> > >
> > > > On behalf of the Arrow PMC, I'm pleased to announce that Brian
> Hulette
> > > > (@TheNeuralBit) and Robert Nishihara (@robertnishihara) are now Arrow
> > > > committers. Thank you for all your contributions!
> > > >
> > > > Welcome, and congrats!
> > > >
> > > > - Wes
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Arrow committers

2018-02-14 Thread Brian Hulette

Thanks everyone! Looking forward to growing the project with you all :)

Brian


On 02/14/2018 12:57 PM, Phillip Cloud wrote:

> Congratulations to everyone, welcome!
>
> On Wed, Feb 14, 2018 at 12:23 PM Li Jin wrote:
>
> > Congrats!
> >
> > On Wed, Feb 14, 2018 at 12:14 PM, Philipp Moritz wrote:
> >
> > > Congrats to the new committers!
> > >
> > > On Wed, Feb 14, 2018 at 9:07 AM, Robert Nishihara
> > > <robertnishih...@gmail.com> wrote:
> > >
> > > > Thanks a lot Wes!
> > > > On Wed, Feb 14, 2018 at 7:28 AM Wes McKinney wrote:
> > > >
> > > > > On behalf of the Arrow PMC, I'm pleased to announce that Brian Hulette
> > > > > (@TheNeuralBit) and Robert Nishihara (@robertnishihara) are now Arrow
> > > > > committers. Thank you for all your contributions!
> > > > >
> > > > > Welcome, and congrats!
> > > > >
> > > > > - Wes





[jira] [Created] (ARROW-2158) [Python] Construction of Decimal array with None or np.nan fails

2018-02-14 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-2158:


 Summary: [Python] Construction of Decimal array with None or np.nan fails
 Key: ARROW-2158
 URL: https://issues.apache.org/jira/browse/ARROW-2158
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.8.0
Reporter: Phillip Cloud
Assignee: Phillip Cloud
 Fix For: 0.9.0








[jira] [Created] (ARROW-2159) [JS] Support custom predicates

2018-02-14 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-2159:


 Summary: [JS] Support custom predicates
 Key: ARROW-2159
 URL: https://issues.apache.org/jira/browse/ARROW-2159
 Project: Apache Arrow
  Issue Type: New Feature
  Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette


Right now the `DataFrame` interface only supports a pretty basic set of 
operations, which could be limiting to users. We should add the ability for the 
user to define their own predicates using callback functions.





[jira] [Created] (ARROW-2162) [Python/C++] Decimal Values with too-high precision are multiplied by 100

2018-02-14 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-2162:


 Summary: [Python/C++] Decimal Values with too-high precision are multiplied by 100
 Key: ARROW-2162
 URL: https://issues.apache.org/jira/browse/ARROW-2162
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 0.8.0
Reporter: Phillip Cloud
Assignee: Phillip Cloud
 Fix For: 0.9.0





