Re: Symbol not found: _PyCObject_Type (MacOS El Capitan, Python 3.6)

2018-05-15 Thread Wes McKinney
Hi Quang -- I recommend clearing out your CMake temporary files after
making any conda environment changes. If you activate a different
conda environment, CMake will not know to recompute the cached variables
related to Python's header files and libraries. So you may have
invoked CMake with Python 2 activated and later activated Python 3,
leaving the build pointed at the Python 2 headers and libraries.
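
As a rough sketch of what "clearing out" can look like: the helper below deletes the two standard pieces of CMake's cached state, `CMakeCache.txt` and the `CMakeFiles` directory, from a build directory so that the next CMake run detects Python again. The function name and the idea of passing the build directory as an argument are assumptions for illustration only, not part of Arrow's build tooling.

```python
import shutil
from pathlib import Path


def clear_cmake_cache(build_dir):
    """Delete CMakeCache.txt and CMakeFiles/ under build_dir.

    Returns the list of paths that were actually removed.
    (Hypothetical helper, not part of Arrow or CMake.)
    """
    build_dir = Path(build_dir)
    removed = []

    # CMakeCache.txt stores cached variables, including the detected Python paths.
    cache_file = build_dir / "CMakeCache.txt"
    if cache_file.is_file():
        cache_file.unlink()
        removed.append(cache_file)

    # CMakeFiles/ holds CMake's intermediate configuration state.
    cmake_files = build_dir / "CMakeFiles"
    if cmake_files.is_dir():
        shutil.rmtree(cmake_files)
        removed.append(cmake_files)

    return removed
```

Calling `clear_cmake_cache("path/to/your/build")` with the desired conda environment activated, and then re-running CMake, should make it pick up that environment's Python. Deleting the whole build directory has the same effect if you do not mind a full rebuild.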

- Wes

On Tue, May 15, 2018 at 5:15 AM, Quang Vu wrote:
> Yes Antoine, that happens when compiling Arrow under an activated conda
> environment.
> Thank you for all the info you are helping me with!
>
> Quang.
>
> On Mon, May 14, 2018 at 3:34 PM Antoine Pitrou wrote:
>
>>
>> To give a bit more insight: you should compile Arrow with your conda
>> environment activated, so that it picks the right Python version (3.6.5,
>> in your case).  If it's still picking the wrong Python version, that
>> might be a bug.
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On 14/05/2018 at 20:50, Quang Vu wrote:
>> > Thanks Antoine,
>> >
>> > I will need to learn more about the compilation process on my
>> > Mac to see how it ends up linking to Python 2.
>> > I am not familiar with that process, but this is a good pointer for my
>> > issue. Thank you for your response!
>> >
>> > Quang.
>> >
>> > On Mon, May 14, 2018 at 12:50 PM Antoine Pitrou wrote:
>> >
>> >>
>> >> Hi Quang,
>> >>
>> >> It sounds like you have compiled Arrow against a Python 2 install but
>> >> are now trying to use it with Python 3.  This won't work: the same
>> >> Python version must be used when compiling and when using PyArrow.
>> >>
>> >> ("PyCObject" is a Python 2-specific API that doesn't exist anymore in
>> >> Python 3)
>> >>
>> >> Regards
>> >>
>> >> Antoine.
>> >>
>> >>
>> >> On 14/05/2018 at 18:34, Quang Vu wrote:
>> >>> Hi Arrow dev,
>> >>>
>> >>> I am having trouble installing and setting up my development environment
>> >>> for Arrow. I wonder if anyone is familiar with the issue. My system info:
>> >>> - MacOS 10.11.6 (El Capitan)
>> >>> - conda 4.5.1
>> >>> - python 3.6.5
>> >>> - arrow's current commit: 4b8511
>> >>>
>> >>> Installing the Arrow C++ libraries and Parquet both succeed, but
>> >>> importing `pyarrow` fails:
>> >>>
>> >>> $ python -c 'import pyarrow'
>> >>>
>> >>> Traceback (most recent call last):
>> >>>   File "<string>", line 1, in <module>
>> >>>   File "/Users/myuser/code/arrow/python/pyarrow/__init__.py", line 47, in <module>
>> >>>     from pyarrow.lib import cpu_count, set_cpu_count
>> >>> ImportError: dlopen(/Users/myuser/code/arrow/python/pyarrow/lib.cpython-36m-darwin.so, 2): Symbol not found: _PyCObject_Type
>> >>>   Referenced from: /Users/myuser/miniconda3/envs/pyarrow-test/lib/libarrow_python.10.dylib
>> >>>   Expected in: flat namespace
>> >>>  in /Users/myuser/miniconda3/envs/pyarrow-test/lib/libarrow_python.10.dylib
>> >>>
>> >>> If anyone has a suggestion about what the problem is, please let me know.
>> >>> Thanks!
>> >>>
>> >>
>> >
>>


Re: [VOTE] Accept donation of Arrow Ruby bindings

2018-05-15 Thread P. Taylor Goetz
+1

I’ve been through IP clearance a few times, and can help if needed.

-Taylor

> On May 11, 2018, at 6:47 PM, Wes McKinney wrote:
> 
> Dear all,
> 
> Arrow PMC member Kouhei Sutou has developed Ruby bindings to the GLib
> C interface for Apache Arrow
> 
> * https://github.com/red-data-tools/red-arrow
> * https://github.com/red-data-tools/red-arrow-gpu
> 
> He is proposing to pull these projects into Apache Arrow to develop
> them all in the same place
> 
> https://github.com/apache/arrow/pull/1990
> 
> We are proposing to accept this code into the Apache project. If the
> vote passes, the PMC and Kou will work together to complete the ASF IP
> Clearance process (http://incubator.apache.org/ip-clearance/) and
> import the Ruby bindings for inclusion in a future release:
> 
> [ ] +1 : Accept contribution of Ruby bindings
> [ ]  0 : No opinion
> [ ] -1 : Reject contribution because...
> 
> Here is my vote: +1
> 
> The vote will be open for at least 72 hours.
> 
> Thanks,
> Wes


Re: [VOTE] Accept donation of Arrow Ruby bindings

2018-05-15 Thread Jacques Nadeau
+1. Thanks

On Sun, May 13, 2018 at 10:48 AM, Uwe L. Korn wrote:

> +1, thanks for the code donation and building the Ruby bindings.
>
> Uwe
>
> On Sat, May 12, 2018, at 8:53 AM, Kouhei Sutou wrote:
> > Hi,
> >
> > Thanks for starting the vote!
> >
> > +1
> >
> >
> > Thanks,
> > --
> > kou
> >
> > In "[VOTE] Accept donation of Arrow Ruby bindings" on Fri, 11 May 2018 18:47:52 -0400,
> >   Wes McKinney wrote:
> >
> > > Dear all,
> > >
> > > Arrow PMC member Kouhei Sutou has developed Ruby bindings to the GLib
> > > C interface for Apache Arrow
> > >
> > >  * https://github.com/red-data-tools/red-arrow
> > >  * https://github.com/red-data-tools/red-arrow-gpu
> > >
> > > He is proposing to pull these projects into Apache Arrow to develop
> > > them all in the same place
> > >
> > > https://github.com/apache/arrow/pull/1990
> > >
> > > We are proposing to accept this code into the Apache project. If the
> > > vote passes, the PMC and Kou will work together to complete the ASF IP
> > > Clearance process (http://incubator.apache.org/ip-clearance/) and
> > > import the Ruby bindings for inclusion in a future release:
> > >
> > > [ ] +1 : Accept contribution of Ruby bindings
> > > [ ]  0 : No opinion
> > > [ ] -1 : Reject contribution because...
> > >
> > > Here is my vote: +1
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > Thanks,
> > > Wes
>


Re: [CI] Code coverage reports

2018-05-15 Thread Antoine Pitrou

Hi,

There's now a draft PR that generates and uploads Python / Cython code
coverage.  See example report here:
https://codecov.io/gh/apache/arrow/pull/2050/list/

Regards

Antoine.


On Sat, 12 May 2018 16:18:47 +0200
Antoine Pitrou wrote:

> On 12/05/2018 at 00:55, Wes McKinney wrote:
> > 
> > Thanks for doing this! I am sure our code coverage has suffered as a
> > result of not having the reports. I wonder what it would take to get
> > C++ coverage that includes lines touched by Python unit test execution.
> 
> Nothing, because it already does :-)
> I'm now working on Python / Cython code coverage.
> 
> Regards
> 
> Antoine.
> 



[jira] [Created] (ARROW-2586) Make child builders of ListBuilder and StructBuilder shared_ptr's

2018-05-15 Thread Joshua Storck (JIRA)
Joshua Storck created ARROW-2586:
---------------------------------

             Summary: Make child builders of ListBuilder and StructBuilder shared_ptr's
                 Key: ARROW-2586
                 URL: https://issues.apache.org/jira/browse/ARROW-2586
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Joshua Storck


This is needed for changes in this PR that make it possible to deserialize 
arbitrary nested structures in parquet (ARROW-1644): 
https://github.com/apache/parquet-cpp/pull/462 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2585) Add Decimal128::FromBigEndian

2018-05-15 Thread Joshua Storck (JIRA)
Joshua Storck created ARROW-2585:
---------------------------------

             Summary: Add Decimal128::FromBigEndian
                 Key: ARROW-2585
                 URL: https://issues.apache.org/jira/browse/ARROW-2585
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Joshua Storck


This code is being moved from
https://github.com/apache/parquet-cpp/blob/8046481235e558344c3aa059c83ee86b9f67/src/parquet/arrow/reader.cc#L1049
for use in this PR: https://github.com/apache/parquet-cpp/pull/462
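
For readers unfamiliar with what such a conversion involves, here is a rough Python illustration. It is not the C++ code being moved. It assumes the input is a big-endian two's-complement byte string of at most 16 bytes and that the result is kept as a signed high 64-bit word plus an unsigned low 64-bit word. The function names are hypothetical.

```python
def decimal128_from_big_endian(data: bytes) -> int:
    """Interpret 1 to 16 big-endian two's-complement bytes as a signed integer."""
    if not 1 <= len(data) <= 16:
        raise ValueError("expected between 1 and 16 bytes")
    # signed=True performs the sign extension for inputs shorter than 16 bytes.
    return int.from_bytes(data, byteorder="big", signed=True)


def split_high_low(value: int):
    """Split a signed 128-bit value into (signed high 64 bits, unsigned low 64 bits)."""
    low = value & 0xFFFFFFFFFFFFFFFF
    high = value >> 64  # Python's >> is an arithmetic shift, so the sign is kept
    return high, low


print(decimal128_from_big_endian(b"\xff"))      # -1
print(decimal128_from_big_endian(b"\x01\x00"))  # 256
print(split_high_low(1 << 64))                  # (1, 0)
```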





[jira] [Created] (ARROW-2584) [JS] Node v10 issues

2018-05-15 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-2584:
---------------------------------

             Summary: [JS] Node v10 issues
                 Key: ARROW-2584
                 URL: https://issues.apache.org/jira/browse/ARROW-2584
             Project: Apache Arrow
          Issue Type: Bug
          Components: JavaScript
            Reporter: Brian Hulette
            Assignee: Paul Taylor


Build and tests fail with Node v10. Fix these issues and bump CI to use Node v10.





[jira] [Created] (ARROW-2583) [Rust] Buffer should be typeless

2018-05-15 Thread Andy Grove (JIRA)
Andy Grove created ARROW-2583:
------------------------------

             Summary: [Rust] Buffer should be typeless
                 Key: ARROW-2583
                 URL: https://issues.apache.org/jira/browse/ARROW-2583
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust
            Reporter: Andy Grove
             Fix For: 0.10.0


See the comments in [https://github.com/apache/arrow/pull/1971] for background.
In summary, Buffer should deal only with untyped memory, e.g.
`*const u8`, and all type handling should move to the Array layer, e.g.
`BufferArray`.

This would be more consistent with the other implementations.
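
To illustrate the proposed split in a language-neutral way, here is a small Python sketch (not Rust, and not the Arrow API): a `Buffer` that only holds raw bytes, and a typed array that carries all knowledge of the element type. The class names and the choice of little-endian 32-bit integers are assumptions made for this example.

```python
import struct


class Buffer:
    """Untyped memory: just bytes, with no notion of element type."""

    def __init__(self, data: bytes):
        self._data = bytes(data)

    def __len__(self):
        return len(self._data)

    def slice(self, offset: int, length: int) -> bytes:
        return self._data[offset:offset + length]


class Int32Array:
    """Typed view: all knowledge of the element type lives here."""

    _ITEM = struct.Struct("<i")  # little-endian 32-bit signed integer

    def __init__(self, buffer: Buffer):
        if len(buffer) % self._ITEM.size != 0:
            raise ValueError("buffer length is not a multiple of 4")
        self._buffer = buffer

    def __len__(self):
        return len(self._buffer) // self._ITEM.size

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < len(self):
            raise IndexError(index)
        raw = self._buffer.slice(index * self._ITEM.size, self._ITEM.size)
        return self._ITEM.unpack(raw)[0]


buf = Buffer(struct.pack("<3i", 10, -20, 30))
arr = Int32Array(buf)
print([arr[i] for i in range(len(arr))])  # [10, -20, 30]
```

The same `Buffer` could back an array of a different element type without any change, which is the point of keeping it typeless.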

 

 





Re: file-system specification

2018-05-15 Thread Antoine Pitrou

Hi Martin,

On Wed, 9 May 2018 11:28:15 -0400
Martin Durant wrote:
> I have sketched out a possible start of a Python-wide file-system
> specification:
> https://github.com/martindurant/filesystem_spec
>
> This came about from my work on some other (remote) file-system
> implementations for Python, particularly in the context of Dask. Since Arrow
> also cares about both local files and, for example, HDFS, I thought that
> people on this list may have comments and opinions about a possible standard
> that we ought to converge on. I do not think that my suggestions so far are
> necessarily right or even good in many cases, but I want to get the
> conversation going.

Here are some comments:

- API naming: you seem to favour re-using Unix command-line monikers in
  some places, while using more regular verbs or names in other
  places.  I think it should be consistent.  Since the Unix
  command-line doesn't exactly cover the exposed functionality, and
  since Unix tends to favour short cryptic names, I think it's better
  to use Python-like naming (which is also more familiar to non-Unix
  users). For example "move" or "rename" or "replace" instead of "mv",
  etc.

- **kwargs parameters: a couple of APIs (`mkdir`, `put`...) allow passing
  arbitrary parameters, which I assume are intended to be
  backend-specific.  It makes it difficult to add other optional
  parameters to those APIs in the future.  So I'd make the
  backend-specific directives a single (optional) dict parameter rather
  than a **kwargs.

- `invalidate_cache` doesn't state whether it invalidates recursively
  or not (recursively sounds better intuitively?).  Also, I think it
  would be more flexible to take a list of paths rather than a single
  path.

- `du`: the effect of the `deep` parameter isn't obvious to me. I don't
  know what it would mean *not* to recurse here: what is the size of a
  directory if you don't recurse into it?

- `glob` may need a formal definition (are trailing slashes
  significant for directory or symlink resolution? this kind of thing),
  though you may want to keep edge cases backend-specific.

- are `head` and `tail` at all useful? They can be easily recreated
  using a generic `open` facility.

- `read_block` tries to do too much in a single API IMHO, and
  using `open` directly is more flexible anyway.

- if `touch` is intended to emulate the Unix API of the same name, the
  docstring should state "Create empty file or update last modification
  timestamp".

- the information dicts returned by several APIs (`ls`, `info`)
  need standardizing, at least for non backend-specific fields.

- if the backend is a networked filesystem with non-trivial latency,
  perhaps the operations would deserve being batched (operate on
  several paths at once), though I will happily defer to your expertise
  on the topic.
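
To make two of the points above concrete, here is a minimal sketch. It is not taken from the draft specification. It assumes a backend exposes an `open(path, mode)` callable returning a seekable binary file object, and the names `fs_open`, `create_parents` and `backend_options` are hypothetical. The built-in `open` and the local filesystem stand in for a real backend.

```python
import io
import os


def head(fs_open, path, size=1024):
    """Return the first `size` bytes of `path` using only a generic open()."""
    with fs_open(path, "rb") as f:
        return f.read(size)


def tail(fs_open, path, size=1024):
    """Return the last `size` bytes of `path` using only open() and seek()."""
    with fs_open(path, "rb") as f:
        end = f.seek(0, io.SEEK_END)  # seek() returns the new absolute position
        f.seek(max(end - size, 0))
        return f.read(size)


def mkdir(path, create_parents=True, backend_options=None):
    """Create a directory.

    Backend-specific settings travel in a single optional dict, so new
    keyword parameters can be added later without clashing with them.
    """
    backend_options = backend_options or {}
    # The local stand-in ignores backend_options; a remote backend could
    # look up its own keys here.
    if create_parents:
        os.makedirs(path, exist_ok=True)
    else:
        os.mkdir(path)
```

With this shape, `head(open, "some/file", 16)` and `tail(open, "some/file", 16)` need nothing from the backend beyond `open`, and a call such as `mkdir("a/b", backend_options={"replication": 3})` leaves the function signature free to grow.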

Regards

Antoine.


[jira] [Created] (ARROW-2582) [GLib] Add negate functions for Decimal128

2018-05-15 Thread yosuke shiro (JIRA)
yosuke shiro created ARROW-2582:
--------------------------------

             Summary: [GLib] Add negate functions for Decimal128
                 Key: ARROW-2582
                 URL: https://issues.apache.org/jira/browse/ARROW-2582
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: yosuke shiro





