Thread-safety guarantees of pyarrow Table (and other) objects

2019-09-25 Thread Yevgeni Litvin
Where in the documentation can I find information about the thread-safety
guarantees of Arrow classes? In particular, is the following usage of
pyarrow.Table, shown by the pseudo-code, thread-safe?


arrow_table = pa.Table.from_pandas(df)


def other_thread_worker_impl(arrow_table):
    arrow_table.column('some_column')[row].as_py()


run_in_parallel(other_thread_worker_impl, arrow_table)


I tried using pandas.DataFrame in the same multi-threaded setup and it
turned out to be unsafe (https://github.com/pandas-dev/pandas/issues/28439).

Thank you.

- Yevgeni


[jira] [Created] (ARROW-6700) [Rust] [DataFusion] Use new parquet arrow reader

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6700:
-

 Summary: [Rust] [DataFusion] Use new parquet arrow reader
 Key: ARROW-6700
 URL: https://issues.apache.org/jira/browse/ARROW-6700
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Once [https://github.com/apache/arrow/pull/5378] is merged, DataFusion should 
be updated to use this new array reader support instead of the current parquet 
reader code in the DataFusion crate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Unnesting ListArrays

2019-09-25 Thread Wes McKinney
hi Suhail,

This follows the columnar format closely. The List layout is composed
from a child array providing the "inner" values, which are given the
List interpretation by adding an offsets buffer, and a validity
buffer to distinguish null from 0-length list values. So flatten()
here just returns the child array, which has only 3 values in the
example you gave.

A function could be written to insert "null" for List values that are
null, but someone would have to write it and give it a name =)

- Wes

On Wed, Sep 25, 2019 at 5:15 PM Suhail Razzak  wrote:
>
> Hi,
>
> I'm working through a certain use case where I'm unnesting ListArrays, but
> I noticed something peculiar - null ListValues are not retained in the
> unnested array.
>
> E.g.
> In [0]: arr = pa.array([[0, 1], [0], None, None])
> In [1]: arr.flatten()
> Out [1]: [0, 1, 0]
>
> While I would have expected [0, 1, 0, null, null].
>
> I should note that this works if the None is encapsulated in a list. So I'm
> guessing this is expected logic and if so, what's the reasoning for that?
>
> Thanks,
> Suhail


Unnesting ListArrays

2019-09-25 Thread Suhail Razzak
Hi,

I'm working through a certain use case where I'm unnesting ListArrays, but
I noticed something peculiar - null ListValues are not retained in the
unnested array.

E.g.
In [0]: arr = pa.array([[0, 1], [0], None, None])
In [1]: arr.flatten()
Out [1]: [0, 1, 0]

While I would have expected [0, 1, 0, null, null].

I should note that this works if the None is encapsulated in a list. So I'm
guessing this is expected logic and if so, what's the reasoning for that?

Thanks,
Suhail


Re: Timeline for 0.15.0 release

2019-09-25 Thread Neal Richardson
IMO it's too risky to add something that adds a dependency
(aws-sdk-cpp) on the day of cutting a release.

Neal

On Wed, Sep 25, 2019 at 12:54 PM Krisztián Szűcs
 wrote:
>
> We don't have comprehensive documentation yet, so let's postpone it.
>
>
> On Wed, Sep 25, 2019 at 9:48 PM Krisztián Szűcs  
> wrote:
>>
>> The S3 python bindings would be a nice addition to the release.
>> I don't think we should block on this but the PR is ready. Opinions?
>> https://github.com/apache/arrow/pull/5423
>>
>>
>>
>>
>> On Wed, Sep 25, 2019 at 5:28 PM Micah Kornfield  
>> wrote:
>>>
>>> OK, I'll start the process today.  I'll send up e-mail updates as I make 
>>> progress.
>>>
>>> On Wed, Sep 25, 2019 at 8:22 AM Wes McKinney  wrote:

 Yes, all systems go as far as I'm concerned.

 On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson
  wrote:
 >
 > Andy's DataFusion issue and Wes's Parquet one have both been merged,
 > and it looks like the LICENSE issue is being resolved as I type. So
 > are we good to go now?
 >
 > Neal
 >
 >
 > On Tue, Sep 24, 2019 at 10:30 PM Andy Grove  
 > wrote:
 > >
 > > I found a last minute issue with DataFusion (Rust) and would 
 > > appreciate it
 > > if we could merge ARROW-6086 (PR is
 > > https://github.com/apache/arrow/pull/5494) before cutting the RC.
 > >
 > > Thanks,
 > >
 > > Andy.
 > >
 > >
 > > On Tue, Sep 24, 2019 at 6:19 PM Micah Kornfield 
 > > wrote:
 > >
 > > > OK, I'm going to postpone cutting a release until tomorrow (hoping 
 > > > we can get the issues resolved by then). I'll also try to review the 
 > > > third-party additions since 0.14.x.
 > > >
 > > > On Tue, Sep 24, 2019 at 4:20 PM Wes McKinney  
 > > > wrote:
 > > >
 > > > > I found a licensing issue
 > > > >
 > > > > https://issues.apache.org/jira/browse/ARROW-6679
 > > > >
 > > > > It might be worth examining third party code added to the project
 > > > > since 0.14.x to make sure there are no other such issues.
 > > > >
 > > > > On Tue, Sep 24, 2019 at 6:10 PM Wes McKinney 
 > > > wrote:
 > > > > >
 > > > > > I have diagnosed the problem (Thrift "string" data must be UTF-8,
 > > > > > cannot be arbitrary binary) and am working on a patch right now
 > > > > >
 > > > > > On Tue, Sep 24, 2019 at 6:02 PM Wes McKinney 
 > > > > > 
 > > > > wrote:
 > > > > > >
 > > > > > > I just opened
 > > > > > >
 > > > > > > https://issues.apache.org/jira/browse/ARROW-6678
 > > > > > >
 > > > > > > Please don't cut an RC until I have an opportunity to diagnose 
 > > > > > > this,
 > > > > > > will report back.
 > > > > > >
 > > > > > >
 > > > > > > On Tue, Sep 24, 2019 at 5:51 PM Wes McKinney 
 > > > > > > 
 > > > > wrote:
 > > > > > > >
 > > > > > > > I'm investigating a possible Parquet-related compatibility 
 > > > > > > > bug
 > > > that I
 > > > > > > > encountered through some routine testing / benchmarking. I'll
 > > > report
 > > > > > > > back once I figure out what is going on (if anything)
 > > > > > > >
 > > > > > > > On Sun, Sep 22, 2019 at 11:51 PM Micah Kornfield <
 > > > > emkornfi...@gmail.com> wrote:
 > > > > > > > >>
 > > > > > > > >> It's ideal if your GPG key is in the web of trust (i.e. 
 > > > > > > > >> you can
 > > > > get it
 > > > > > > > >> signed by another PMC member), but is not 100% essential.
 > > > > > > > >
 > > > > > > > > That won't be an option for me this week (it seems like I 
 > > > > > > > > would
 > > > > need to meet one face-to-face).  I'll try to get the GPG checked 
 > > > > in and
 > > > the
 > > > > rest of the pre-requisites done tomorrow (Monday) to hopefully 
 > > > > start the
 > > > > release on Tuesday (hopefully we can solve the last 
 > > > > blocker/integration
 > > > > tests by then).
 > > > > > > > >
 > > > > > > > > On Sat, Sep 21, 2019 at 7:12 PM Wes McKinney <
 > > > wesmck...@gmail.com>
 > > > > wrote:
 > > > > > > > >>
 > > > > > > > >> It's ideal if your GPG key is in the web of trust (i.e. 
 > > > > > > > >> you can
 > > > > get it
 > > > > > > > >> signed by another PMC member), but is not 100% essential.
 > > > > > > > >>
 > > > > > > > >> Speaking of the release, there are at least 2 code 
 > > > > > > > >> changes I
 > > > still
 > > > > > > > >> want to get in
 > > > > > > > >>
 > > > > > > > >> ARROW-5717
 > > > > > > > >> ARROW-6353
 > > > > > > > >>
 > > > > > > > >> I just pushed updates to ARROW-5717, will merge once the 
 > > > > > > > >> build
 > > > is
 > > > > green.
 > > > > > > > >>
 > > > > > > > >> There are a couple of Rust patches still marked for 0.15. 
 > > > > > > > >> The
 > > > rest

Re: Timeline for 0.15.0 release

2019-09-25 Thread Krisztián Szűcs
The S3 python bindings would be a nice addition to the release.
I don't think we should block on this but the PR is ready. Opinions?
https://github.com/apache/arrow/pull/5423




On Wed, Sep 25, 2019 at 5:28 PM Micah Kornfield 
wrote:

> OK, I'll start the process today.  I'll send up e-mail updates as I make
> progress.

Re: Build issues on macOS [newbie]

2019-09-25 Thread Tarek Allam Jr .
Thanks for the advice Uwe and Neal. I tried your suggestion (as well as turning 
many of the flags to off) but then ran into other errors afterwards such as:

-- Using ZSTD_ROOT: /usr/local/anaconda3/envs/main
CMake Error at 
/usr/local/Cellar/cmake/3.15.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:137
 (message):
  Could NOT find ZSTD (missing: ZSTD_LIB ZSTD_INCLUDE_DIR)
  
/usr/local/Cellar/cmake/3.15.3/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:378
 (_FPHSA_FAILURE_MESSAGE)
  cmake_modules/FindZSTD.cmake:61 (find_package_handle_standard_args)
  cmake_modules/ThirdpartyToolchain.cmake:181 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:2033 (resolve_dependency)
  CMakeLists.txt:412 (include)

I think I will spend some more time understanding CMake better and familiarising 
myself with the codebase before having another go. Hopefully by then conda-forge 
will have removed the SDK requirement as well, which, like you say, should make 
things simpler.

Thanks again, 

Regards, 
Tarek

On 2019/09/19 16:00:09, "Uwe L. Korn"  wrote: 
> Hello Tarek,
> 
> this error message is normally the one you get when CONDA_BUILD_SYSROOT 
> doesn't point to your 10.9 SDK. Please delete your build folder again and do 
> `export CONDA_BUILD_SYSROOT=..` immediately before running cmake. Running 
> e.g. a conda install will sadly reset this variable to something different 
> and break the build.
> 
> As a sidenote: It looks like in 1-2 months that conda-forge will get rid of 
> the SDK requirement, then this will be a bit simpler.
> 
> Cheers
> Uwe
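The sequence Uwe describes might look like this in practice (a sketch; the SDK path, build directory, and `$ARROW_HOME` are assumptions for a typical checkout — note the `..` in his message is a placeholder for your own SDK path):

```shell
# Start from a clean build directory, then export CONDA_BUILD_SYSROOT
# immediately before running cmake: a `conda install` in between can
# reset the variable and break the build.
rm -rf cpp/build
mkdir -p cpp/build
cd cpp/build

export CONDA_BUILD_SYSROOT=/opt/MacOSX10.9.sdk   # assumed SDK location

cmake -DCONDA_BUILD_SYSROOT="$CONDA_BUILD_SYSROOT" \
      -DCMAKE_INSTALL_PREFIX="$ARROW_HOME" \
      ..
```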
> 
> On Thu, Sep 19, 2019, at 5:24 PM, Tarek Allam Jr. wrote:
> > 
> > Hi all,
> > 
> > Firstly I must apologise if what I put here is extremely trivial, but I am a
> > complete newcomer to the Apache Arrow project and contributing to Apache in
> > general, but I am very keen to get involved.
> > 
> > I'm hoping to help where I can so I recently attempted to complete a build
> > following the instructions laid out in the 'Python Development' section of 
> > the
> > documentation here:
> > 
> > After completing the steps that specifically use Conda I was able to 
> > create an
> > environment but when it comes to building I am unable to do so.
> > 
> > I am on macOS -- 10.14.6 and as outlined in the docs and here 
> > (https://stackoverflow.com/a/55798942/4521950) I used the 10.9 SDK 
> > instead
> > of the latest. I have both added this manually using ccmake and also 
> > defining it
> > like so:
> > 
> > cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
> >   -DCMAKE_INSTALL_LIBDIR=lib \
> >   -DARROW_FLIGHT=ON \
> >   -DARROW_GANDIVA=ON \
> >   -DARROW_ORC=ON \
> >   -DARROW_PARQUET=ON \
> >   -DARROW_PYTHON=ON \
> >   -DARROW_PLASMA=ON \
> >   -DARROW_BUILD_TESTS=ON \
> >   -DCONDA_BUILD_SYSROOT=/opt/MacOSX10.9.sdk \
> >   -DARROW_DEPENDENCY_SOURCE=AUTO \
> >   ..
> > 
> > But it seems that whatever I try, I seem to get errors; the main one 
> > tripping
> > me up at the moment is:
> > 
> > -- Building using CMake version: 3.15.3
> > -- The C compiler identification is Clang 4.0.1
> > -- The CXX compiler identification is Clang 4.0.1
> > -- Check for working C compiler: 
> > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang
> > -- Check for working C compiler: 
> > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang -- broken
> > CMake Error at 
> > /usr/local/anaconda3/envs/pyarrow-dev/share/cmake-3.15/Modules/CMakeTestCCompiler.cmake:60
> >  (message):
> >   The C compiler
> > 
> > "/usr/local/anaconda3/envs/pyarrow-dev/bin/clang"
> > 
> >   is not able to compile a simple test program.
> > 
> >   It fails with the following output:
> > 
> > Change Dir: /Users/tallamjr/Github/arrow/cpp/build/CMakeFiles/CMakeTmp
> > 
> > Run Build Command(s):/usr/local/bin/gmake cmTC_b252c/fast && 
> > /usr/local/bin/gmake -f CMakeFiles/cmTC_b252c.dir/build.make 
> > CMakeFiles/cmTC_b252c.dir/build
> > gmake[1]: Entering directory 
> > '/Users/tallamjr/Github/arrow/cpp/build/CMakeFiles/CMakeTmp'
> > Building C object CMakeFiles/cmTC_b252c.dir/testCCompiler.c.o
> > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang   -march=core2 
> > -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE 
> > -fstack-protector-strong -O2 -pipe  -isysroot /opt/MacOSX10.9.sdk   -o 
> > CMakeFiles/cmTC_b252c.dir/testCCompiler.c.o   -c 
> > /Users/tallamjr/Github/arrow/cpp/build/CMakeFiles/CMakeTmp/testCCompiler.c
> > Linking C executable cmTC_b252c
> > /usr/local/anaconda3/envs/pyarrow-dev/bin/cmake -E 
> > cmake_link_script CMakeFiles/cmTC_b252c.dir/link.txt --verbose=1
> > /usr/local/anaconda3/envs/pyarrow-dev/bin/clang -march=core2 
> > -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE 
> > -fstack-protector-strong -O2 -pipe  -isysroot /opt/MacOSX10.9.sdk 
> > -Wl,-search_paths_first -Wl,-headerpad_max_install_names -Wl,-pie 
> > -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs  
> > 

Re: Timeline for 0.15.0 release

2019-09-25 Thread Wes McKinney
Yes, all systems go as far as I'm concerned.

On Wed, Sep 25, 2019 at 9:56 AM Neal Richardson
 wrote:
>
> Andy's DataFusion issue and Wes's Parquet one have both been merged,
> and it looks like the LICENSE issue is being resolved as I type. So
> are we good to go now?
>
> Neal

Re: Timeline for 0.15.0 release

2019-09-25 Thread Micah Kornfield
OK, I'll start the process today.  I'll send up e-mail updates as I make
progress.

On Wed, Sep 25, 2019 at 8:22 AM Wes McKinney  wrote:

> Yes, all systems go as far as I'm concerned.

[jira] [Created] (ARROW-6699) [C++] Add Parquet docs

2019-09-25 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6699:
-

 Summary: [C++] Add Parquet docs
 Key: ARROW-6699
 URL: https://issues.apache.org/jira/browse/ARROW-6699
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 1.0.0


There is currently zero Sphinx doc for Parquet. I'm adding a stub in ARROW-6630 
but we should do more, especially as Arrow benefits from tight integration with 
Parquet.





Re: Timeline for 0.15.0 release

2019-09-25 Thread Neal Richardson
Andy's DataFusion issue and Wes's Parquet one have both been merged,
and it looks like the LICENSE issue is being resolved as I type. So
are we good to go now?

Neal


On Tue, Sep 24, 2019 at 10:30 PM Andy Grove  wrote:
>
> I found a last minute issue with DataFusion (Rust) and would appreciate it
> if we could merge ARROW-6086 (PR is
> https://github.com/apache/arrow/pull/5494) before cutting the RC.
>
> Thanks,
>
> Andy.
[jira] [Created] (ARROW-6698) Please support Python __slots__

2019-09-25 Thread John Yost (Jira)
John Yost created ARROW-6698:


 Summary: Please support Python __slots__
 Key: ARROW-6698
 URL: https://issues.apache.org/jira/browse/ARROW-6698
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: John Yost


Hi Everyone,

First of all, well-done! This project is totally awesome.

When I attempted to serialize a Python object that uses __slots__ to minimize 
memory requirements, I received the following error:

    return pyarrow.deserialize(data, serialization_context)

  File "pyarrow/serialization.pxi", line 461, in pyarrow.lib.deserialize

  File "pyarrow/serialization.pxi", line 424, in pyarrow.lib.deserialize_from

  File "pyarrow/serialization.pxi", line 275, in 
pyarrow.lib.SerializedPyObject.deserialize

  File "pyarrow/serialization.pxi", line 194, in 
pyarrow.lib.SerializationContext._deserialize_callback

AttributeError: 'RemoteExecutorObject' object has no attribute '__dict__'

Would it be possible to support __slots__? If not, not a huge deal, but wanted 
to at least ask.
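A possible workaround sketch until __slots__ is supported natively: register custom serialization callbacks that round-trip the slot attributes through a plain dict. The `Point` class and the two helper functions below are made up for illustration; `register_type` is the existing hook on pyarrow's SerializationContext for per-type callbacks:

```python
class Point:
    # __slots__ classes have no __dict__, which is what the default
    # deserialization callback trips over
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

def point_to_dict(obj):
    # capture every slot value in a plain dict
    return {name: getattr(obj, name) for name in obj.__slots__}

def point_from_dict(data):
    return Point(**data)

# This pair can then be registered on a pyarrow SerializationContext:
#   context.register_type(Point, 'Point',
#                         custom_serializer=point_to_dict,
#                         custom_deserializer=point_from_dict)
restored = point_from_dict(point_to_dict(Point(1, 2)))
```

Because the serializer reads `obj.__slots__` generically, the same pair of helpers works for any __slots__ class whose constructor accepts its slots as keyword arguments.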

 

Thanks, and thanks again for pyarrow!

 

--John



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-25-0

2019-09-25 Thread Wes McKinney
Thanks Krisz. It doesn't appear there is anything here stopping us
from releasing

On Wed, Sep 25, 2019 at 9:15 AM Krisztián Szűcs
 wrote:
>
> wheel-osx-cp35m has failed with an unrelated timeout error, restarted it:
>   https://travis-ci.org/ursa-labs/crossbow/builds/589326914
>
> On Wed, Sep 25, 2019 at 4:11 PM Crossbow  wrote:
>
> >
> > Arrow Build Report for Job nightly-2019-09-25-0
> >
> > All tasks:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0
> >
> > Failed Tasks:
> > - wheel-osx-cp35m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-osx-cp35m
> > - docker-cpp-fuzzit:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-cpp-fuzzit
> > - docker-spark-integration:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-spark-integration
> > - docker-dask-integration:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-dask-integration
> >
> > Succeeded Tasks:
> > - centos-6:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-centos-6
> > - wheel-manylinux2010-cp27m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux2010-cp27m
> > - conda-linux-gcc-py27:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-linux-gcc-py27
> > - wheel-manylinux1-cp35m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux1-cp35m
> > - wheel-win-cp37m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-appveyor-wheel-win-cp37m
> > - docker-r-conda:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-r-conda
> > - docker-cpp-release:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-cpp-release
> > - docker-go:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-go
> > - wheel-manylinux1-cp37m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux1-cp37m
> > - debian-buster:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-debian-buster
> > - docker-python-2.7-nopandas:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-python-2.7-nopandas
> > - conda-linux-gcc-py37:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-linux-gcc-py37
> > - ubuntu-bionic-arm64:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-ubuntu-bionic-arm64
> > - docker-lint:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-lint
> > - ubuntu-xenial-arm64:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-ubuntu-xenial-arm64
> > - conda-osx-clang-py37:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-osx-clang-py37
> > - homebrew-cpp:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-homebrew-cpp
> > - wheel-manylinux1-cp36m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux1-cp36m
> > - ubuntu-disco:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-ubuntu-disco
> > - centos-7:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-centos-7
> > - gandiva-jar-trusty:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-gandiva-jar-trusty
> > - docker-cpp-static-only:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-cpp-static-only
> > - docker-r:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-r
> > - docker-c_glib:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-c_glib
> > - docker-rust:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-rust
> > - conda-win-vs2015-py37:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-win-vs2015-py37
> > - conda-win-vs2015-py36:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-win-vs2015-py36
> > - homebrew-cpp-autobrew:
> >   

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-09-25-0

2019-09-25 Thread Krisztián Szűcs
wheel-osx-cp35m has failed with an unrelated timeout error, restarted it:
  https://travis-ci.org/ursa-labs/crossbow/builds/589326914

On Wed, Sep 25, 2019 at 4:11 PM Crossbow  wrote:

>
> Arrow Build Report for Job nightly-2019-09-25-0
>
> All tasks:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0
>
> Failed Tasks:
> - wheel-osx-cp35m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-osx-cp35m
> - docker-cpp-fuzzit:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-cpp-fuzzit
> - docker-spark-integration:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-spark-integration
> - docker-dask-integration:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-dask-integration
>
> Succeeded Tasks:
> - centos-6:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-centos-6
> - wheel-manylinux2010-cp27m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux2010-cp27m
> - conda-linux-gcc-py27:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-linux-gcc-py27
> - wheel-manylinux1-cp35m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux1-cp35m
> - wheel-win-cp37m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-appveyor-wheel-win-cp37m
> - docker-r-conda:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-r-conda
> - docker-cpp-release:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-cpp-release
> - docker-go:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-go
> - wheel-manylinux1-cp37m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux1-cp37m
> - debian-buster:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-debian-buster
> - docker-python-2.7-nopandas:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-python-2.7-nopandas
> - conda-linux-gcc-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-linux-gcc-py37
> - ubuntu-bionic-arm64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-ubuntu-bionic-arm64
> - docker-lint:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-lint
> - ubuntu-xenial-arm64:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-ubuntu-xenial-arm64
> - conda-osx-clang-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-osx-clang-py37
> - homebrew-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-homebrew-cpp
> - wheel-manylinux1-cp36m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux1-cp36m
> - ubuntu-disco:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-ubuntu-disco
> - centos-7:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-centos-7
> - gandiva-jar-trusty:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-gandiva-jar-trusty
> - docker-cpp-static-only:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-cpp-static-only
> - docker-r:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-r
> - docker-c_glib:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-c_glib
> - docker-rust:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-rust
> - conda-win-vs2015-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-win-vs2015-py37
> - conda-win-vs2015-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-win-vs2015-py36
> - homebrew-cpp-autobrew:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-homebrew-cpp-autobrew
> - gandiva-jar-osx:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-gandiva-jar-osx
> - debian-stretch:
>   URL:
> 

[NIGHTLY] Arrow Build Report for Job nightly-2019-09-25-0

2019-09-25 Thread Crossbow


Arrow Build Report for Job nightly-2019-09-25-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0

Failed Tasks:
- wheel-osx-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-osx-cp35m
- docker-cpp-fuzzit:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-cpp-fuzzit
- docker-spark-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-spark-integration
- docker-dask-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-dask-integration

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-centos-6
- wheel-manylinux2010-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux2010-cp27m
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-linux-gcc-py27
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux1-cp35m
- wheel-win-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-appveyor-wheel-win-cp37m
- docker-r-conda:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-r-conda
- docker-cpp-release:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-cpp-release
- docker-go:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-go
- wheel-manylinux1-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux1-cp37m
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-debian-buster
- docker-python-2.7-nopandas:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-python-2.7-nopandas
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-linux-gcc-py37
- ubuntu-bionic-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-ubuntu-bionic-arm64
- docker-lint:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-lint
- ubuntu-xenial-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-ubuntu-xenial-arm64
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-osx-clang-py37
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-homebrew-cpp
- wheel-manylinux1-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-wheel-manylinux1-cp36m
- ubuntu-disco:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-ubuntu-disco
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-centos-7
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-gandiva-jar-trusty
- docker-cpp-static-only:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-cpp-static-only
- docker-r:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-r
- docker-c_glib:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-c_glib
- docker-rust:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-rust
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-win-vs2015-py36
- homebrew-cpp-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-homebrew-cpp-autobrew
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-travis-gandiva-jar-osx
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-debian-stretch
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-azure-conda-linux-gcc-py36
- docker-turbodbc-integration:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-25-0-circle-docker-turbodbc-integration
- docker-pandas-master:
  URL: 

[jira] [Created] (ARROW-6697) [Rust] [DataFusion] Validate that all parquet partitions have the same schema

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6697:
-

 Summary: [Rust] [DataFusion] Validate that all parquet partitions 
have the same schema
 Key: ARROW-6697
 URL: https://issues.apache.org/jira/browse/ARROW-6697
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


When reading a partitioned Parquet file in DataFusion, the schema is read from 
the first partition and it is assumed that all other partitions have the same 
schema.

It would be better to actually validate that all of the partitions have the 
same schema since there is no support for schema merging yet.





[jira] [Created] (ARROW-6696) [Rust] [DataFusion] Implement simple math operations in physical query plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6696:
-

 Summary: [Rust] [DataFusion] Implement simple math operations in 
physical query plan
 Key: ARROW-6696
 URL: https://issues.apache.org/jira/browse/ARROW-6696
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Update BinaryExpr to support simple math operations such as +, -, *, / using 
compute kernels where possible.

See the original implementation when executing directly from the logical plan 
for inspiration.





Re: Parquet file reading performance

2019-09-25 Thread Joris Van den Bossche
Hi Maarten,

Thanks for the reproducible script. I ran it on my laptop on pyarrow
master, and I am not seeing the difference between the two datetime-index cases:

Versions:
Python:   3.7.3 | packaged by conda-forge | (default, Mar 27 2019,
23:01:00)
[GCC 7.3.0] on linux
numpy:1.16.4
pandas:   0.26.0.dev0+447.gc168ecf26
pyarrow:  0.14.1.dev642+g7f2d637db

1073741824 float64 8388608 16
0: make_dataframe :   1443.483 msec,  709 MB/s
0: write_arrow_parquet:   7685.426 msec,  133 MB/s
0: read_arrow_parquet :   1262.741 msec,  811 MB/s <<<
1: make_dataframe :   1412.575 msec,  725 MB/s
1: write_arrow_parquet:   7869.145 msec,  130 MB/s
1: read_arrow_parquet :   1947.896 msec,  526 MB/s <<<
2: make_dataframe :   1490.165 msec,  687 MB/s
2: write_arrow_parquet:   7040.507 msec,  145 MB/s
2: read_arrow_parquet :   1888.316 msec,  542 MB/s <<<

The only change I needed to make in the script to get it running (within my
memory limits) was the creation of the second DatetimeIndex:
pd.date_range('1970-01-01', '2019-09-01', freq='S') creates an index of
1.5 billion elements, while only the last part of it is used, so I changed
that to index = pd.date_range('2018-01-01', '2019-09-01',
freq='S').array[-rows:].

Reading with the datetime index is in general still slower than with the int
index. But doing more detailed timings, it seems this is not due to the
parquet reading itself, but to the conversion from arrow to pandas (using the
files from the benchmark):

In [1]: import pyarrow.parquet as pq

In [4]: %timeit pq.read_table('testdata.int.parquet')
41.5 ms ± 3.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit pq.read_table('testdata.dt.parquet')
43 ms ± 1.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: table_int = pq.read_table('testdata.int.parquet')

In [7]: table_datetime = pq.read_table('testdata.dt.parquet')

In [8]: %timeit table_int.to_pandas()
14.3 ms ± 309 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [9]: %timeit table_datetime.to_pandas()
47.2 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So you can see that the parquet reading part is basically identical, but
the conversion to pandas is much slower for the datetime-index case.
I will try to look into that code path to see what makes this so much
slower.

Joris


On Tue, 24 Sep 2019 at 22:28, Maarten Ballintijn  wrote:

> Hi,
>
> The code to show the performance issue with DateTimeIndex is at:
>
> https://gist.github.com/maartenb/256556bcd6d7c7636d400f3b464db18c
>
> It shows three cases: 0) int index, 1) datetime index, 2) datetime index
> created in a slightly roundabout way
>
> I’m a little confused by the two datetime cases. Case 2) is much slower,
> but the df compares as identical to case 1)
> (I originally used something like 2) to match our specific data. I don’t
> see why it behaves differently??)
>
> The timings I find are:
>
> 1073741824 float64 8388608 16
> 0: make_dataframe :   2390.830 msec,  428 MB/s
> 0: write_arrow_parquet:   2486.463 msec,  412 MB/s
> 0: read_arrow_parquet :813.946 msec,  1258 MB/s <<<
> 1: make_dataframe :   2579.815 msec,  397 MB/s
> 1: write_arrow_parquet:   2708.151 msec,  378 MB/s
> 1: read_arrow_parquet :   1413.999 msec,  724 MB/s <<<
> 2: make_dataframe :  15126.520 msec,  68 MB/s
> 2: write_arrow_parquet:   9205.815 msec,  111 MB/s
> 2: read_arrow_parquet :   5929.346 msec,  173 MB/s <<<
>
> Case 0, int index. This is all great.
> Case 1, datetime index. We lose almost half the speed. Given that a
> datetime is only scaled from Pandas IIRC, that seems like a lot?
> Case 2, the other datetime index. No idea what is going on.
>
> Any insights are much appreciated.
>
> Cheers,
> Maarten.
>
> > On Sep 24, 2019, at 11:25 AM, Wes McKinney  wrote:
> >
> > hi
> >
> > On Tue, Sep 24, 2019 at 9:26 AM Maarten Ballintijn  > wrote:
> >>
> >> Hi Wes,
> >>
> >> Thanks for your quick response.
> >>
> >> Yes, we’re using Python 3.7.4, from miniconda and conda-forge, and:
> >>
> >> numpy:   1.16.5
> >> pandas:  0.25.1
> >> pyarrow: 0.14.1
> >>
> >> It looks like 0.15 is close, so I can wait for that.
> >>
> >> Theoretically I see three components driving the performance:
> >> 1) The cost of locating the column (directory overhead)
> >> 2) The overhead of reading a single column. (reading and processing
> meta data, setting up for reading)
> >> 3) Bulk reading and unmarshalling/decoding the data.
> >>
> >> Only 1) would be impacted by the number of columns, but if you’re
> reading everything ideally this would not be a problem.
> >
> > The problem is more nuanced than that. Parquet's metadata is somewhat
> > "heavy" at the column level. So when you're writing thousands of
> > 

[jira] [Created] (ARROW-6695) [Rust] [DataFusion] Remove execution of logical plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6695:
-

 Summary: [Rust] [DataFusion] Remove execution of logical plan
 Key: ARROW-6695
 URL: https://issues.apache.org/jira/browse/ARROW-6695
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Remove execution of logical plan





[jira] [Created] (ARROW-6693) [Rust] [DataFusion] Update unit tests to use physical query plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6693:
-

 Summary: [Rust] [DataFusion] Update unit tests to use physical 
query plan
 Key: ARROW-6693
 URL: https://issues.apache.org/jira/browse/ARROW-6693
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Update unit tests to use physical query plan (once all features are supported)





[jira] [Created] (ARROW-6694) [Rust] [DataFusion] Update integration tests to use physical plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6694:
-

 Summary: [Rust] [DataFusion] Update integration tests to use 
physical plan
 Key: ARROW-6694
 URL: https://issues.apache.org/jira/browse/ARROW-6694
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Update integration tests to use physical query plan (once all features are 
supported)





[jira] [Created] (ARROW-6692) [Rust] [DataFusion] Update examples to use physical query plan

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6692:
-

 Summary: [Rust] [DataFusion] Update examples to use physical query 
plan
 Key: ARROW-6692
 URL: https://issues.apache.org/jira/browse/ARROW-6692
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Update examples to use physical query plan





[jira] [Created] (ARROW-6691) [Rust] [DataFusion] Use tokio and Futures instead of spawning threads

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6691:
-

 Summary: [Rust] [DataFusion] Use tokio and Futures instead of 
spawning threads
 Key: ARROW-6691
 URL: https://issues.apache.org/jira/browse/ARROW-6691
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


The current implementation of the physical query plan uses "thread::spawn" 
which is expensive. We should switch to using Futures and tokio so that we are 
launching tasks in a thread pool instead.





[jira] [Created] (ARROW-6690) [Rust] [DataFusion] HashAggregate without GROUP BY should use SIMD

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6690:
-

 Summary: [Rust] [DataFusion] HashAggregate without GROUP BY should 
use SIMD
 Key: ARROW-6690
 URL: https://issues.apache.org/jira/browse/ARROW-6690
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


Currently the implementation of HashAggregate in the new physical plan uses the 
same logic regardless of whether a grouping expression is used.

For the case where there is no grouping expression, such as "SELECT SUM(a) FROM 
b" we can use the compute kernels to perform an aggregate operation on each 
batch rather than iterating over each row and accumulating individual values.

This optimization already exists in the original implementation of aggregate 
queries executed directly from the logical plan.





[jira] [Created] (ARROW-6689) [Rust] [DataFusion] Optimize query execution

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6689:
-

 Summary: [Rust] [DataFusion] Optimize query execution
 Key: ARROW-6689
 URL: https://issues.apache.org/jira/browse/ARROW-6689
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
 Fix For: 1.0.0


There a number of optimizations that can be made to the new query execution and 
this is a top level story to track them all.





[jira] [Created] (ARROW-6688) [Packaging] Include s3 support in the conda packages

2019-09-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6688:
--

 Summary: [Packaging] Include s3 support in the conda packages 
 Key: ARROW-6688
 URL: https://issues.apache.org/jira/browse/ARROW-6688
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs








[jira] [Created] (ARROW-6687) [Rust] [DataFusion]

2019-09-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-6687:
-

 Summary: [Rust] [DataFusion]
 Key: ARROW-6687
 URL: https://issues.apache.org/jira/browse/ARROW-6687
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Affects Versions: 0.15.0
Reporter: Andy Grove


I received this bug report directly via email:

 

Hi,
 
I've just tried out the master branch of the arrow lib, specifically the SQL 
interface for a parquet file generated by pyarrow 0.14.1 and pandas 0.25.1.
 
It returns an incorrect num_rows for my file (~3000 columns x 2456 rows): it's 
actually the batch size, 1024*1024, instead of the 2456 rows. The query is a 
simple SELECT col FROM data, using the sample code you've created, which works 
for the test file in the arrow testing repo.
 
Sorry for reporting the issue via mail; it was faster & easier this way. 
 
I'm super happy and grateful that you decided to add parquet support. This is 
an awesome project, keep up the good work!
 
Best regards,
Adam Lippai





[jira] [Created] (ARROW-6686) [CI] Pull and push docker images to speed up the nightly builds

2019-09-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6686:
--

 Summary: [CI] Pull and push docker images to speed up the nightly 
builds 
 Key: ARROW-6686
 URL: https://issues.apache.org/jira/browse/ARROW-6686
 Project: Apache Arrow
  Issue Type: Improvement
  Components: CI
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs








[jira] [Created] (ARROW-6685) [C++/Python] S3 FileStat object's base_path and type depends on trailing slash

2019-09-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6685:
--

 Summary: [C++/Python] S3 FileStat object's base_path and type 
depends on trailing slash
 Key: ARROW-6685
 URL: https://issues.apache.org/jira/browse/ARROW-6685
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Krisztian Szucs


The current behaviour is:

{code:python}
s3fs.create_dir('bucket/directory/')
stats = s3fs.get_target_stats(['bucket/directory/'])
stats[0].type == FileType.File
stats[0].base_name == '/'
{code}





[jira] [Created] (ARROW-6684) [C++/Python] S3FileSystem.create_dir should raise for a nested directory with recursive keyword set to False

2019-09-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6684:
--

 Summary: [C++/Python] S3FileSystem.create_dir should raise for a 
nested directory with recursive keyword set to False
 Key: ARROW-6684
 URL: https://issues.apache.org/jira/browse/ARROW-6684
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Krisztian Szucs


{{s3fs.create_dir('bucket/deeply/nested/test-directory/', recursive=False)}} 
doesn't raise.


