Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2020-02-25 Thread Andy Grove
I was wondering if there had been any momentum on this (the BiDirectional
RPC design)?

I'm interested in this for the use case of Apache Spark sending a stream of
data to another process to invoke custom code and then receive a stream
back with the transformed data.

Thanks,

Andy.
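
For concreteness, a minimal sketch of the exchange described above, with a
plain Python generator standing in for the proposed bidirectional call; the
Flight API for this does not exist yet, so every name below is illustrative
rather than part of the proposal:

{code:python}
import pyarrow as pa

def transform_stream(batches):
    # Custom server-side code: consume a stream of record batches and
    # yield a transformed stream back (here, doubling column "x").
    for batch in batches:
        doubled = pa.array([v * 2 for v in batch.column(0).to_pylist()],
                           type=pa.int64())
        yield pa.RecordBatch.from_arrays([doubled], names=batch.schema.names)

inbound = [pa.RecordBatch.from_arrays([pa.array([1, 2, 3], type=pa.int64())],
                                      names=["x"])]
outbound = list(transform_stream(inbound))  # the stream sent back to the caller
{code}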



On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau  wrote:

> I support moving forward with the current proposal.
>
> On Thu, Dec 12, 2019 at 12:20 PM David Li  wrote:
>
> > Just following up here again, any other thoughts?
> >
> > I think we do have justifications for potentially separate streams in
> > a call, but that's more of an orthogonal question - it doesn't need to
> > be addressed here. I do agree that it very much complicates things.
> >
> > Thanks,
> > David
> >
> > On 11/29/19, Wes McKinney  wrote:
> > > I would generally agree with this. Note that you have the possibility
> > > to use unions-of-structs to send record batches with different schemas
> > > in the same stream, though with some added complexity on each side
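
As a concrete sketch of the union-of-structs idea mentioned here (assuming
pyarrow; the field names and type codes below are illustrative):

{code:python}
import pyarrow as pa

# Each logical schema becomes a struct child of a dense union, so a single
# stream-level schema can carry rows of either shape.
s1 = pa.struct([("x", pa.int64())])
s2 = pa.struct([("y", pa.string())])
c1 = pa.array([{"x": 1}, {"x": 2}], type=s1)
c2 = pa.array([{"y": "a"}], type=s2)

types = pa.array([0, 0, 1], type=pa.int8())     # which child each row uses
offsets = pa.array([0, 1, 0], type=pa.int32())  # row's index within that child
union = pa.UnionArray.from_dense(types, offsets, [c1, c2], ["a", "b"])
{code}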
> > >
> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau 
> > wrote:
> > >>
> > >> I'd vote for explicitly not supported. We should keep our primitives
> > >> narrow.
> > >>
> > >> On Wed, Nov 27, 2019, 1:17 PM David Li  wrote:
> > >>
> > >> > Thanks for the feedback.
> > >> >
> > >> > I do think if we had explicitly embraced gRPC from the beginning,
> > >> > there are a lot of places where things could be made more ergonomic,
> > >> > including with the metadata fields. But it would also have locked us
> > >> > out of potential future transports.
> > >> >
> > >> > On another note: I hesitate to put too much into this method, but we
> > >> > are looking at use cases where potentially, a client may want to
> > >> > upload multiple distinct datasets (with differing schemas). (This is a
> > >> > little tentative, and I can get more details...) Right now, each
> > >> > logical stream in Flight must have a single, consistent schema; would
> > >> > it make sense to look at ways to relax this, or declare this
> > >> > explicitly out of scope (and require multiple calls and coordination
> > >> > with the deployment topology) in order to accomplish this?
> > >> >
> > >> > Best,
> > >> > David
> > >> >
> > >> > On 11/27/19, Jacques Nadeau  wrote:
> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
> > >> > > looks good to me.
> > >> > >
> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li 
> > >> > > wrote:
> > >> > >
> > >> > >> I've updated the proposal.
> > >> > >>
> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> > >> > >> errors/metadata, I still think using bytes is preferable:
> > >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
> > >> > >>   types,
> > >> > >> - We wouldn't be able to practically expose the Protobuf field to
> > >> > >>   C++ users without causing build pains,
> > >> > >> - We can't let Python users take advantage of the Protobuf field
> > >> > >>   without somehow being compatible with the Protobuf wheels (by
> > >> > >>   linking to the same version, and doing magic to turn the C++
> > >> > >>   Protobufs into the Python ones),
> > >> > >> - All our other application-defined fields are already bytes.
> > >> > >>
> > >> > >> Applications that want structure can encode JSON or Protobuf Any
> > >> > >> into the bytes field themselves, much as you can already do for
> > >> > >> Ticket, commands in FlightDescriptors, and application metadata in
> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> > >> > >> Any directly, since Any itself is a bytes field with a tag, and
> > >> > >> must invoke the Protobuf deserializer again to read the actual
> > >> > >> message.
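
For illustration, the kind of application-level encoding described above
might look like this (the payload shape is purely hypothetical):

{code:python}
import json

# The application defines its own structure and serializes it into the
# opaque bytes field, just as with Ticket or FlightDescriptor commands.
payload = json.dumps({"op": "transform", "factor": 2}).encode("utf-8")
assert json.loads(payload.decode("utf-8")) == {"op": "transform", "factor": 2}
{code}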
> > >> > >>
> > >> > >> If we decide on using bytes, then I don't think it makes sense to
> > >> > >> define a new message with a oneof either, since it would be
> > >> > >> redundant.
> > >> > >>
> > >> > >> Thanks,
> > >> > >> David
> > >> > >>
> > >> > >> On 11/7/19, David Li  wrote:
> > >> > >> > I've been extremely backlogged; I will update the proposal when
> > >> > >> > I get a chance and reply here when done.
> > >> > >> >
> > >> > >> > Best,
> > >> > >> > David
> > >> > >> >
> > >> > >> > On 11/7/19, Wes McKinney  wrote:
> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It
> > >> > >> >> seems there are still some questions here; could we summarize
> > >> > >> >> what the alternatives are, along with any public API
> > >> > >> >> implications, so we can try to render a decision?
> > >> > >> >>
> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.david...@gmail.com>
> > >> > >> >> wrote:
> > >> > >> >>>
> > >> > >> >>> Hi Wes,
> > >> > >> >>>
> > >> > >> >>> Responses inline:
> > >> > >> >>>
> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <
> 

[jira] [Created] (ARROW-7940) Unable to generate cmake build with settings other than default

2020-02-25 Thread Valery Vybornov (Jira)
Valery Vybornov created ARROW-7940:
--

 Summary: Unable to generate cmake build with settings other than 
default
 Key: ARROW-7940
 URL: https://issues.apache.org/jira/browse/ARROW-7940
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.16.0
 Environment: Windows 10
Visual Studio 2019 Build Tools 16.4.5
Reporter: Valery Vybornov
 Attachments: log.txt

Steps to reproduce:
 # Install conda-forge as described here: 
[https://arrow.apache.org/docs/developers/cpp/windows.html#using-conda-forge-for-build-dependencies]
 # Install ninja+clcache 
[https://arrow.apache.org/docs/developers/cpp/windows.html#building-with-ninja-and-clcache]
 # (git bash) git clone [https://github.com/apache/arrow.git]
cd arrow/
git checkout apache-arrow-0.16.0
 # (cmd)
call C:\Users\vvv\Miniconda3\Scripts\activate.bat C:\Users\vvv\Miniconda3
call "C:\Program Files (x86)\Microsoft Visual 
Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64
call conda activate arrow-dev
 # cd arrow\cpp
mkdir build
cd build
 # cmake -G "Ninja" -DARROW_BUILD_EXAMPLES=ON -DARROW_BUILD_UTILITIES=ON 
-DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_PARQUET=ON ..
 # cmake --build . --config Release

Expected results: Examples, utilities, flight, gandiva, parquet built.

Actual results: the default configuration is generated and none of the above 
features are built. cmake_summary.json indicates all these features are OFF. 
The following lines appear in the cmake output:
{code}
-- Configuring done
You have changed variables that require your cache to be deleted.
Configure will be re-run and you may have to reset some variables.
The following variables have changed:
CMAKE_CXX_COMPILER= C:/Program Files (x86)/Microsoft Visual 
Studio/2019/BuildTools/VC/Tools/MSVC/14.24.28314/bin/Hostx64/x64/cl.exe
-- Building using CMake version: 3.16.4 {code}
Full cmake output attached



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7942) [C++][Parquet] Examine Arrow-decoding perf regressions introduced by PARQUET-1797

2020-02-25 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7942:
---

 Summary: [C++][Parquet] Examine Arrow-decoding perf regressions 
introduced by PARQUET-1797
 Key: ARROW-7942
 URL: https://issues.apache.org/jira/browse/ARROW-7942
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney


See discussion at 
https://github.com/apache/arrow/pull/6440#issuecomment-591053705



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7941) [Rust] Logical plan should refer to columns by name not index

2020-02-25 Thread Andy Grove (Jira)
Andy Grove created ARROW-7941:
-

 Summary: [Rust] Logical plan should refer to columns by name not 
index
 Key: ARROW-7941
 URL: https://issues.apache.org/jira/browse/ARROW-7941
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust, Rust - DataFusion
Affects Versions: 0.16.0
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


I made a mistake in the design of the logical plan. It is better to refer to 
columns by name rather than index.

Benefits of making this change:
 * Allows support for schemaless data sources, e.g. JSON
 * Reduces the complexity of the optimizer rules
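
An illustrative sketch of the distinction (Python stand-ins only; DataFusion's
actual logical-plan types are Rust):

{code:python}
from dataclasses import dataclass

@dataclass
class ColumnIndex:
    index: int  # current design: breaks if the schema is reordered

@dataclass
class ColumnName:
    name: str   # proposed design: stable across schema changes, and usable
                # with schemaless sources such as JSON
{code}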

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Flight testing inconsistency for empty batches

2020-02-25 Thread David Li
Hey Bryan,

Thanks for looking into this issue. I would vote that we should
validate each batch independently, so we can catch issues related to
the structure of the data and not just the content. C++ doesn't do any
detection of empty batches per se, but on both ends it reads all the
data into a table, which would eliminate any empty batches.

It also wouldn't be reasonable to stop sending batches that are empty,
because Flight lets you attach metadata to batches, and so an empty
batch might still have metadata that the client or server wants.
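
A small sketch of why reading into a table hides this (assuming pyarrow;
table equality is value-based, so chunking differences vanish):

{code:python}
import pyarrow as pa

batch = pa.RecordBatch.from_arrays([pa.array([1, 2], type=pa.int64())],
                                   names=["a"])
empty = pa.RecordBatch.from_arrays([pa.array([], type=pa.int64())],
                                   names=["a"])
# The trailing empty batch is invisible once everything is read into a
# Table, which is why per-batch validation catches more.
assert pa.Table.from_batches([batch, empty]).equals(
    pa.Table.from_batches([batch]))
{code}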

Best,
David

On 2/24/20, Bryan Cutler  wrote:
> While looking into Null type testing for ARROW-7899, a couple small issues
> came up regarding Flight integration testing with empty batches (row count
> == 0) that could be worked out with a quick discussion. It seems there is a
> small difference between the C++ and Java Flight servers when there are
> empty record batches at the end of a stream, more details in PR
> https://github.com/apache/arrow/pull/6476.
>
> The Java server sends all record batches, even the empty ones, and the test
> client verifies each of these batches matches the batches read from a JSON
> file. The C++ servers seems to recognize if the end of the stream is only
> empty batches (please correct me if I'm wrong) and will not serve them.
> This seems reasonable, as there is no more actual data left in the stream.
> The C++ test client reads all batches into a table, does the same for the
> JSON file, and compares final Tables. I also noticed that empty batches in
> the middle of the stream will be served.  My questions are:
>
> 1) What is the expected behavior of a Flight server for empty record
> batches? Can they be ignored and not sent to the client?
>
> 2) Is it good enough to test against a final concatenation of all batches
> in the stream or should each batch be verified individually to ensure the
> server is sending out correctly batched data?
>
> Thanks,
> Bryan
>


Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2020-02-25 Thread David Li
Hey Andy,

I've been rather busy unfortunately. I had started on an
implementation in C++ to provide as part of this discussion, but it's
not complete. I'm hoping to have more done in March.

Best,
David

On 2/25/20, Andy Grove  wrote:
> I was wondering if there had been any momentum on this (the BiDirectional
> RPC design)?
>
> I'm interested in this for the use case of Apache Spark sending a stream of
> data to another process to invoke custom code and then receive a stream
> back with the transformed data.
>
> Thanks,
>
> Andy.
>
>
>
> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau  wrote:
>
>> I support moving forward with the current proposal.
>>
>> On Thu, Dec 12, 2019 at 12:20 PM David Li  wrote:
>>
>> > Just following up here again, any other thoughts?
>> >
>> > I think we do have justifications for potentially separate streams in
>> > a call, but that's more of an orthogonal question - it doesn't need to
>> > be addressed here. I do agree that it very much complicates things.
>> >
>> > Thanks,
>> > David
>> >
>> > On 11/29/19, Wes McKinney  wrote:
>> > > I would generally agree with this. Note that you have the possibility
>> > > to use unions-of-structs to send record batches with different schemas
>> > > in the same stream, though with some added complexity on each side
>> > >
>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau 
>> > wrote:
>> > >>
>> > >> I'd vote for explicitly not supported. We should keep our primitives
>> > >> narrow.
>> > >>
>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li 
>> > >> wrote:
>> > >>
>> > >> > Thanks for the feedback.
>> > >> >
>> > >> > I do think if we had explicitly embraced gRPC from the beginning,
>> > >> > there are a lot of places where things could be made more ergonomic,
>> > >> > including with the metadata fields. But it would also have locked us
>> > >> > out of potential future transports.
>> > >> >
>> > >> > On another note: I hesitate to put too much into this method, but we
>> > >> > are looking at use cases where potentially, a client may want to
>> > >> > upload multiple distinct datasets (with differing schemas). (This is
>> > >> > a little tentative, and I can get more details...) Right now, each
>> > >> > logical stream in Flight must have a single, consistent schema;
>> > >> > would it make sense to look at ways to relax this, or declare this
>> > >> > explicitly out of scope (and require multiple calls and coordination
>> > >> > with the deployment topology) in order to accomplish this?
>> > >> >
>> > >> > Best,
>> > >> > David
>> > >> >
>> > >> > On 11/27/19, Jacques Nadeau  wrote:
>> > >> > > Fair enough. I'm okay with the bytes approach and the proposal
>> > >> > > looks good to me.
>> > >> > >
>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li 
>> > >> > > wrote:
>> > >> > >
>> > >> > >> I've updated the proposal.
>> > >> > >>
>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>> > >> > >> errors/metadata, I still think using bytes is preferable:
>> > >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
>> > >> > >>   types,
>> > >> > >> - We wouldn't be able to practically expose the Protobuf field to
>> > >> > >>   C++ users without causing build pains,
>> > >> > >> - We can't let Python users take advantage of the Protobuf field
>> > >> > >>   without somehow being compatible with the Protobuf wheels (by
>> > >> > >>   linking to the same version, and doing magic to turn the C++
>> > >> > >>   Protobufs into the Python ones),
>> > >> > >> - All our other application-defined fields are already bytes.
>> > >> > >>
>> > >> > >> Applications that want structure can encode JSON or Protobuf Any
>> > >> > >> into the bytes field themselves, much as you can already do for
>> > >> > >> Ticket, commands in FlightDescriptors, and application metadata in
>> > >> > >> DoGet/DoPut. I don't think this is (much) less efficient than
>> > >> > >> using Any directly, since Any itself is a bytes field with a tag,
>> > >> > >> and must invoke the Protobuf deserializer again to read the actual
>> > >> > >> message.
>> > >> > >>
>> > >> > >> If we decide on using bytes, then I don't think it makes sense to
>> > >> > >> define a new message with a oneof either, since it would be
>> > >> > >> redundant.
>> > >> > >>
>> > >> > >> Thanks,
>> > >> > >> David
>> > >> > >>
>> > >> > >> On 11/7/19, David Li  wrote:
>> > >> > >> > I've been extremely backlogged; I will update the proposal when
>> > >> > >> > I get a chance and reply here when done.
>> > >> > >> >
>> > >> > >> > Best,
>> > >> > >> > David
>> > >> > >> >
>> > >> > >> > On 11/7/19, Wes McKinney  wrote:
>> > >> > >> >> Bumping this discussion since a couple of weeks have passed. It
>> > >> > >> >> seems there are still some questions here, could we summarize
>> > >> > >> >> what are
>> 

[jira] [Created] (ARROW-7934) [C++] Fix UriEscape for empty string

2020-02-25 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-7934:
-

 Summary: [C++] Fix UriEscape for empty string
 Key: ARROW-7934
 URL: https://issues.apache.org/jira/browse/ARROW-7934
 Project: Apache Arrow
  Issue Type: Task
Reporter: Projjal Chanda
Assignee: Projjal Chanda






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7935) [Java] Remove Netty dependency for BufferAllocator and ReferenceManager

2020-02-25 Thread Liya Fan (Jira)
Liya Fan created ARROW-7935:
---

 Summary: [Java] Remove Netty dependency for BufferAllocator and 
ReferenceManager
 Key: ARROW-7935
 URL: https://issues.apache.org/jira/browse/ARROW-7935
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


With previous work (ARROW-7329 and ARROW-7505), Netty-based allocation is only 
one of the possible implementations. So we need to revise BufferAllocator and 
ReferenceManager to make them general and independent of the Netty libraries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7936) [Python] FileSystem.from_uri test fails on python 3.5

2020-02-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7936:
--

 Summary: [Python] FileSystem.from_uri test fails on python 3.5
 Key: ARROW-7936
 URL: https://issues.apache.org/jira/browse/ARROW-7936
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Krisztian Szucs
 Fix For: 1.0.0


See build failure at 
https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=7535=logs=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb=6c939d89-0d1a-51f2-8b30-091a7a82e98c=288



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[NIGHTLY] Arrow Build Report for Job nightly-2020-02-25-0

2020-02-25 Thread Crossbow


Arrow Build Report for Job nightly-2020-02-25-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0

Failed Tasks:
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-travis-gandiva-jar-trusty
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-travis-macos-r-autobrew
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-python-3.8-jpype
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-wheel-manylinux1-cp35m
- wheel-manylinux2010-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-wheel-manylinux2010-cp35m
- wheel-manylinux2014-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-wheel-manylinux2014-cp35m
- wheel-osx-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-travis-wheel-osx-cp35m
- wheel-osx-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-travis-wheel-osx-cp36m

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-github-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-github-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-github-centos-8
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-github-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-github-debian-stretch
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-travis-gandiva-jar-osx
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-travis-homebrew-cpp
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-cpp
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-02-25-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7:
  URL: 

[jira] [Created] (ARROW-7937) [Python][Packaging] Remove boost from the macos wheels

2020-02-25 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7937:
--

 Summary: [Python][Packaging] Remove boost from the macos wheels
 Key: ARROW-7937
 URL: https://issues.apache.org/jira/browse/ARROW-7937
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Krisztian Szucs
 Fix For: 1.0.0


Only boost_regex is required by libarrow, and only when building with gcc < 4.9; see

https://github.com/apache/arrow/blob/f609298f8f00783a6704608ca8493227a552abab/cpp/src/parquet/metadata.cc#L38

so we can remove the bundled boost libraries from the macos wheels as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7938) [C++] Add tests for DayTimeIntervalBuilder

2020-02-25 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7938:
---

 Summary: [C++] Add tests for DayTimeIntervalBuilder
 Key: ARROW-7938
 URL: https://issues.apache.org/jira/browse/ARROW-7938
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Micah Kornfield
 Fix For: 1.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7939) Python crashes when reading parquet file compressed with snappy

2020-02-25 Thread Marc Bernot (Jira)
Marc Bernot created ARROW-7939:
--

 Summary: Python crashes when reading parquet file compressed with 
snappy
 Key: ARROW-7939
 URL: https://issues.apache.org/jira/browse/ARROW-7939
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.16.0
 Environment: Windows 7
python 3.6.9
pyarrow 0.16 from conda-forge
Reporter: Marc Bernot


When I installed pyarrow 0.16, some parquet files created with pyarrow 0.15.1 
would make python crash. I drilled down to the simplest example I could find.

It turns out that some parquet files created with pyarrow 0.16 cannot be read 
back either. The example below works fine with arrays_ok, but python crashes 
with arrays_nok.

Besides, it works fine with 'none', 'gzip' and 'brotli' compression. The 
problem seems to happen only with snappy.
{code:python}
import pyarrow.parquet as pq
import pyarrow as pa

arrays_ok = [[0, 1]]      # reads back fine
arrays_nok = [[0, 1, 2]]  # crashes the interpreter on read
table = pa.Table.from_arrays(arrays_nok, names=['a'])
pq.write_table(table, 'foo.parquet', compression='snappy')
pq.read_table('foo.parquet')  # crash happens here
{code}
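
A variant of the same round trip across codecs (a sketch based on the
observations above, not a separate reproduction); on the affected setup only
the 'snappy' iteration is expected to crash:

{code:python}
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.Table.from_arrays([[0, 1, 2]], names=['a'])
for codec in ['none', 'gzip', 'brotli', 'snappy']:
    # Per the report, only 'snappy' fails; the other codecs round-trip fine.
    pq.write_table(table, 'foo.parquet', compression=codec)
    pq.read_table('foo.parquet')
{code}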



--
This message was sent by Atlassian Jira
(v8.3.4#803005)