Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-09-10 Thread Bryan Cutler
I have the patch for the EOS with Java writers up here
https://github.com/apache/arrow/pull/5345. Just to clarify, the EOS of
{0x, 0x} is used for both stream and file formats, in
non-legacy writing mode.
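
For context on the fix being discussed, here is a minimal sketch of the backwards-compatible read logic (this is not code from the PR above; the function name and little-endian framing are illustrative assumptions, and the 0xFFFFFFFF continuation marker value comes from the Arrow IPC format specification):

    import struct

    CONTINUATION_MARKER = 0xFFFFFFFF  # per the Arrow IPC format spec

    def read_next_message_length(stream):
        """Illustrative: return the next IPC message length, or None at end of stream."""
        head = stream.read(4)
        if len(head) < 4:
            return None                       # truncated stream: treat as EOS
        (value,) = struct.unpack('<I', head)  # assuming little-endian framing
        if value != CONTINUATION_MARKER:
            # Legacy (pre-0.15) framing: this word is already the length; 0 means EOS.
            return value or None
        # Non-legacy framing: the marker is followed by a 4-byte length. A zero
        # length signals EOS, and the full 8-byte token has now been consumed,
        # so no unread bytes are left in the stream.
        tail = stream.read(4)
        if len(tail) < 4:
            return None
        (length,) = struct.unpack('<I', tail)
        return length or None

This mirrors option (i) in Ji Liu's message quoted below: in non-legacy mode the reader always consumes the full 8-byte EOS token, so stream and file readers stay in sync with the writer.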

On Mon, Sep 9, 2019 at 8:01 PM Bryan Cutler  wrote:

> Sounds good to me also and I don't think we need a vote either.
>
> On Sat, Sep 7, 2019 at 7:36 PM Micah Kornfield 
> wrote:
>
>> +1 on this, I also don't think a vote is necessary as long as we make the
>> change before 0.15.0
>>
>> On Saturday, September 7, 2019, Wes McKinney  wrote:
>>
>> > I see, thank you for catching this nuance.
>> >
>> > I agree that using {0x, 0x} for EOS will resolve the
>> > issue while allowing implementations to be backwards compatible (i.e.
>> > handling the 4-byte EOS from older payloads).
>> >
>> > I'm not sure that we need to have a vote about this, what do others
>> think?
>> >
>> > On Sat, Sep 7, 2019 at 12:47 AM Ji Liu 
>> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > During the java code review[1], it seems there is a problem with the
>> > > current implementations (C++/Java etc.) when reaching EOS: the new
>> > > format EOS is 8 bytes, but the reader only reads 4 bytes when it
>> > > reaches the end of the stream, so the remaining 4 bytes are left
>> > > unread, which causes problems for subsequent reads.
>> > >
>> > > There are two suggested options[2] below; we should reach consensus
>> > > and fix this problem before the 0.15 release.
>> > > i. For the new format, an 8-byte EOS token should look like {0x,
>> > > 0x}, so we read the continuation token first, and then know to read
>> > > the next 4 bytes, which are then 0 to signal EOS.
>> > > ii. The reader just remembers the state, so if it reads the
>> > > continuation token at the beginning, it then reads all 8 bytes at the end.
>> > >
>> > > Thanks,
>> > > Ji Liu
>> > >
>> > > [1] https://github.com/apache/arrow/pull/5229
>> > > [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > From:Eric Erhardt 
>> > > Send Time: Thursday, September 5, 2019 07:16
>> > > To:dev@arrow.apache.org ; Ji Liu <
>> > niki...@aliyun.com>
>> > > Cc:emkornfield ; Paul Taylor <
>> ptay...@apache.org>
>> > > Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > >
>> > > The C# PR is up.
>> > >
>> > > https://github.com/apache/arrow/pull/5280
>> > >
>> > > Eric
>> > >
>> > > -Original Message-
>> > > From: Eric Erhardt 
>> > > Sent: Wednesday, September 4, 2019 10:12 AM
>> > > To: dev@arrow.apache.org; Ji Liu 
>> > > Cc: emkornfield ; Paul Taylor <
>> ptay...@apache.org
>> > >
>> > > Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > >
>> > > I'm working on a PR for the C# bindings. I hope to have it up in the
>> > next day or two. Integration tests for C# would be a great addition at
>> some
>> > point - it's been on my backlog. For now I plan on manually testing it.
>> > >
>> > > -Original Message-
>> > > From: Wes McKinney 
>> > > Sent: Tuesday, September 3, 2019 10:17 PM
>> > > To: Ji Liu 
>> > > Cc: emkornfield ; dev ;
>> > Paul Taylor 
>> > > Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > >
>> > > hi folks,
>> > >
>> > > We now have patches up for Java, JS, and Go. How are we doing on the
>> > code reviews for getting these in?
>> > >
>> > > Since C# implements the binary protocol, the C# developers might want
>> to
>> > look at this before the 0.15.0 release also. Absent integration tests
>> it's
>> > difficult to verify the C# library, though
>> > >
>> > > Thanks
>> > >
>> > > On Thu, Aug 29, 2019 at 8:13 AM Ji Liu  wrote:
>> > > >
>> > > > Here is the Java implementation
>> > > >
>> > > > https://github.com/apache/arrow/pull/5229
>> > > >
>> > > > cc @Wes McKinney @emkornfield
>> > > >
>> > > > Thanks,
>> > > > Ji Liu
>> > > >
>> > > > --
>> > > > From: Ji Liu
>> > > > Send Time: Wednesday, August 28, 2019 17:34
>> > > > To: emkornfield ; dev
>> > > > Cc: Paul Taylor 
>> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
>> > > > 8-byte Flatbuffer alignment requirements (2nd vote)
>> > > >
>> > > > I could take the Java implementation and will take a close watch on
>> > this issue in the next few days.
>> > > >
>> > > > Thanks,
>> > > > Ji Liu
>> > > >
>> > > >
>> > > > 

Re: Timeline for 0.15.0 release

2019-09-10 Thread Micah Kornfield
I should have a little more bandwidth to help with some of the packaging
starting tomorrow and going into the weekend.

On Tuesday, September 10, 2019, Wes McKinney  wrote:

> Hi folks,
>
> With the state of nightly packaging and integration builds things aren't
> looking too good for being in release readiness by the end of this week but
> maybe I'm wrong. I'm planning to be working to close as many issues as I
> can and also to help with the ongoing alignment fixes.
>
> Wes
>
> On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield 
> wrote:
>
>> Just for reference [1] has a dashboard of the current issues:
>>
>> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.15.0+Release
>>
>> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney  wrote:
>>
>>> hi all,
>>>
>>> It doesn't seem like we're going to be in a position to release at the
>>> beginning of next week. I hope that one more week of work (or less)
>>> will be enough to get us there. Aside from merging the alignment
>>> changes, we need to make sure that our packaging jobs required for the
>>> release candidate are all working.
>>>
>>> If folks could remove issues from the 0.15.0 backlog that they don't
>>> think they will finish by end of next week that would help focus
>>> efforts (there are currently 78 issues in 0.15.0 still). I am looking
>>> to tackle a few small features related to dictionaries while the
>>> release window is still open.
>>>
>>> - Wes
>>>
>>> On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney 
>>> wrote:
>>> >
>>> > hi,
>>> >
>>> > I think we should try to release the week of September 9, so
>>> > development work should be completed by end of next week.
>>> >
>>> > Does that seem reasonable?
>>> >
>>> > I plan to get up a patch for the protocol alignment changes for C++ in
>>> > the next couple of days -- I think that getting the alignment work
>>> > done is the main barrier to releasing.
>>> >
>>> > Thanks
>>> > Wes
>>> >
>>> > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu 
>>> wrote:
>>> > >
>>> > > Hi, Wes, on the Java side, I can think of several bugs that need to
>>> > > be fixed or at least flagged.
>>> > >
>>> > > i. ARROW-6040: Dictionary entries are required in IPC streams even
>>> when empty[1]
>>> > > This one is under review now; however, through this PR we found that
>>> > > there seems to be a bug in Java's reading and writing of dictionaries
>>> > > in IPC which is inconsistent with the spec[2], since it assumes all
>>> > > dictionaries are at the start of the stream (see details in the PR
>>> > > comments; this fix may not make it into version 0.15). @Micah Kornfield
>>> > >
>>> > > ii. ARROW-1875: Write 64-bit ints as strings in integration test
>>> JSON files[3]
>>> > > The Java-side code is already checked in; other implementations seem not to be yet.
>>> > >
>>> > > iii. ARROW-6202: OutOfMemory in JdbcAdapter[4]
>>> > > Caused by trying to load all records into one contiguous batch; fixed
>>> > > by providing an iterator API for incremental reading in ARROW-6219[5].
>>> > >
>>> > > Thanks,
>>> > > Ji Liu
>>> > >
>>> > > [1] https://github.com/apache/arrow/pull/4960
>>> > > [2] https://arrow.apache.org/docs/ipc.html
>>> > > [3] https://issues.apache.org/jira/browse/ARROW-1875
>>> > > [4] https://issues.apache.org/jira/browse/ARROW-6202
>>> > > [5] https://issues.apache.org/jira/browse/ARROW-6219
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > From:Wes McKinney 
>>> > > Send Time: Monday, August 19, 2019 23:03
>>> > > To:dev 
>>> > > Subject:Re: Timeline for 0.15.0 release
>>> > >
>>> > > I'm going to work on organizing the 0.15.0 backlog some this
>>> > > week; if anyone wants to help with grooming (particularly for
>>> > > languages other than C++/Python where I'm focusing) that would be
>>> > > helpful. There have been almost 500 JIRA issues opened since the
>>> > > 0.14.0 release, so we should make sure to check whether there's any
>>> > > regressions or other serious bugs that we should try to fix for
>>> > > 0.15.0.
>>> > >
>>> > > On Thu, Aug 15, 2019 at 6:23 PM Wes McKinney 
>>> wrote:
>>> > > >
>>> > > > The Windows wheel issue in 0.14.1 seems to be
>>> > > >
>>> > > > https://issues.apache.org/jira/browse/ARROW-6015
>>> > > >
>>> > > > I think the root cause could be the Windows changes in
>>> > > >
>>> > > > https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a
>>> > > >
>>> > > > I would be appreciative if a volunteer would look into what was
>>> wrong
>>> > > > with the 0.14.1 wheels on Windows. Otherwise 0.15.0 Windows wheels
>>> > > > will be broken, too
>>> > > >
>>> > > > The bad wheels can be found at
>>> > > >
>>> > > > https://bintray.com/apache/arrow/python#files/python%2F0.14.1
>>> > > >
>>> > > > On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou <
>>> solip...@pitrou.net> wrote:
>>> > > > >
>>> > > > > On Thu, 15 Aug 2019 11:17:07 -0700
>>> > > > > Micah Kornfield  wrote:
>>> > > > > > >
>>> > > > > > > In C++ they are
>>> > > > > > > independent, we could have 32-bit array lengths and
>>> 

[jira] [Created] (ARROW-6523) [C++][Dataset] arrow_dataset target does not depend on anything

2019-09-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6523:
---

 Summary: [C++][Dataset] arrow_dataset target does not depend on 
anything
 Key: ARROW-6523
 URL: https://issues.apache.org/jira/browse/ARROW-6523
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.15.0


Other subcomponents have targets to allow their libraries or unit tests to be 
specifically built



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Tim Paine
Looking at the code, it looks simple to add. I will look into it this week and
do a PR if I get something usable.

Tim Paine
tim.paine.nyc
908-721-1185

> On Sep 10, 2019, at 19:35, Wes McKinney  wrote:
> 
> Hi Tim,
> 
> I see what you're saying now, sorry that I didn't understand sooner.
> 
> We actually need this feature to be able to pass instances of
> shared_ptr<T> (under very controlled conditions) into R using
> reticulate, where T is any of
> 
> * Array
> * ChunkedArray
> * DataType
> * RecordBatch
> * Table
> * and some other classes
> 
> 
> I would suggest introducing a property on pyarrow Python objects that
> returns the memory address of the wrapped shared_ptr<T> (i.e. the
> integer leading to shared_ptr<T>*). Then you can create your copy of
> that. Would that work? The only reason this is not implemented is that
> no one has needed it yet, mechanically it does not strike me as that
> complex.
> 
> See https://issues.apache.org/jira/browse/ARROW-3750. My comment in
> November 2018 "Methods would need to be added to the Cython extension
> types to give the memory address of the smart pointer object they
> contain". I agree with my younger self. Are you up to submit a PR?
> 
> - Wes
> 
>> On Tue, Sep 10, 2019 at 6:31 PM Tim Paine  wrote:
>> 
>> The end goal is to go direct from pyarrow to wasm without intermediate 
>> transforms. I can definitely make it work as is, we'll just have to be 
>> careful that the code we compile to webassembly matches exactly either our 
>> local copy of arrow if the user hasn't installed pyarrow, otherwise their 
>> installed copy.
>> 
>> Tim Paine
>> tim.paine.nyc
>> 908-721-1185
>> 
>>> On Sep 10, 2019, at 19:12, Tim Paine  wrote:
>>> 
>>> We're building webassembly, so we obviously don't want to introduce a 
>>> pyarrow dependency. I don't want to do any pyarrow manipulations in c++, 
>>> just get the c++ table. I was hoping pyarrow might expose a raw pointer or 
>>> have something castable.
>>> 
>>> It seems to be a big limitation, there is no way of communicating a pyarrow 
>>> table to a c++ library that uses arrow without that library linking against 
>>> pyarrow.
>>> 
>>> Tim Paine
>>> tim.paine.nyc
>>> 908-721-1185
>>> 
 On Sep 10, 2019, at 17:44, Wes McKinney  wrote:
 
 The Python extension types are defined in Cython, not C or C++ so you need
 to load the Cython extensions in order to instantiate the classes.
 
 Why do you have 2 copies of the C++ library? That seems easy to fix. If you
 are using wheels from PyPI I would recommend that you switch to conda or
 your own wheels without the C++ libraries bundled.
 
> On Tue, Sep 10, 2019, 4:23 PM Tim Paine  wrote:
> 
> Is there no way to do it without PyArrow? My C++ library is building arrow
> itself, which means if I use PyArow I’ll end up having 2 copies: one from
> my local C++ only build, and one from PyArrow.
> 
>> On Sep 10, 2019, at 5:18 PM, Wes McKinney  wrote:
>> 
>> hi Tim,
>> 
>> You can use the functions in
>> 
>> 
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pyarrow.h
>> 
>> You need to call "import_pyarrow()" from C++ before these APIs can be
>> used. It's similar to the NumPy C API in that regard
>> 
>> - Wes
>> 
>>> On Tue, Sep 10, 2019 at 4:13 PM Tim Paine  wrote:
>>> 
>>> Hey all, following up on a question I asked on stack overflow <
> https://stackoverflow.com/questions/57863751/how-to-convert-pyarrow-table-to-arrow-table-when-interfacing-between-pyarrow-in
>> .
>>> 
>>> It seems there is some code <
> https://arrow.apache.org/docs/python/extending.html#_CPPv412unwrap_tableP8PyObjectPNSt10shared_ptrI5TableEE>
> in PyArrow’s C++ to convert from a PyArrow table to an Arrow table. The
> problem with this is that my C++ library <
> https://github.com/finos/perspective> is going to build and link against
> Arrow on the C++ side rather than PyArrow side (because it will also be
> consumed in WebAssembly), so I want to avoid also linking against 
> PyArrow’s
> copy of the arrow library. I also need to look for PyArrow’s header files,
> which might conflict with the version in the local C++ code.
>>> 
>>> My solution right now is to just assert that PyArrow version == Arrow
> version and do some pruning (so I link against local libarrow and 
> PyArrow’s
> libarrow_python rather than use PyArrow’s libarrow), but ideally it would
> be great if there was a clean way to hand a PyArrow Table over to C++
> without requiring the C++ to have PyArrow (e.g. using only a PyObject *).
> Please forgive my ignorance/google skills if its already possible!
>>> 
>>> unwrap_table code:
>>> 
> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
> <
> 

Re: Nightly build report for crossbow job nightly-2019-09-10-1

2019-09-10 Thread Wes McKinney
I just opened https://issues.apache.org/jira/browse/ARROW-6522 which
seems incidental to the failing Windows wheel build (not actually
caused by anything wheel-related)

On Tue, Sep 10, 2019 at 6:55 PM Neal Richardson
 wrote:
>
> I think they're ticketed. According to
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20priority%20%3D%20Blocker%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%200.15.0%20AND%20component%20%3D%20%22Continuous%20Integration%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC,
> 4 of those failures are ticketed: spark, turbodbc, and the gandiva
> jars. The rest of the failures are wheels.
>
> Neal
>
> On Tue, Sep 10, 2019 at 2:17 PM Wes McKinney  wrote:
> >
> > I just opened
> >
> > https://issues.apache.org/jira/browse/ARROW-6518
> >
> > about the Python wheel failures. I suggest disabling Flight in the
> > wheels until someone else can help maintain them.
> >
> > Are there JIRA issues for the other failures?
> >
> > On Tue, Sep 10, 2019 at 3:44 PM Krisztián Szűcs
> >  wrote:
> > >
> > > Crossbow Report for Job nightly-2019-09-10-1
> > >
> > > All tasks:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1
> > >
> > > Failed Tasks:
> > > - docker-turbodbc-integration:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-turbodbc-integration
> > > - conda-win-vs2015-py37:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-conda-win-vs2015-py37
> > > - gandiva-jar-trusty:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-gandiva-jar-trusty
> > > - docker-clang-format:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-clang-format
> > > - wheel-osx-cp37m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp37m
> > > - wheel-osx-cp27m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp27m
> > > - wheel-osx-cp35m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp35m
> > > - wheel-manylinux1-cp35m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp35m
> > > - wheel-osx-cp36m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp36m
> > > - docker-spark-integration:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-spark-integration
> > > - wheel-win-cp35m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-wheel-win-cp35m
> > >
> > > Succeeded Tasks:
> > > - conda-osx-clang-py37:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py37
> > > - docker-cpp:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-cpp
> > > - docker-cpp-cmake32:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-cpp-cmake32
> > > - wheel-win-cp36m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-wheel-win-cp36m
> > > - conda-osx-clang-py36:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py36
> > > - wheel-manylinux1-cp27m:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp27m
> > > - docker-python-3.6-nopandas:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-python-3.6-nopandas
> > > - docker-dask-integration:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-dask-integration
> > > - wheel-manylinux2010-cp27mu:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux2010-cp27mu
> > > - ubuntu-disco:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-ubuntu-disco
> > > - docker-go:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-go
> > > - conda-linux-gcc-py37:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-linux-gcc-py37
> > > - docker-js:
> > >   URL:
> > > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-js
> > > - docker-pandas-master:
> > >   URL:
> > > 

[jira] [Created] (ARROW-6522) [Python] Test suite fails with pandas 0.23.4, pytest 3.8.1

2019-09-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6522:
---

 Summary: [Python] Test suite fails with pandas 0.23.4, pytest 3.8.1
 Key: ARROW-6522
 URL: https://issues.apache.org/jira/browse/ARROW-6522
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.15.0


{code}
_ test_array_protocol _
def test_array_protocol():
if LooseVersion(pd.__version__) < '0.24.0':
>   pytest.skip(reason='IntegerArray only introduced in 0.24')
E   TypeError: unexpected keyword arguments: ['reason']
C:\Miniconda3-x64\envs\wheel-test\lib\site-packages\pyarrow\tests\test_pandas.py:2934:
 TypeError
=== short test summary info ===
{code}

See https://ci.appveyor.com/project/Ursa-Labs/crossbow/builds/27310212

[~jorisvandenbossche] can you have a look?
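
For reference, one possible way to make the skip call compatible with older pytest is to pass the message positionally, since pytest 3.x's skip() does not accept a reason= keyword (which is what raises the TypeError above). This sketch is illustrative, not necessarily the fix that was applied:

{code}
from distutils.version import LooseVersion

import pandas as pd
import pytest


def test_array_protocol():
    if LooseVersion(pd.__version__) < '0.24.0':
        # Positional message works on both old and new pytest.
        pytest.skip('IntegerArray only introduced in 0.24')
    ...  # remainder of the original test unchanged
{code}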



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Nightly build report for crossbow job nightly-2019-09-10-1

2019-09-10 Thread Neal Richardson
I think they're ticketed. According to
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20priority%20%3D%20Blocker%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%200.15.0%20AND%20component%20%3D%20%22Continuous%20Integration%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC,
4 of those failures are ticketed: spark, turbodbc, and the gandiva
jars. The rest of the failures are wheels.

Neal

On Tue, Sep 10, 2019 at 2:17 PM Wes McKinney  wrote:
>
> I just opened
>
> https://issues.apache.org/jira/browse/ARROW-6518
>
> about the Python wheel failures. I suggest disabling Flight in the
> wheels until someone else can help maintain them.
>
> Are there JIRA issues for the other failures?
>
> On Tue, Sep 10, 2019 at 3:44 PM Krisztián Szűcs
>  wrote:
> >
> > Crossbow Report for Job nightly-2019-09-10-1
> >
> > All tasks:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1
> >
> > Failed Tasks:
> > - docker-turbodbc-integration:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-turbodbc-integration
> > - conda-win-vs2015-py37:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-conda-win-vs2015-py37
> > - gandiva-jar-trusty:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-gandiva-jar-trusty
> > - docker-clang-format:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-clang-format
> > - wheel-osx-cp37m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp37m
> > - wheel-osx-cp27m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp27m
> > - wheel-osx-cp35m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp35m
> > - wheel-manylinux1-cp35m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp35m
> > - wheel-osx-cp36m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp36m
> > - docker-spark-integration:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-spark-integration
> > - wheel-win-cp35m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-wheel-win-cp35m
> >
> > Succeeded Tasks:
> > - conda-osx-clang-py37:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py37
> > - docker-cpp:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-cpp
> > - docker-cpp-cmake32:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-cpp-cmake32
> > - wheel-win-cp36m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-wheel-win-cp36m
> > - conda-osx-clang-py36:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py36
> > - wheel-manylinux1-cp27m:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp27m
> > - docker-python-3.6-nopandas:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-python-3.6-nopandas
> > - docker-dask-integration:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-dask-integration
> > - wheel-manylinux2010-cp27mu:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux2010-cp27mu
> > - ubuntu-disco:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-ubuntu-disco
> > - docker-go:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-go
> > - conda-linux-gcc-py37:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-linux-gcc-py37
> > - docker-js:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-js
> > - docker-pandas-master:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-pandas-master
> > - docker-hdfs-integration:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-hdfs-integration
> > - wheel-manylinux1-cp27mu:
> >   URL:
> > https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp27mu
> > - docker-docs:
> >   URL:
> > 

[jira] [Created] (ARROW-6521) [C++] Add function to arrow:: namespace that returns the current ABI version

2019-09-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6521:
---

 Summary: [C++] Add function to arrow:: namespace that returns the 
current ABI version
 Key: ARROW-6521
 URL: https://issues.apache.org/jira/browse/ARROW-6521
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Wes McKinney
Hi Tim,

I see what you're saying now, sorry that I didn't understand sooner.

We actually need this feature to be able to pass instances of
shared_ptr<T> (under very controlled conditions) into R using
reticulate, where T is any of

* Array
* ChunkedArray
* DataType
* RecordBatch
* Table
* and some other classes


I would suggest introducing a property on pyarrow Python objects that
returns the memory address of the wrapped shared_ptr<T> (i.e. the
integer leading to shared_ptr<T>*). Then you can create your copy of
that. Would that work? The only reason this is not implemented is that
no one has needed it yet, mechanically it does not strike me as that
complex.

See https://issues.apache.org/jira/browse/ARROW-3750. My comment in
November 2018 "Methods would need to be added to the Cython extension
types to give the memory address of the smart pointer object they
contain". I agree with my younger self. Are you up to submit a PR?

- Wes
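
For illustration only, a sketch of how such a property might be consumed if it existed; the _shared_ptr_address attribute and libconsumer.so library below are hypothetical and are not part of pyarrow:

    import ctypes

    import pyarrow as pa

    table = pa.table({"col": pa.array([1, 2, 3])})

    # Hypothetical attribute per the ARROW-3750 proposal; pyarrow does not
    # provide it, so this sketch only exercises the branch if it is present.
    addr = getattr(table, "_shared_ptr_address", None)
    if addr is not None:
        # A C++ consumer would reinterpret the integer as a
        # std::shared_ptr<arrow::Table>* and copy the shared_ptr, without
        # linking against libarrow_python.
        consumer = ctypes.CDLL("libconsumer.so")  # hypothetical consumer library
        consumer.consume_table(ctypes.c_void_p(addr))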

On Tue, Sep 10, 2019 at 6:31 PM Tim Paine  wrote:
>
> The end goal is to go direct from pyarrow to wasm without intermediate 
> transforms. I can definitely make it work as is, we'll just have to be 
> careful that the code we compile to webassembly matches exactly either our 
> local copy of arrow if the user hasn't installed pyarrow, otherwise their 
> installed copy.
>
> Tim Paine
> tim.paine.nyc
> 908-721-1185
>
> > On Sep 10, 2019, at 19:12, Tim Paine  wrote:
> >
> > We're building webassembly, so we obviously don't want to introduce a 
> > pyarrow dependency. I don't want to do any pyarrow manipulations in c++, 
> > just get the c++ table. I was hoping pyarrow might expose a raw pointer or 
> > have something castable.
> >
> > It seems to be a big limitation, there is no way of communicating a pyarrow 
> > table to a c++ library that uses arrow without that library linking against 
> > pyarrow.
> >
> > Tim Paine
> > tim.paine.nyc
> > 908-721-1185
> >
> >> On Sep 10, 2019, at 17:44, Wes McKinney  wrote:
> >>
> >> The Python extension types are defined in Cython, not C or C++ so you need
> >> to load the Cython extensions in order to instantiate the classes.
> >>
> >> Why do you have 2 copies of the C++ library? That seems easy to fix. If you
> >> are using wheels from PyPI I would recommend that you switch to conda or
> >> your own wheels without the C++ libraries bundled.
> >>
> >>> On Tue, Sep 10, 2019, 4:23 PM Tim Paine  wrote:
> >>>
> >>> Is there no way to do it without PyArrow? My C++ library is building arrow
> >>> itself, which means if I use PyArow I’ll end up having 2 copies: one from
> >>> my local C++ only build, and one from PyArrow.
> >>>
>  On Sep 10, 2019, at 5:18 PM, Wes McKinney  wrote:
> 
>  hi Tim,
> 
>  You can use the functions in
> 
> 
> >>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pyarrow.h
> 
>  You need to call "import_pyarrow()" from C++ before these APIs can be
>  used. It's similar to the NumPy C API in that regard
> 
>  - Wes
> 
> > On Tue, Sep 10, 2019 at 4:13 PM Tim Paine  wrote:
> >
> > Hey all, following up on a question I asked on stack overflow <
> >>> https://stackoverflow.com/questions/57863751/how-to-convert-pyarrow-table-to-arrow-table-when-interfacing-between-pyarrow-in
>  .
> >
> > It seems there is some code <
> >>> https://arrow.apache.org/docs/python/extending.html#_CPPv412unwrap_tableP8PyObjectPNSt10shared_ptrI5TableEE>
> >>> in PyArrow’s C++ to convert from a PyArrow table to an Arrow table. The
> >>> problem with this is that my C++ library <
> >>> https://github.com/finos/perspective> is going to build and link against
> >>> Arrow on the C++ side rather than PyArrow side (because it will also be
> >>> consumed in WebAssembly), so I want to avoid also linking against 
> >>> PyArrow’s
> >>> copy of the arrow library. I also need to look for PyArrow’s header files,
> >>> which might conflict with the version in the local C++ code.
> >
> > My solution right now is to just assert that PyArrow version == Arrow
> >>> version and do some pruning (so I link against local libarrow and 
> >>> PyArrow’s
> >>> libarrow_python rather than use PyArrow’s libarrow), but ideally it would
> >>> be great if there was a clean way to hand a PyArrow Table over to C++
> >>> without requiring the C++ to have PyArrow (e.g. using only a PyObject *).
> >>> Please forgive my ignorance/google skills if its already possible!
> >
> > unwrap_table code:
> >
> >>> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
> >>> <
> >>> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
> 
> >
> > library pruning:
> >
> >>> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
> >>> <
> >>> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53

Re: Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Tim Paine
We're building webassembly, so we obviously don't want to introduce a pyarrow 
dependency. I don't want to do any pyarrow manipulations in c++, just get the 
c++ table. I was hoping pyarrow might expose a raw pointer or have something 
castable.

It seems to be a big limitation that there is no way of communicating a pyarrow
table to a C++ library that uses arrow without that library linking against
pyarrow.

Tim Paine
tim.paine.nyc
908-721-1185

> On Sep 10, 2019, at 17:44, Wes McKinney  wrote:
> 
> The Python extension types are defined in Cython, not C or C++ so you need
> to load the Cython extensions in order to instantiate the classes.
> 
> Why do you have 2 copies of the C++ library? That seems easy to fix. If you
> are using wheels from PyPI I would recommend that you switch to conda or
> your own wheels without the C++ libraries bundled.
> 
>> On Tue, Sep 10, 2019, 4:23 PM Tim Paine  wrote:
>> 
>> Is there no way to do it without PyArrow? My C++ library is building arrow
>> itself, which means if I use PyArow I’ll end up having 2 copies: one from
>> my local C++ only build, and one from PyArrow.
>> 
>>> On Sep 10, 2019, at 5:18 PM, Wes McKinney  wrote:
>>> 
>>> hi Tim,
>>> 
>>> You can use the functions in
>>> 
>>> 
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pyarrow.h
>>> 
>>> You need to call "import_pyarrow()" from C++ before these APIs can be
>>> used. It's similar to the NumPy C API in that regard
>>> 
>>> - Wes
>>> 
 On Tue, Sep 10, 2019 at 4:13 PM Tim Paine  wrote:
 
 Hey all, following up on a question I asked on stack overflow <
>> https://stackoverflow.com/questions/57863751/how-to-convert-pyarrow-table-to-arrow-table-when-interfacing-between-pyarrow-in
>>> .
 
 It seems there is some code <
>> https://arrow.apache.org/docs/python/extending.html#_CPPv412unwrap_tableP8PyObjectPNSt10shared_ptrI5TableEE>
>> in PyArrow’s C++ to convert from a PyArrow table to an Arrow table. The
>> problem with this is that my C++ library <
>> https://github.com/finos/perspective> is going to build and link against
>> Arrow on the C++ side rather than PyArrow side (because it will also be
>> consumed in WebAssembly), so I want to avoid also linking against PyArrow’s
>> copy of the arrow library. I also need to look for PyArrow’s header files,
>> which might conflict with the version in the local C++ code.
 
 My solution right now is to just assert that PyArrow version == Arrow
>> version and do some pruning (so I link against local libarrow and PyArrow’s
>> libarrow_python rather than use PyArrow’s libarrow), but ideally it would
>> be great if there was a clean way to hand a PyArrow Table over to C++
>> without requiring the C++ to have PyArrow (e.g. using only a PyObject *).
>> Please forgive my ignorance/google skills if its already possible!
 
 unwrap_table code:
 
>> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
>> <
>> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
>>> 
 
 library pruning:
 
>> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
>> <
>> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
>>> 
 
 
 
 
 Tim
>> 
>> 


Re: Timeline for 0.15.0 release

2019-09-10 Thread Wes McKinney
Hi folks,

With the state of nightly packaging and integration builds things aren't
looking too good for being in release readiness by the end of this week but
maybe I'm wrong. I'm planning to be working to close as many issues as I
can and also to help with the ongoing alignment fixes.

Wes

On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield  wrote:

> Just for reference [1] has a dashboard of the current issues:
>
> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.15.0+Release
>
> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney  wrote:
>
>> hi all,
>>
>> It doesn't seem like we're going to be in a position to release at the
>> beginning of next week. I hope that one more week of work (or less)
>> will be enough to get us there. Aside from merging the alignment
>> changes, we need to make sure that our packaging jobs required for the
>> release candidate are all working.
>>
>> If folks could remove issues from the 0.15.0 backlog that they don't
>> think they will finish by end of next week that would help focus
>> efforts (there are currently 78 issues in 0.15.0 still). I am looking
>> to tackle a few small features related to dictionaries while the
>> release window is still open.
>>
>> - Wes
>>
>> On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney  wrote:
>> >
>> > hi,
>> >
>> > I think we should try to release the week of September 9, so
>> > development work should be completed by end of next week.
>> >
>> > Does that seem reasonable?
>> >
>> > I plan to get up a patch for the protocol alignment changes for C++ in
>> > the next couple of days -- I think that getting the alignment work
>> > done is the main barrier to releasing.
>> >
>> > Thanks
>> > Wes
>> >
>> > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu 
>> wrote:
>> > >
>> > > Hi, Wes, on the Java side, I can think of several bugs that need to
>> > > be fixed or at least flagged.
>> > >
>> > > i. ARROW-6040: Dictionary entries are required in IPC streams even
>> when empty[1]
>> > > This one is under review now; however, through this PR we found that
>> > > there seems to be a bug in Java's reading and writing of dictionaries in
>> > > IPC which is inconsistent with the spec[2], since it assumes all
>> > > dictionaries are at the start of the stream (see details in the PR
>> > > comments; this fix may not make it into version 0.15). @Micah Kornfield
>> > >
>> > > ii. ARROW-1875: Write 64-bit ints as strings in integration test JSON
>> files[3]
>> > > The Java-side code is already checked in; other implementations seem not to be yet.
>> > >
>> > > iii. ARROW-6202: OutOfMemory in JdbcAdapter[4]
>> > > Caused by trying to load all records into one contiguous batch; fixed
>> > > by providing an iterator API for incremental reading in ARROW-6219[5].
>> > >
>> > > Thanks,
>> > > Ji Liu
>> > >
>> > > [1] https://github.com/apache/arrow/pull/4960
>> > > [2] https://arrow.apache.org/docs/ipc.html
>> > > [3] https://issues.apache.org/jira/browse/ARROW-1875
>> > > [4] https://issues.apache.org/jira/browse/ARROW-6202
>> > > [5] https://issues.apache.org/jira/browse/ARROW-6219
>> > >
>> > >
>> > >
>> > > --
>> > > From:Wes McKinney 
>> > > Send Time: Monday, August 19, 2019 23:03
>> > > To:dev 
>> > > Subject:Re: Timeline for 0.15.0 release
>> > >
>> > > I'm going to work on organizing the 0.15.0 backlog some this
>> > > week; if anyone wants to help with grooming (particularly for
>> > > languages other than C++/Python where I'm focusing) that would be
>> > > helpful. There have been almost 500 JIRA issues opened since the
>> > > 0.14.0 release, so we should make sure to check whether there's any
>> > > regressions or other serious bugs that we should try to fix for
>> > > 0.15.0.
>> > >
>> > > On Thu, Aug 15, 2019 at 6:23 PM Wes McKinney 
>> wrote:
>> > > >
>> > > > The Windows wheel issue in 0.14.1 seems to be
>> > > >
>> > > > https://issues.apache.org/jira/browse/ARROW-6015
>> > > >
>> > > > I think the root cause could be the Windows changes in
>> > > >
>> > > >
>> https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a
>> > > >
>> > > > I would be appreciative if a volunteer would look into what was
>> wrong
>> > > > with the 0.14.1 wheels on Windows. Otherwise 0.15.0 Windows wheels
>> > > > will be broken, too
>> > > >
>> > > > The bad wheels can be found at
>> > > >
>> > > > https://bintray.com/apache/arrow/python#files/python%2F0.14.1
>> > > >
>> > > > On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou 
>> wrote:
>> > > > >
>> > > > > On Thu, 15 Aug 2019 11:17:07 -0700
>> > > > > Micah Kornfield  wrote:
>> > > > > > >
>> > > > > > > In C++ they are
>> > > > > > > independent, we could have 32-bit array lengths and
>> variable-length
>> > > > > > > types with 64-bit offsets if we wanted (we just wouldn't be
>> able to
>> > > > > > > have a List child with more than INT32_MAX elements).
>> > > > > >
>> > > > > > I think the point is we could do this in C++ but we don't.  I'm
>> not sure we
>> > > > > > would have introduced the "Large" types if we did.
>> > > 

[jira] [Created] (ARROW-6520) Segmentation fault on writing tables with fixed size binary fields

2019-09-10 Thread Furkan Tektas (Jira)
Furkan Tektas created ARROW-6520:


 Summary: Segmentation fault on writing tables with fixed size 
binary fields 
 Key: ARROW-6520
 URL: https://issues.apache.org/jira/browse/ARROW-6520
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.14.1
 Environment: Arch Linux x86_64
arrow-cpp 0.14.1   py37h6b969ab_1conda-forge
parquet-cpp   1.5.1 2conda-forge
pyarrow   0.14.1   py37h8b68381_0conda-forge
python3.7.3h33d41f4_1conda-forge

Reporter: Furkan Tektas


I'm not sure if this should be reported to Parquet or here.

When I tried to serialize a pyarrow table with a fixed-size binary field (holding
16-byte UUID4 values) to a parquet file, a segmentation fault occurs.

Here is the minimal example to reproduce:

 
{code}
import pyarrow as pa
from pyarrow import parquet as pq

data = {"col": pa.array([b"1234" for _ in range(10)])}
fields = [("col", pa.binary(4))]
schema = pa.schema(fields)
table = pa.table(data, schema)
pq.write_table(table, "test.parquet")
{code}

{{segmentation fault (core dumped) ipython}}

Yet, it works if I don't specify the size of the binary field.

{code}
import pyarrow as pa
from pyarrow import parquet as pq

data = {"col": pa.array([b"1234" for _ in range(10)])}
fields = [("col", pa.binary())]
schema = pa.schema(fields)
table = pa.table(data, schema)
pq.write_table(table, "test.parquet")
{code}
 

Thanks,



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Wes McKinney
The Python extension types are defined in Cython, not C or C++ so you need
to load the Cython extensions in order to instantiate the classes.

Why do you have 2 copies of the C++ library? That seems easy to fix. If you
are using wheels from PyPI I would recommend that you switch to conda or
your own wheels without the C++ libraries bundled.

On Tue, Sep 10, 2019, 4:23 PM Tim Paine  wrote:

> Is there no way to do it without PyArrow? My C++ library is building arrow
> itself, which means if I use PyArow I’ll end up having 2 copies: one from
> my local C++ only build, and one from PyArrow.
>
> > On Sep 10, 2019, at 5:18 PM, Wes McKinney  wrote:
> >
> > hi Tim,
> >
> > You can use the functions in
> >
> >
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pyarrow.h
> >
> > You need to call "import_pyarrow()" from C++ before these APIs can be
> > used. It's similar to the NumPy C API in that regard
> >
> > - Wes
> >
> > On Tue, Sep 10, 2019 at 4:13 PM Tim Paine  wrote:
> >>
> >> Hey all, following up on a question I asked on stack overflow <
> https://stackoverflow.com/questions/57863751/how-to-convert-pyarrow-table-to-arrow-table-when-interfacing-between-pyarrow-in
> >.
> >>
> >> It seems there is some code <
> https://arrow.apache.org/docs/python/extending.html#_CPPv412unwrap_tableP8PyObjectPNSt10shared_ptrI5TableEE>
> in PyArrow’s C++ to convert from a PyArrow table to an Arrow table. The
> problem with this is that my C++ library <
> https://github.com/finos/perspective> is going to build and link against
> Arrow on the C++ side rather than PyArrow side (because it will also be
> consumed in WebAssembly), so I want to avoid also linking against PyArrow’s
> copy of the arrow library. I also need to look for PyArrow’s header files,
> which might conflict with the version in the local C++ code.
> >>
> >> My solution right now is to just assert that PyArrow version == Arrow
> version and do some pruning (so I link against local libarrow and PyArrow’s
> libarrow_python rather than use PyArrow’s libarrow), but ideally it would
> be great if there was a clean way to hand a PyArrow Table over to C++
> without requiring the C++ to have PyArrow (e.g. using only a PyObject *).
> Please forgive my ignorance/google skills if its already possible!
> >>
> >> unwrap_table code:
> >>
> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
> <
> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
> >
> >>
> >> library pruning:
> >>
> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
> <
> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
> >
> >>
> >>
> >>
> >>
> >> Tim
>
>


Re: Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Tim Paine
Is there no way to do it without PyArrow? My C++ library is building arrow 
itself, which means if I use PyArow I’ll end up having 2 copies: one from my 
local C++ only build, and one from PyArrow.

> On Sep 10, 2019, at 5:18 PM, Wes McKinney  wrote:
> 
> hi Tim,
> 
> You can use the functions in
> 
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pyarrow.h
> 
> You need to call "import_pyarrow()" from C++ before these APIs can be
> used. It's similar to the NumPy C API in that regard
> 
> - Wes
> 
> On Tue, Sep 10, 2019 at 4:13 PM Tim Paine  wrote:
>> 
>> Hey all, following up on a question I asked on stack overflow 
>> .
>> 
>> It seems there is some code 
>> 
>>  in PyArrow’s C++ to convert from a PyArrow table to an Arrow table. The 
>> problem with this is that my C++ library 
>>  is going to build and link against 
>> Arrow on the C++ side rather than PyArrow side (because it will also be 
>> consumed in WebAssembly), so I want to avoid also linking against PyArrow’s 
>> copy of the arrow library. I also need to look for PyArrow’s header files, 
>> which might conflict with the version in the local C++ code.
>> 
>> My solution right now is to just assert that PyArrow version == Arrow 
>> version and do some pruning (so I link against local libarrow and PyArrow’s 
>> libarrow_python rather than use PyArrow’s libarrow), but ideally it would be 
>> great if there was a clean way to hand a PyArrow Table over to C++ without 
>> requiring the C++ to have PyArrow (e.g. using only a PyObject *). Please 
>> forgive my ignorance/google skills if its already possible!
>> 
>> unwrap_table code:
>> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
>>  
>> 
>> 
>> library pruning:
>> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
>>  
>> 
>> 
>> 
>> 
>> 
>> Tim



[jira] [Created] (ARROW-6519) [Java] Use IPC continuation token to mark EOS

2019-09-10 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6519:
---

 Summary: [Java] Use IPC continuation token to mark EOS
 Key: ARROW-6519
 URL: https://issues.apache.org/jira/browse/ARROW-6519
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Bryan Cutler
Assignee: Bryan Cutler
 Fix For: 0.15.0


For an Arrow stream in non-legacy mode, the EOS identifier should be {0x,
0x}. This way, all bytes sent by the writer can be read.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Wes McKinney
hi Tim,

You can use the functions in

https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/pyarrow.h

You need to call "import_pyarrow()" from C++ before these APIs can be
used. It's similar to the NumPy C API in that regard

- Wes

On Tue, Sep 10, 2019 at 4:13 PM Tim Paine  wrote:
>
> Hey all, following up on a question I asked on stack overflow 
> .
>
> It seems there is some code 
> 
>  in PyArrow’s C++ to convert from a PyArrow table to an Arrow table. The 
> problem with this is that my C++ library 
>  is going to build and link against 
> Arrow on the C++ side rather than PyArrow side (because it will also be 
> consumed in WebAssembly), so I want to avoid also linking against PyArrow’s 
> copy of the arrow library. I also need to look for PyArrow’s header files, 
> which might conflict with the version in the local C++ code.
>
> My solution right now is to just assert that PyArrow version == Arrow version 
> and do some pruning (so I link against local libarrow and PyArrow’s 
> libarrow_python rather than use PyArrow’s libarrow), but ideally it would be 
> great if there was a clean way to hand a PyArrow Table over to C++ without 
> requiring the C++ to have PyArrow (e.g. using only a PyObject *). Please 
> forgive my ignorance/google skills if its already possible!
>
> unwrap_table code:
> https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310
>  
> 
>
> library pruning:
> https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53
>  
> 
>
>
>
>
> Tim


Re: Nightly build report for crossbow job nightly-2019-09-10-1

2019-09-10 Thread Wes McKinney
I just opened

https://issues.apache.org/jira/browse/ARROW-6518

about the Python wheel failures. I suggest disabling Flight in the
wheels until someone else can help maintain them.

Are there JIRA issues for the other failures?

On Tue, Sep 10, 2019 at 3:44 PM Krisztián Szűcs
 wrote:
>
> Crossbow Report for Job nightly-2019-09-10-1
>
> All tasks:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1
>
> Failed Tasks:
> - docker-turbodbc-integration:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-turbodbc-integration
> - conda-win-vs2015-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-conda-win-vs2015-py37
> - gandiva-jar-trusty:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-gandiva-jar-trusty
> - docker-clang-format:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-clang-format
> - wheel-osx-cp37m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp37m
> - wheel-osx-cp27m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp27m
> - wheel-osx-cp35m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp35m
> - wheel-manylinux1-cp35m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp35m
> - wheel-osx-cp36m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp36m
> - docker-spark-integration:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-spark-integration
> - wheel-win-cp35m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-wheel-win-cp35m
>
> Succeeded Tasks:
> - conda-osx-clang-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py37
> - docker-cpp:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-cpp
> - docker-cpp-cmake32:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-cpp-cmake32
> - wheel-win-cp36m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-wheel-win-cp36m
> - conda-osx-clang-py36:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py36
> - wheel-manylinux1-cp27m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp27m
> - docker-python-3.6-nopandas:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-python-3.6-nopandas
> - docker-dask-integration:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-dask-integration
> - wheel-manylinux2010-cp27mu:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux2010-cp27mu
> - ubuntu-disco:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-ubuntu-disco
> - docker-go:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-go
> - conda-linux-gcc-py37:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-linux-gcc-py37
> - docker-js:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-js
> - docker-pandas-master:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-pandas-master
> - docker-hdfs-integration:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-hdfs-integration
> - wheel-manylinux1-cp27mu:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp27mu
> - docker-docs:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-docs
> - ubuntu-xenial:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-ubuntu-xenial
> - wheel-manylinux2010-cp37m:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux2010-cp37m
> - gandiva-jar-osx:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-gandiva-jar-osx
> - conda-osx-clang-py27:
>   URL:
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py27
> - wheel-manylinux2010-cp35m:
>   URL:
> 

[jira] [Created] (ARROW-6518) [Packaging][Python] Flight failing in Python wheel builds

2019-09-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6518:
---

 Summary: [Packaging][Python] Flight failing in Python wheel builds
 Key: ARROW-6518
 URL: https://issues.apache.org/jira/browse/ARROW-6518
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging, Python
Reporter: Wes McKinney
 Fix For: 0.15.0


See example failure

https://travis-ci.org/ursa-labs/crossbow/builds/583167489

{code}
[ 30%] Generating Flight.pb.cc, Flight.pb.h, Flight.grpc.pb.cc, Flight.grpc.pb.h
dyld: Library not loaded: /usr/local/opt/gperftools/lib/libprofiler.0.dylib
  Referenced from: /usr/local/Cellar/grpc/1.23.0_2/bin/grpc_cpp_plugin
  Reason: image not found
--grpc_out: protoc-gen-grpc: Plugin killed by signal 6.
make[2]: *** [src/arrow/flight/Flight.pb.cc] Error 1
make[2]: *** Deleting file `src/arrow/flight/Flight.pb.cc'
make[1]: *** [src/arrow/flight/CMakeFiles/flight_grpc_gen.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs
{code}

I suggest disabling Flight in the wheel builds.
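
If Flight is dropped from the wheels, downstream code can detect its absence at
import time. A minimal sketch (illustration only, not part of the proposed fix):

{code}
try:
    import pyarrow.flight  # noqa: F401
    HAVE_FLIGHT = True
except ImportError:
    # The wheel was built without Flight support (or grpc is unavailable).
    HAVE_FLIGHT = False
{code}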



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Consume tables from PyArrow independently from the python_arrow library

2019-09-10 Thread Tim Paine
Hey all, following up on a question I asked on Stack Overflow.

It seems there is some code in PyArrow’s C++ to convert from a PyArrow table to
an Arrow table. The problem with this is that my C++ library is going to build
and link against Arrow on the C++ side rather than the PyArrow side (because it
will also be consumed in WebAssembly), so I want to avoid also linking against
PyArrow’s copy of the Arrow library. I would also need to look for PyArrow’s
header files, which might conflict with the version used by the local C++ code.

My solution right now is to just assert that the PyArrow version == Arrow
version and do some library pruning (so I link against the local libarrow and
PyArrow’s libarrow_python rather than use PyArrow’s libarrow), but ideally it
would be great if there were a clean way to hand a PyArrow Table over to C++
without requiring the C++ side to have PyArrow (e.g. using only a PyObject *).
Please forgive my ignorance/google skills if it's already possible!

unwrap_table code:
https://github.com/apache/arrow/blob/c39e3508f93ea41410c2ae17783054d05592dc0e/python/pyarrow/public-api.pxi#L310

library pruning:
https://github.com/finos/perspective/blob/python_arrow/cmake/modules/FindPyArrow.cmake#L53

Tim
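
One workaround that avoids any PyArrow dependency on the C++ side is to
serialize the Table to the Arrow IPC stream format in Python and hand C++ only
a raw buffer; the consumer then needs nothing beyond plain libarrow. A minimal
sketch follows, assuming the C++ side accepts an (address, size) pair; the
my_cpp_module.consume_ipc_buffer call is a hypothetical binding, not an
existing API:

{code}
import pyarrow as pa


def table_to_ipc_buffer(table):
    """Serialize a pyarrow.Table into a single Arrow IPC stream buffer."""
    sink = pa.BufferOutputStream()
    writer = pa.RecordBatchStreamWriter(sink, table.schema)
    writer.write_table(table)
    writer.close()
    return sink.getvalue()  # pyarrow.Buffer


table = pa.Table.from_pydict({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
buf = table_to_ipc_buffer(table)

# Hypothetical C++ binding that reads an Arrow IPC stream from raw memory
# using only plain libarrow (e.g. a BufferReader plus the IPC stream reader):
# my_cpp_module.consume_ipc_buffer(buf.address, buf.size)
{code}

This trades one copy of the data for complete isolation from PyArrow's shared
libraries and headers; if zero-copy is required, something like unwrap_table
(and therefore a PyArrow build-time dependency) is still needed.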

Nightly build report for crossbow job nightly-2019-09-10-1

2019-09-10 Thread Krisztián Szűcs
Crossbow Report for Job nightly-2019-09-10-1

All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1

Failed Tasks:
- docker-turbodbc-integration:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-turbodbc-integration
- conda-win-vs2015-py37:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-conda-win-vs2015-py37
- gandiva-jar-trusty:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-gandiva-jar-trusty
- docker-clang-format:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-clang-format
- wheel-osx-cp37m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp37m
- wheel-osx-cp27m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp27m
- wheel-osx-cp35m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp35m
- wheel-manylinux1-cp35m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp35m
- wheel-osx-cp36m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-osx-cp36m
- docker-spark-integration:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-spark-integration
- wheel-win-cp35m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-wheel-win-cp35m

Succeeded Tasks:
- conda-osx-clang-py37:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py37
- docker-cpp:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-cpp
- docker-cpp-cmake32:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-cpp-cmake32
- wheel-win-cp36m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-appveyor-wheel-win-cp36m
- conda-osx-clang-py36:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py36
- wheel-manylinux1-cp27m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp27m
- docker-python-3.6-nopandas:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-python-3.6-nopandas
- docker-dask-integration:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-dask-integration
- wheel-manylinux2010-cp27mu:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux2010-cp27mu
- ubuntu-disco:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-ubuntu-disco
- docker-go:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-go
- conda-linux-gcc-py37:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-linux-gcc-py37
- docker-js:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-js
- docker-pandas-master:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-pandas-master
- docker-hdfs-integration:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-hdfs-integration
- wheel-manylinux1-cp27mu:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux1-cp27mu
- docker-docs:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-circle-docker-docs
- ubuntu-xenial:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-ubuntu-xenial
- wheel-manylinux2010-cp37m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux2010-cp37m
- gandiva-jar-osx:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-gandiva-jar-osx
- conda-osx-clang-py27:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-osx-clang-py27
- wheel-manylinux2010-cp35m:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-travis-wheel-manylinux2010-cp35m
- debian-buster:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-debian-buster
- ubuntu-bionic:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-ubuntu-bionic
- conda-linux-gcc-py36:
  URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-09-10-1-azure-conda-linux-gcc-py36
- docker-python-2.7:
  URL:

[Java] How to use RootAllocator in a low memory setting?

2019-09-10 Thread Andong Zhan
Hi folks,

When I run this simple code with JVM setting: "-Xmx64m"


import org.apache.arrow.memory.RootAllocator;

public class TestArrow
{
  public static void main(String args[]) throws Exception
  {
    new RootAllocator(Integer.MAX_VALUE);
  }
}


and got the following error


Picked up JAVA_TOOL_OPTIONS:
-Djavax.net.ssl.trustStore=/etc/pki/ca-trust/extracted/java/cacerts

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.arrow.memory.BaseAllocator.createEmpty(BaseAllocator.java:263)
    at org.apache.arrow.memory.BaseAllocator.<init>(BaseAllocator.java:89)
    at org.apache.arrow.memory.RootAllocator.<init>(RootAllocator.java:34)
    at org.apache.arrow.memory.RootAllocator.<init>(RootAllocator.java:30)
    at com.snowflake.TestArrow.main(TestArrow.java:13)
Caused by: java.lang.NullPointerException
    at io.netty.buffer.PooledByteBufAllocatorL$InnerAllocator.<init>(PooledByteBufAllocatorL.java:145)
    at io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:49)
    at org.apache.arrow.memory.AllocationManager.<clinit>(AllocationManager.java:61)
    ... 5 more

Process finished with exit code 1


So how can I use RootAllocator in such a low-memory case?

I also post an issue here: https://issues.apache.org/jira/browse/ARROW-6500

Thanks,

Andong


[jira] [Created] (ARROW-6517) [Go] Read and write 64-bit integers as strings in integration test JSON format

2019-09-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6517:
---

 Summary: [Go] Read and write 64-bit integers as strings in 
integration test JSON format
 Key: ARROW-6517
 URL: https://issues.apache.org/jira/browse/ARROW-6517
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Wes McKinney






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6516) [JS] Read and write int64/uint64 as strings in integration test JSON format

2019-09-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6516:
---

 Summary: [JS] Read and write int64/uint64 as strings in 
integration test JSON format
 Key: ARROW-6516
 URL: https://issues.apache.org/jira/browse/ARROW-6516
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Wes McKinney






--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6515) [C++] Remove IsSignedInt trait

2019-09-10 Thread Benjamin Kietzman (Jira)
Benjamin Kietzman created ARROW-6515:


 Summary: [C++] Remove IsSignedInt trait
 Key: ARROW-6515
 URL: https://issues.apache.org/jira/browse/ARROW-6515
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Benjamin Kietzman
Assignee: Benjamin Kietzman


{{IsSignedInt}} takes either an array or a type as a type argument, which is 
surprisingly atypical for traits. Furthermore, whereas {{is_signed_integer}} 
returns false for date and other types which are represented by, but not 
identical to, integers, {{IsSignedInt}} returns true by checking only the 
{{c_type}}, so a {{static_assert(IsSignedInt::value, "")}} can pass for such 
types. Finally, the declaration of {{IsSignedInt}} is far from readable due to 
nested macro usage.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6514) [Developer][C++][CMake] LLVM tools are restricted to the exact version 7.0

2019-09-10 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6514:
--

 Summary: [Developer][C++][CMake] LLVM tools are restricted to the 
exact version 7.0
 Key: ARROW-6514
 URL: https://issues.apache.org/jira/browse/ARROW-6514
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Krisztian Szucs


I have LLVM 7.1 installed locally, and FindClangTools couldn't locate it 
because ARROW_LLVM_VERSION is [hardcoded to 
7.0|https://github.com/apache/arrow/blob/3f2a33f902983c0d395e0480e8a8df40ed5da29c/cpp/CMakeLists.txt#L91-L99]
 and the clang tools lookup is [restricted to the minor 
version|https://github.com/apache/arrow/blob/3f2a33f902983c0d395e0480e8a8df40ed5da29c/cpp/cmake_modules/FindClangTools.cmake#L78].

If it makes sense to restrict the clang tools lookup down to the minor version, 
then we need to pass the located LLVM's version instead of the hardcoded one.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6513) [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt extension

2019-09-10 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6513:
--

 Summary: [CI] The conda environment files arrow/ci/conda_env_*.yml 
should have .txt extension
 Key: ARROW-6513
 URL: https://issues.apache.org/jira/browse/ARROW-6513
 Project: Apache Arrow
  Issue Type: Improvement
  Components: CI
Reporter: Krisztian Szucs


The `arrow/ci/conda_env_*.yml` files are not actually YAML files; we should 
rename them to use a .txt extension.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6512) [Developer] Tracking issue: use strings for 64-bit int64 and uint64 in integration test JSON format

2019-09-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6512:
---

 Summary: [Developer] Tracking issue: use strings for 64-bit int64 
and uint64 in integration test JSON format
 Key: ARROW-6512
 URL: https://issues.apache.org/jira/browse/ARROW-6512
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Wes McKinney


This involves C++, Java, Go, and JS. I will link the sub-issues here so we can 
track progress.
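
For background (a minimal illustration, not part of the issue itself): JSON 
numbers are commonly parsed as 64-bit doubles, which represent integers exactly 
only up to 2^53, so int64/uint64 values near the extremes would be silently 
corrupted; encoding them as strings round-trips exactly.

{code}
import json

big = 2**63 - 1  # INT64_MAX; not exactly representable as a double
assert float(big) != big  # a number-based JSON reader would corrupt this value

# String encoding, as proposed for the integration-test JSON, round-trips exactly.
encoded = json.dumps({"DATA": [str(big), str(-big - 1)]})
decoded = [int(v) for v in json.loads(encoded)["DATA"]]
assert decoded == [big, -big - 1]
{code}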



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6511) [Developer] Remove run_docker_compose.sh

2019-09-10 Thread Benjamin Kietzman (Jira)
Benjamin Kietzman created ARROW-6511:


 Summary: [Developer] Remove run_docker_compose.sh
 Key: ARROW-6511
 URL: https://issues.apache.org/jira/browse/ARROW-6511
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Benjamin Kietzman


dev/run_docker_compose.sh and Makefile.docker perform fundamentally the same 
function: run docker-compose conveniently. Consolidating them is probably 
worthwhile, and since Makefile.docker also builds dependencies, it seems the 
natural choice to keep.

dev/README.md should be updated as well.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6510) [Python][Filesystem] Expose nanosecond resolution mtime

2019-09-10 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6510:
--

 Summary: [Python][Filesystem] Expose nanosecond resolution mtime
 Key: ARROW-6510
 URL: https://issues.apache.org/jira/browse/ARROW-6510
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Krisztian Szucs


FileStats.mtime returns a microsecond-resolution datetime object. At some point 
we should also expose an mtime_ns attribute that gives the exact nanoseconds as 
an integer, like os.stat does: 
https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.as_posix
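
A minimal sketch of the intended distinction, using plain os.stat (the 
FileStats.mtime_ns attribute itself does not exist yet):

{code}
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

st = os.stat(path)

# Microsecond-resolution datetime, comparable to what FileStats.mtime gives today.
mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)

# Exact nanoseconds as an integer, comparable to the proposed FileStats.mtime_ns.
mtime_ns = st.st_mtime_ns

print(mtime, mtime_ns)
os.unlink(path)
{code}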



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6509) [CI] Java test failures on Travis

2019-09-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6509:
-

 Summary: [CI] Java test failures on Travis
 Key: ARROW-6509
 URL: https://issues.apache.org/jira/browse/ARROW-6509
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Java
Reporter: Antoine Pitrou
 Fix For: 0.15.0


This seems to happen more or less frequently on the Python - Java build (with 
jpype enabled).
See warnings and errors starting from 
https://travis-ci.org/apache/arrow/jobs/583069089#L6662




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6508) [C++] Add Tensor and SparseTensor factory function with validations

2019-09-10 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-6508:
---

 Summary: [C++] Add Tensor and SparseTensor factory function with 
validations
 Key: ARROW-6508
 URL: https://issues.apache.org/jira/browse/ARROW-6508
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kenta Murata


Currently, Tensor and SparseTensor only have constructors, not factory 
functions that validate their parameters.
We need such factory functions so that Tensor and SparseTensor can be created 
safely from parameters supplied by an external source.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6507) [C++] Add ExtensionArray::ExtensionValidate for custom validation?

2019-09-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6507:


 Summary: [C++] Add ExtensionArray::ExtensionValidate for custom 
validation?
 Key: ARROW-6507
 URL: https://issues.apache.org/jira/browse/ARROW-6507
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Joris Van den Bossche


From discussing ARROW-6506, [~bkietz] said: an extension type might place more 
constraints on an array than those implicit in its storage type, and users 
will probably expect to be able to plug those into {{Validate}}.

So we could have an {{ExtensionArray::ExtensionValidate}} that the visitor for 
{{ExtensionArray}} can call, similar to how there is also an 
{{ExtensionType::ExtensionEquals}} that the visitor calls when extension types 
are checked for equality.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6506) [C++] Validation of ExtensionType with nested type fails

2019-09-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6506:


 Summary: [C++] Validation of ExtensionType with nested type fails
 Key: ARROW-6506
 URL: https://issues.apache.org/jira/browse/ARROW-6506
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Joris Van den Bossche
 Fix For: 0.15.0


A reproducer using the Python ExtensionType:

{code}
import pyarrow as pa


class MyStructType(pa.ExtensionType):

    def __init__(self):
        storage_type = pa.struct([('a', pa.int64()), ('b', pa.int64())])
        pa.ExtensionType.__init__(self, storage_type, 'my_struct_type')

    def __arrow_ext_serialize__(self):
        return b''

    @classmethod
    def __arrow_ext_deserialize__(self, storage_type, serialized):
        return MyStructType()


ty = MyStructType()
storage_array = pa.array([{'a': 1, 'b': 2}], ty.storage_type)
arr = pa.ExtensionArray.from_storage(ty, storage_array)
{code}

then validating this array fails because validation expects the extension 
array to have no child arrays, while the underlying data carries the storage 
struct's children:

{code}
In [8]: arr.validate()   
---
ArrowInvalid  Traceback (most recent call last)
 in 
> 1 arr.validate()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.Array.validate()

~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Expected 0 child arrays in array of type 
extension, got 2
{code}




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ARROW-6505) [Website] Add new committers

2019-09-10 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-6505:
---

 Summary: [Website] Add new committers
 Key: ARROW-6505
 URL: https://issues.apache.org/jira/browse/ARROW-6505
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Kenta Murata
Assignee: Kenta Murata


I'd like to add the new committers to the committer list.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)