Re: Support for numpy matrix

2019-04-01 Thread Mitar
Hi!

I agree. This is in fact all information which is already there. :-)


Mitar

On Sat, Mar 30, 2019 at 8:40 PM Wes McKinney  wrote:
>
> hi Mitar,
>
> Let's discuss further on JIRA? It's best to keep all the information
> about the issue in one place.
>
> Thanks
>
> On Sat, Mar 30, 2019 at 7:42 PM Mitar  wrote:
> >
> > Hi!
> >
> > I added:
> >
> > serialization_context.register_type(
> > np.matrix, 'np.matrix',
> > custom_serializer=_serialize_numpy_array_list,
> > custom_deserializer=_deserialize_numpy_array_list)
> >
> > But it did not help. Probably also because np.matrix is a subclass of
> > np.ndarray anyway. So no change here.
> >
> > An interesting fact is that this worked in older versions of numpy,
> > but stopped in numpy 1.15.2. It works with numpy 1.14.3. So it is them
> > changing something.
> >
> >
> > Mitar
> >
> > On Sat, Mar 30, 2019 at 3:34 PM Philipp Moritz  wrote:
> > >
> > > Hey Mitar,
> > >
> > > It might be as simple as adding a handler here:
> > > https://github.com/apache/arrow/blob/master/python/pyarrow/serialization.py#L300
> > >
> > > Do you want to try that?
> > >
> > > -- Philipp.
> > >
> > > On Sat, Mar 30, 2019 at 3:22 PM Mitar  wrote:
> > >
> > > > Hi!
> > > >
> > > > I do not know where to start looking into this? Not sure if I have
> > > > enough knowledge about arrow to be able to make a PR.
> > > >
> > > >
> > > > Mitar
> > > >
> > > > > On Sat, Mar 30, 2019 at 3:17 PM Wes McKinney  wrote:
> > > > >
> > > > > hi Mitar,
> > > > >
> > > > > I see you reported the issue on October 2 and no one has volunteered
> > > > > to fix it yet. Are you up to submit a PR?
> > > > >
> > > > > Thanks
> > > > > Wes
> > > > >
> > > > > On Sat, Mar 30, 2019 at 5:14 PM Mitar  wrote:
> > > > > >
> > > > > > Hi!
> > > > > >
> > > > > > > It seems numpy's matrix is not supported in recent versions of pyarrow:
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/ARROW-3399
> > > > > >
> > > > > > Any ideas why this would be happening?
> > > > > >
> > > > > >
> > > > > > Mitar
> > > > > >
> > > > > > --
> > > > > > http://mitar.tnode.com/
> > > > > > https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Support for numpy matrix

2019-03-30 Thread Mitar
Hi!

I added:

serialization_context.register_type(
np.matrix, 'np.matrix',
custom_serializer=_serialize_numpy_array_list,
custom_deserializer=_deserialize_numpy_array_list)

But it did not help. Probably also because np.matrix is a subclass of
np.ndarray anyway. So no change here.

An interesting fact is that this worked in older versions of numpy,
but stopped in numpy 1.15.2. It works with numpy 1.14.3. So it is them
changing something.
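
For anyone hitting the same problem, one possible workaround is to hand the serializer a plain np.ndarray and rebuild the matrix afterwards. This is only a sketch of that idea, not what pyarrow does internally. The helper names matrix_to_payload and payload_to_matrix are hypothetical; only the numpy calls are real.

```python
import numpy as np


def matrix_to_payload(m):
    """Hypothetical helper: strip the np.matrix subclass before serializing."""
    # np.asarray returns a base-class ndarray view, even for ndarray subclasses.
    return {"kind": "matrix", "data": np.asarray(m)}


def payload_to_matrix(payload):
    """Hypothetical helper: rebuild the np.matrix after deserializing."""
    return np.matrix(payload["data"])


m = np.matrix(np.array([[1, 2], [3, 4]]))

# np.matrix is an ndarray subclass, which is why a handler registered for
# np.matrix may never be consulted if ndarray is matched first.
print(issubclass(np.matrix, np.ndarray))

payload = matrix_to_payload(m)
restored = payload_to_matrix(payload)
```

The payload dictionary holds only a plain ndarray, so it can be passed to whatever serializer already handles ndarrays.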


Mitar

On Sat, Mar 30, 2019 at 3:34 PM Philipp Moritz  wrote:
>
> Hey Mitar,
>
> It might be as simple as adding a handler here:
> https://github.com/apache/arrow/blob/master/python/pyarrow/serialization.py#L300
>
> Do you want to try that?
>
> -- Philipp.
>
> On Sat, Mar 30, 2019 at 3:22 PM Mitar  wrote:
>
> > Hi!
> >
> > I do not know where to start looking into this? Not sure if I have
> > enough knowledge about arrow to be able to make a PR.
> >
> >
> > Mitar
> >
> > On Sat, Mar 30, 2019 at 3:17 PM Wes McKinney  wrote:
> > >
> > > hi Mitar,
> > >
> > > I see you reported the issue on October 2 and no one has volunteered
> > > to fix it yet. Are you up to submit a PR?
> > >
> > > Thanks
> > > Wes
> > >
> > > On Sat, Mar 30, 2019 at 5:14 PM Mitar  wrote:
> > > >
> > > > Hi!
> > > >
> > > > It seems numpy's matrix is not supported in recent versions of pyarrow:
> > > >
> > > > https://issues.apache.org/jira/browse/ARROW-3399
> > > >
> > > > Any ideas why this would be happening?
> > > >
> > > >
> > > > Mitar
> > > >
> > > > --
> > > > http://mitar.tnode.com/
> > > > https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Support for numpy matrix

2019-03-30 Thread Mitar
Hi!

I do not know where to start looking into this? Not sure if I have
enough knowledge about arrow to be able to make a PR.


Mitar

On Sat, Mar 30, 2019 at 3:17 PM Wes McKinney  wrote:
>
> hi Mitar,
>
> I see you reported the issue on October 2 and no one has volunteered
> to fix it yet. Are you up to submit a PR?
>
> Thanks
> Wes
>
> On Sat, Mar 30, 2019 at 5:14 PM Mitar  wrote:
> >
> > Hi!
> >
> > It seems numpy's matrix is not supported in recent versions of pyarrow:
> >
> > https://issues.apache.org/jira/browse/ARROW-3399
> >
> > Any ideas why this would be happening?
> >
> >
> > Mitar
> >
> > --
> > http://mitar.tnode.com/
> > https://twitter.com/mitar_m



-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Support for numpy matrix

2019-03-30 Thread Mitar
Hi!

It seems numpy's matrix is not supported in recent versions of pyarrow:

https://issues.apache.org/jira/browse/ARROW-3399

Any ideas why this would be happening?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Efficient Pandas serialization for mixed object and numeric DataFrames

2018-10-18 Thread Mitar
Hi!

It seems that if a DataFrame contains both numeric and object columns,
the whole DataFrame is pickled, rather than only the object columns.
Is this right? Are there any plans to improve this?
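
To make the question concrete, here is a rough sketch of the behavior I would expect: numeric columns are stored as raw buffers and only object columns go through pickle. A plain dict of numpy arrays stands in for the DataFrame, and split_serialize / split_deserialize are hypothetical names, not pyarrow functions.

```python
import pickle

import numpy as np


def split_serialize(columns):
    """Serialize a dict of 1-D columns, pickling only object-dtype ones."""
    out = {}
    for name, values in columns.items():
        arr = np.asarray(values)
        if arr.dtype == object:
            # Arbitrary Python objects: fall back to pickle for this column only.
            out[name] = ("pickle", pickle.dumps(arr.tolist()))
        else:
            # Fixed-width numeric data: keep dtype string plus raw bytes.
            out[name] = ("raw", arr.dtype.str, arr.tobytes())
    return out


def split_deserialize(payload):
    """Inverse of split_serialize."""
    columns = {}
    for name, entry in payload.items():
        if entry[0] == "pickle":
            columns[name] = np.array(pickle.loads(entry[1]), dtype=object)
        else:
            _, dtype, raw = entry
            columns[name] = np.frombuffer(raw, dtype=dtype)
    return columns
```

With this split, the cost of pickling is paid only for the columns that actually need it.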


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


[jira] [Created] (ARROW-3399) Cannot serialize numpy matrix object

2018-10-02 Thread Mitar (JIRA)
Mitar created ARROW-3399:


 Summary: Cannot serialize numpy matrix object
 Key: ARROW-3399
 URL: https://issues.apache.org/jira/browse/ARROW-3399
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Mitar


This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on 
Linux.
{code:java}
from pyarrow import plasma
import numpy
import time
import subprocess
import os
import signal

m = numpy.matrix(numpy.array([[1, 2], [3, 4]]))

# Start the plasma store in its own process group so it can be killed later.
process = subprocess.Popen(
    ['plasma_store', '-m', '100', '-s', '/tmp/plasma', '-d', '/dev/shm'],
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    encoding='utf8', preexec_fn=os.setpgrp)
time.sleep(5)
client = plasma.connect('/tmp/plasma', '', 0)

try:
    client.put(m)
finally:
    client.disconnect()
    os.killpg(os.getpgid(process.pid), signal.SIGTERM)
{code}
Error:
{noformat}
  File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum recursion depth. It may contain itself recursively.
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [RESULT] [VOTE] Release Apache Arrow 0.9.0 (RC2)

2018-03-22 Thread Mitar
Hi!

Oh, no worries. Thanks for working on this. I just assumed that
because the website had been updated, the release was ready, and that
the missing package was a bug. I understand it takes time to do a
release properly.


Mitar

On Thu, Mar 22, 2018 at 11:35 AM, Phillip Cloud <cpcl...@gmail.com> wrote:
> We are working on getting those wheels up as fast as we can. They should be
> available very soon. In the meantime, you can install pyarrow 0.9.0 with
> conda if you'd like.
>
> On Thu, Mar 22, 2018 at 2:19 PM Mitar <mmi...@gmail.com> wrote:
>
>> Hi!
>>
>> The website seems to say that there is already a pyarrow 0.9.0
>> package, but it does not seem to be there yet:
>>
>> https://arrow.apache.org/install/#python-wheels-on-pypi-unofficial
>> https://pypi.python.org/pypi/pyarrow
>>
>> BTW, why are Python packages unofficial?
>>
>>
>> Mitar
>>
>> On Thu, Mar 22, 2018 at 8:33 AM, Phillip Cloud <cpcl...@gmail.com> wrote:
>> > I'm working on updating the API docs.
>> >
>> > On Wed, Mar 21, 2018 at 10:24 PM Wes McKinney <wesmck...@gmail.com>
>> wrote:
>> >
>> >> I have put up blog posts to go out tomorrow about the Go code donation
>> >> and the release
>> >>
>> >> https://github.com/apache/arrow/pull/1776
>> >> https://github.com/apache/arrow/pull/1777
>> >>
>> >> Would someone like to take a crack at updating the generated API
>> >> documentation?
>> >>
>> >> On Wed, Mar 21, 2018 at 10:42 AM, Wes McKinney <wesmck...@gmail.com>
>> >> wrote:
>> >> > If any items are missing from
>> >> >
>> >>
>> https://github.com/apache/arrow/blob/master/dev/release/RELEASE_MANAGEMENT.md
>> >> ,
>> >> > let's definitely add them. I'd like the RM process to be reasonably
>> >> > fool-proof
>> >> >
>> >> > On Wed, Mar 21, 2018 at 10:07 AM, Phillip Cloud <cpcl...@gmail.com>
>> >> wrote:
>> >> >> Charles.Cloud
>> >> >>
>> >> >> On Wed, Mar 21, 2018 at 8:53 AM Uwe L. Korn <uw...@xhochy.com>
>> wrote:
>> >> >>
>> >> >>> At least I have not. Philip, what is your login on pypi.python.org
>> so
>> >> I
>> >> >>> can add you as a maintainer there?
>> >> >>>
>> >> >>> On Wed, Mar 21, 2018, at 1:49 PM, Phillip Cloud wrote:
>> >> >>> > Has anyone started on pip wheels yet? If not, I will start
>> cranking
>> >> on it
>> >> >>> > today.
>> >> >>> >
>> >> >>> > On Tue, Mar 20, 2018, 22:47 Wes McKinney <wesmck...@gmail.com>
>> >> wrote:
>> >> >>> >
>> >> >>> > > I haven't been able to draft a release blog post yet. We also
>> have
>> >> >>> > > more packaging work to do. I suggest we announce Thursday
>> morning
>> >> and
>> >> >>> > > try to get the packaging completed -- we have conda-forge done
>> as
>> >> of
>> >> >>> > > right now, but pip and Java need to get uploaded.
>> >> >>> > >
>> >> >>> > > On Tue, Mar 20, 2018 at 2:58 AM, Siddharth Teotia <
>> >> >>> siddha...@dremio.com>
>> >> >>> > > wrote:
>> >> >>> > > > FYI: Created a PR for website update.
>> >> >>> > > >
>> >> >>> > > > On Mon, Mar 19, 2018 at 3:38 PM, Phillip Cloud <
>> >> cpcl...@gmail.com>
>> >> >>> > > wrote:
>> >> >>> > > >
>> >> >>> > > >> Great! I'll volunteer to handle the conda-forge feedstock
>> >> updates.
>> >> >>> > > >>
>> >> >>> > > >> On Mon, Mar 19, 2018 at 6:09 PM Wes McKinney <
>> >> wesmck...@gmail.com>
>> >> >>> > > wrote:
>> >> >>> > > >>
>> >> >>> > > >> > With 4 binding +1 votes, 2 non-binding +1, and no other
>> >> votes, the
>> >> >>> > > >> > vote passes. Thanks all!
>> >> >>> > > >> >
>> >> >>> > > >> > Let's get busy updating the C++,

Re: [RESULT] [VOTE] Release Apache Arrow 0.9.0 (RC2)

2018-03-22 Thread Mitar
Hi!

The website seems to say that there is already a pyarrow 0.9.0
package, but it does not seem to be there yet:

https://arrow.apache.org/install/#python-wheels-on-pypi-unofficial
https://pypi.python.org/pypi/pyarrow

BTW, why are Python packages unofficial?


Mitar

On Thu, Mar 22, 2018 at 8:33 AM, Phillip Cloud <cpcl...@gmail.com> wrote:
> I'm working on updating the API docs.
>
> On Wed, Mar 21, 2018 at 10:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> I have put up blog posts to go out tomorrow about the Go code donation
>> and the release
>>
>> https://github.com/apache/arrow/pull/1776
>> https://github.com/apache/arrow/pull/1777
>>
>> Would someone like to take a crack at updating the generated API
>> documentation?
>>
>> On Wed, Mar 21, 2018 at 10:42 AM, Wes McKinney <wesmck...@gmail.com>
>> wrote:
>> > If any items are missing from
>> >
>> https://github.com/apache/arrow/blob/master/dev/release/RELEASE_MANAGEMENT.md
>> ,
>> > let's definitely add them. I'd like the RM process to be reasonably
>> > fool-proof
>> >
>> > On Wed, Mar 21, 2018 at 10:07 AM, Phillip Cloud <cpcl...@gmail.com>
>> wrote:
>> >> Charles.Cloud
>> >>
>> >> On Wed, Mar 21, 2018 at 8:53 AM Uwe L. Korn <uw...@xhochy.com> wrote:
>> >>
>> >>> At least I have not. Philip, what is your login on pypi.python.org so
>> I
>> >>> can add you as a maintainer there?
>> >>>
>> >>> On Wed, Mar 21, 2018, at 1:49 PM, Phillip Cloud wrote:
>> >>> > Has anyone started on pip wheels yet? If not, I will start cranking
>> on it
>> >>> > today.
>> >>> >
>> >>> > On Tue, Mar 20, 2018, 22:47 Wes McKinney <wesmck...@gmail.com>
>> wrote:
>> >>> >
>> >>> > > I haven't been able to draft a release blog post yet. We also have
>> >>> > > more packaging work to do. I suggest we announce Thursday morning
>> and
>> >>> > > try to get the packaging completed -- we have conda-forge done as
>> of
>> >>> > > right now, but pip and Java need to get uploaded.
>> >>> > >
>> >>> > > On Tue, Mar 20, 2018 at 2:58 AM, Siddharth Teotia <
>> >>> siddha...@dremio.com>
>> >>> > > wrote:
>> >>> > > > FYI: Created a PR for website update.
>> >>> > > >
>> >>> > > > On Mon, Mar 19, 2018 at 3:38 PM, Phillip Cloud <
>> cpcl...@gmail.com>
>> >>> > > wrote:
>> >>> > > >
>> >>> > > >> Great! I'll volunteer to handle the conda-forge feedstock
>> updates.
>> >>> > > >>
>> >>> > > >> On Mon, Mar 19, 2018 at 6:09 PM Wes McKinney <
>> wesmck...@gmail.com>
>> >>> > > wrote:
>> >>> > > >>
>> >>> > > >> > With 4 binding +1 votes, 2 non-binding +1, and no other
>> votes, the
>> >>> > > >> > vote passes. Thanks all!
>> >>> > > >> >
>> >>> > > >> > Let's get busy updating the C++, Python, Java packages and
>> >>> updating
>> >>> > > >> > the website. I will be able to draft a 0.9.0 blog post for the
>> >>> website
>> >>> > > >> > later today or tomorrow morning. I suggest we announce the
>> >>> release on
>> >>> > > >> > Wednesday morning after we have a chance to move along the
>> binary
>> >>> > > >> > packaging process.
>> >>> > > >> >
>> >>> > > >> > Thanks
>> >>> > > >> > Wes
>> >>> > > >> >
>> >>> > > >> > On Mon, Mar 19, 2018 at 2:47 PM, Phillip Cloud <
>> cpcl...@gmail.com
>> >>> >
>> >>> > > >> wrote:
>> >>> > > >> > > Just verified on windows, all systems are go for launch.
>> >>> > > >> > >
>> >>> > > >> > > On Mon, Mar 19, 2018 at 12:51 PM Li Jin <
>> ice.xell...@gmail.com>
>> >>> > > wrote:
>> >>> > > >> > >
>> >>> > > >> > >> +1
>

[jira] [Created] (ARROW-2273) Cannot deserialize pandas SparseDataFrame

2018-03-06 Thread Mitar (JIRA)
Mitar created ARROW-2273:


 Summary: Cannot deserialize pandas SparseDataFrame
 Key: ARROW-2273
 URL: https://issues.apache.org/jira/browse/ARROW-2273
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Mitar


>>> import pyarrow
>>> import pandas
>>> a = pandas.SparseDataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> pyarrow.deserialize(pyarrow.serialize(a).to_buffer())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "serialization.pxi", line 441, in pyarrow.lib.deserialize
  File "serialization.pxi", line 404, in pyarrow.lib.deserialize_from
  File "serialization.pxi", line 257, in pyarrow.lib.SerializedPyObject.deserialize
  File "serialization.pxi", line 174, in pyarrow.lib.SerializationContext._deserialize_callback
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/serialization.py", line 77, in _deserialize_pandas_dataframe
    return pdcompat.serialized_dict_to_dataframe(data)
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in serialized_dict_to_dataframe
    for block in data['blocks']]
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 450, in <listcomp>
    for block in data['blocks']]
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pyarrow/pandas_compat.py", line 478, in _reconstruct_block
    block = _int.make_block(block_arr, placement=placement)
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 2957, in make_block
    return klass(values, ndim=ndim, fastpath=fastpath, placement=placement)
  File ".../.virtualenv/arrow/lib/python3.6/site-packages/pandas/core/internals.py", line 120, in __init__
    len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 1






[jira] [Created] (ARROW-2269) Cannot build bdist_wheel for Python

2018-03-06 Thread Mitar (JIRA)
Mitar created ARROW-2269:


 Summary: Cannot build bdist_wheel for Python
 Key: ARROW-2269
 URL: https://issues.apache.org/jira/browse/ARROW-2269
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging
Affects Versions: 0.9.0
Reporter: Mitar


I am trying current master.

I ran:

{{python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
--with-plasma --bundle-arrow-cpp bdist_wheel }}

Output:

{{running build_ext creating build creating build/temp.linux-x86_64-3.6 -- 
Runnning cmake for pyarrow cmake 
-DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python 
-DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on 
-DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON 
-DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python -- The C compiler 
identification is GNU 7.2.0 -- The CXX compiler identification is GNU 7.2.0 -- 
Check for working C compiler: /usr/bin/cc -- Check for working C compiler: 
/usr/bin/cc -- works -- Detecting C compiler ABI info -- Detecting C compiler 
ABI info - done -- Detecting C compile features -- Detecting C compile features 
- done -- Check for working CXX compiler: /usr/bin/c++ -- Check for working CXX 
compiler: /usr/bin/c++ -- works -- Detecting CXX compiler ABI info -- Detecting 
CXX compiler ABI info - done -- Detecting CXX compile features -- Detecting CXX 
compile features - done INFOCompiler command: /usr/bin/c++ INFOCompiler 
version: Using built-in specs. COLLECT_GCC=/usr/bin/c++ 
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper 
OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: 
x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs 
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr 
--with-gcc-major-version-only --program-suffix=-7 
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie 
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto 
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
--enable-offload-targets=nvptx-none --without-cuda-driver 
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
--target=x86_64-linux-gnu Thread model: posix gcc version 7.2.0 (Ubuntu 
7.2.0-8ubuntu3.2) INFOCompiler id: GNU Selected compiler gcc 7.2.0 -- 
Performing Test CXX_SUPPORTS_SSE3 -- Performing Test CXX_SUPPORTS_SSE3 - 
Success -- Performing Test CXX_SUPPORTS_ALTIVEC -- Performing Test 
CXX_SUPPORTS_ALTIVEC - Failed Configured for RELEASE build (set with cmake 
-DCMAKE_BUILD_TYPE=\{release,debug,...}) -- Build Type: RELEASE -- Build output 
directory: .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/ -- 
Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version "3.6.3") 
-- Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PythonLibs: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- 
Found NumPy: version "1.14.1" 
.../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include -- 
Searching for Python libs in 
.../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
 -- Looking for python3.6m -- Found Python lib 
/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so -- Found 
PkgConfig: /usr/bin/pkg-config (found version "0.29.1") -- Checking for module 
'arrow' -- Found arrow, version 0.9.0-SNAPSHOT -- Arrow ABI version: 0.0.0 -- 
Arrow SO version: 0 -- Found the Arrow core library: 
.../Temp/arrow/dist/lib/libarrow.so -- Found the Arrow Python library: 
.../Temp/arrow/dist/lib/libarrow_python.so -- Boost version: 1.63.0 -- Found 
the following Boost libraries: -- system -- filesystem -- regex Added shared 
library dependency arrow: .../Temp/arrow/dist/lib/libarrow.so Added shared 
library dependency arrow_python: .../Temp/arrow/dist/lib/libarrow_python.so -- 
Found the Parquet library: .../Temp/arrow/dist/lib/libparquet.so Added shared 
library dependency parquet: .../Temp/arrow/dist/lib/libparquet.so -- Checking 
for module 'plasma' -- Found plasma, version -- Plasma ABI version: 0.0.0 -- 
Plasma SO version: 0 -- Found the Plasma core library: 
.../Temp/arrow/

[jira] [Created] (ARROW-2264) Efficiently serialize numpy arrays with dtype of unicode fixed length string

2018-03-05 Thread Mitar (JIRA)
Mitar created ARROW-2264:


 Summary: Efficiently serialize numpy arrays with dtype of unicode fixed length string
 Key: ARROW-2264
 URL: https://issues.apache.org/jira/browse/ARROW-2264
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Mitar


Looking at the numpy array serialization code, it seems that an array with a 
dtype like "<U3" goes through the custom ndarray serializer and not through 
the efficient one.

Example:

>>> np.array(['aaa', 'bbb'])
array(['aaa', 'bbb'], dtype='<U3')

This should be able to work, no? It has fixed offsets and memory layout.
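
As a sketch of why this looks feasible (plain numpy only, not pyarrow's serialization code): every element of a fixed-length unicode array occupies the same number of bytes, so the array round-trips through a raw buffer given only its dtype and shape.

```python
import numpy as np

a = np.array(['aaa', 'bbb'])

# Each element has a fixed size: 4 bytes per character for the 'U' kind.
print(a.dtype.kind, a.dtype.itemsize)

# The whole array is one contiguous buffer with no per-element Python objects.
raw = a.tobytes()

# Rebuild it from the raw bytes plus dtype and shape, without pickling.
restored = np.frombuffer(raw, dtype=a.dtype).reshape(a.shape)
```

Note that np.frombuffer returns a read-only view over the bytes, which is the zero-copy behavior one would want from a shared-memory store.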





How to properly serialize subclasses of supported classes

2018-03-04 Thread Mitar
Hi!

I have a subclass of numpy and another of pandas which add a metadata
attribute to them. Moreover, I have a subclass of typing.List as a
Python generic with this metadata attribute as well.

Now, it seems that if I serialize these to the plasma store and back, I
get a standard numpy, pandas, or list object back, respectively.

My question is: how can I make it so that proper subclasses are
returned, including the custom metadata attribute?

I tried to use pyarrow_lib._default_serialization_context.register_type
but it does not seem to work. Moreover, I still worry that even if I
create a serializer for a custom class, anyone who makes a subclass
and tries to store it in the plasma store will get back the custom class
and not the subclass.

This is how I am testing:

https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/tests/test_plasma.py#L50

And here is the code for custom numpy class and attempt at registering
custom serialization:

https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/d3m_metadata/container/numpy.py#L135

It looks like custom serialization is not called.
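
To illustrate what I am after, here is a minimal sketch of a serializer/deserializer pair that keeps both the concrete subclass and the metadata. MetaArray, serialize_meta_array and deserialize_meta_array are hypothetical names for illustration (not the code in the linked repository), and the sketch does not call pyarrow at all.

```python
import numpy as np


class MetaArray(np.ndarray):
    """Hypothetical ndarray subclass carrying a `metadata` attribute."""

    def __new__(cls, input_array, metadata=None):
        obj = np.asarray(input_array).view(cls)
        obj.metadata = metadata if metadata is not None else {}
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices too; inherit metadata when available.
        if obj is None:
            return
        self.metadata = getattr(obj, 'metadata', {})


def serialize_meta_array(obj):
    # Record the concrete class so that subclasses of MetaArray survive too.
    # Passing the class object only works in-process; a real store would need
    # something like a registered name instead (an assumption of this sketch).
    return {
        'cls': type(obj),
        'data': np.asarray(obj),  # plain ndarray for the existing fast path
        'metadata': obj.metadata,
    }


def deserialize_meta_array(payload):
    cls = payload['cls']
    return cls(payload['data'], metadata=payload['metadata'])
```

Because the serializer records type(obj) instead of hard-coding MetaArray, a subclass defined later comes back as that subclass, which is the behavior I could not get from register_type.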


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


[jira] [Created] (ARROW-2250) plasma_store process should cleanup on TERM signal as well

2018-03-03 Thread Mitar (JIRA)
Mitar created ARROW-2250:


 Summary: plasma_store process should cleanup on TERM signal as well
 Key: ARROW-2250
 URL: https://issues.apache.org/jira/browse/ARROW-2250
 Project: Apache Arrow
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Mitar


Currently it cleans up on the INT signal. But if it gets the TERM signal, it 
kills the parent process (the Python one) but not the binary process. I think 
both TERM and INT signals should be handled the same way.
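
As an illustration of the requested behavior only (plasma_store itself is a separate binary, so this is not its code), here is a Python sketch that installs one handler for both signals. install_cleanup_handlers and the cleanup callback are hypothetical names.

```python
import signal


def install_cleanup_handlers(cleanup, signals=(signal.SIGINT, signal.SIGTERM)):
    """Run `cleanup` and exit on any of `signals`.

    Must be called from the main thread, as required by signal.signal.
    Returns the previously installed handlers so they can be restored.
    """
    def handler(signum, frame):
        cleanup(signum)
        # Conventional exit status for "terminated by signal N".
        raise SystemExit(128 + signum)

    previous = {}
    for sig in signals:
        previous[sig] = signal.signal(sig, handler)
    return previous
```

With the same handler registered for both signals, a TERM sent by a process manager triggers the same cleanup path as Ctrl-C.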





[jira] [Created] (ARROW-1664) Support for xarray.DataArray and xarray.Dataset

2017-10-10 Thread Mitar (JIRA)
Mitar created ARROW-1664:


 Summary: Support for xarray.DataArray and xarray.Dataset
 Key: ARROW-1664
 URL: https://issues.apache.org/jira/browse/ARROW-1664
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Mitar


DataArray and Dataset are efficient in-memory representations for 
multi-dimensional data. It would be great if one could share them between 
processes using Arrow.

http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html#xarray.DataArray
http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html#xarray.Dataset




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)