Re: How to properly serialize subclasses of supported classes

2018-03-05 Thread Robert Nishihara
We just chatted offline. This should be fixed by
https://github.com/apache/arrow/pull/1704.
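
For readers of the archive, the difference between the two C API checks named
in the quoted message can be sketched in pure Python. This is only an analogy:
isinstance() behaves like PyList_Check (it accepts subclasses), while an exact
type comparison behaves like PyList_CheckExact. MetadataList and the two
helper functions are hypothetical names used for illustration and are not part
of Arrow.

```python
class MetadataList(list):
    """Hypothetical list subclass carrying an extra metadata attribute."""

    def __init__(self, iterable=(), metadata=None):
        super().__init__(iterable)
        self.metadata = dict(metadata or {})


def matches_like_check(obj):
    # Analogue of PyList_Check: true for list and for any subclass of list.
    return isinstance(obj, list)


def matches_like_check_exact(obj):
    # Analogue of PyList_CheckExact: true only for list itself.
    return type(obj) is list


value = MetadataList([1, 2, 3], metadata={"source": "example"})

# With the non-exact check the subclass is treated as a plain list, so a
# dispatcher would take the built-in list path and never reach a custom
# serializer. With the exact check the subclass is not matched as a list.
subclass_taken_as_list = matches_like_check(value)
subclass_taken_as_exact_list = matches_like_check_exact(value)
```

Under this reading, a serializer that tests with the non-exact check handles
the subclass as an ordinary list before any custom handler gets a chance to
run.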

On Mon, Mar 5, 2018 at 3:42 AM Mitar wrote:

> Hi!
>
> You mean, this explains why a subclass of list is not being matched? Maybe.
>
> But I do not get why my custom serialization for the ndarray subclass is
> never called.
>
> Or how hard would it be to automatically serialize/deserialize into
> subclasses, so that I would not need a custom serialization for ndarray
> and the existing ndarray serialization would work, casting the result
> into the proper subclass?
>
>
> Mitar
>
> On Sun, Mar 4, 2018 at 2:39 PM, Robert Nishihara wrote:
> > The issue is probably this line
> >
> > https://github.com/apache/arrow/blob/8b1c8118b017a941f0102709d72df7e5a9783aa4/cpp/src/arrow/python/python_to_arrow.cc#L504
> >
> > which uses PyList_Check instead of PyList_CheckExact. PyList_Check also
> > matches subclasses of list, so they are handled as plain lists. Changing
> > it to the exact form will cause the custom serializer to be used for
> > subclasses of list.
> >
> > On Sun, Mar 4, 2018 at 1:08 AM Mitar wrote:
> >>
> >> Hi!
> >>
> >> I have one subclass of a numpy type (ndarray) and another of a pandas
> >> type, each of which adds a metadata attribute. Moreover, I have a
> >> subclass of typing.List as a Python generic with this metadata
> >> attribute as well.
> >>
> >> Now, it seems that if I serialize these to the plasma store and back, I
> >> get a standard numpy, pandas, or list object back, respectively.
> >>
> >> My question is: how can I make it so that the proper subclasses are
> >> returned, including the custom metadata attribute?
> >>
> >> I tried to use pyarrow_lib._default_serialization_context.register_type
> >> but it does not seem to work. Moreover, I still worry that even if I
> >> create a serialization for a custom class, if anyone makes a subclass
> >> and tries to store it in the plasma store, they will get back the custom
> >> class and not the subclass.
> >>
> >> This is how I am testing:
> >>
> >> https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/tests/test_plasma.py#L50
> >>
> >> And here is the code for the custom numpy class and the attempt at
> >> registering a custom serialization:
> >>
> >> https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/d3m_metadata/container/numpy.py#L135
> >>
> >> It looks like custom serialization is not called.
> >>
> >>
> >> Mitar
> >>
> >> --
> >> http://mitar.tnode.com/
> >> https://twitter.com/mitar_m
>
>
>
> --
> http://mitar.tnode.com/
> https://twitter.com/mitar_m
>


How to properly serialize subclasses of supported classes

2018-03-04 Thread Mitar
Hi!

I have one subclass of a numpy type (ndarray) and another of a pandas
type, each of which adds a metadata attribute. Moreover, I have a
subclass of typing.List as a Python generic with this metadata
attribute as well.

Now, it seems that if I serialize these to the plasma store and back, I
get a standard numpy, pandas, or list object back, respectively.

My question is: how can I make it so that the proper subclasses are
returned, including the custom metadata attribute?

I tried to use pyarrow_lib._default_serialization_context.register_type
but it does not seem to work. Moreover, I still worry that even if I
create a serialization for a custom class, if anyone makes a subclass
and tries to store it in the plasma store, they will get back the custom
class and not the subclass.

This is how I am testing:

https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/tests/test_plasma.py#L50

And here is the code for the custom numpy class and the attempt at
registering a custom serialization:

https://gitlab.com/datadrivendiscovery/metadata/blob/plasma/d3m_metadata/container/numpy.py#L135

It looks like custom serialization is not called.
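
To make the desired behaviour concrete, here is a minimal pure-Python sketch
of a type registry that dispatches on the exact type, so that a subclass
round-trips as itself together with its metadata. It is not the pyarrow API:
register_type, serialize, deserialize, the payload layout, and MetadataList
are all hypothetical names chosen for illustration, and a plain dict stands in
for whatever the store would hold.

```python
class MetadataList(list):
    """Hypothetical list subclass that carries a metadata attribute."""

    def __init__(self, iterable=(), metadata=None):
        super().__init__(iterable)
        self.metadata = dict(metadata or {})


# Hypothetical registry: type_id -> (class, to_state, from_state).
_registry = {}


def register_type(cls, type_id, to_state, from_state):
    """Register a pair of functions that convert cls to and from plain state."""
    _registry[type_id] = (cls, to_state, from_state)


def serialize(obj):
    # Match on the exact type first, so a subclass never falls through to
    # the handler for its base class.
    for type_id, (cls, to_state, _from_state) in _registry.items():
        if type(obj) is cls:
            return {"type_id": type_id, "state": to_state(obj)}
    # Built-in path: only for list itself, not for its subclasses.
    if type(obj) is list:
        return {"type_id": None, "state": list(obj)}
    raise TypeError("no serializer registered for " + type(obj).__name__)


def deserialize(payload):
    if payload["type_id"] is None:
        return payload["state"]
    _cls, _to_state, from_state = _registry[payload["type_id"]]
    return from_state(payload["state"])


register_type(
    MetadataList,
    "MetadataList",
    to_state=lambda obj: {"items": list(obj), "metadata": obj.metadata},
    from_state=lambda state: MetadataList(state["items"], state["metadata"]),
)

original = MetadataList([1, 2, 3], metadata={"source": "example"})
restored = deserialize(serialize(original))
```

In this sketch an unregistered subclass of MetadataList raises TypeError
instead of silently coming back as its parent class, which is one possible
answer to the worry above; whether a real serializer should fail or fall back
to the parent class is a design choice.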


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m