Re: Question about memoryviews and array construction

Wes McKinney Sat, 07 Mar 2020 07:56:07 -0800

There are a couple of places to start:

* Add PyMemoryView type check to internal::IsPyBinary
https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.h#L80.
I think this is all that's needed to take care of type inference
* Make sure PyMemoryView is handled in the PyBytesView helper in
https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L193
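
For illustration only, here is a rough pure-Python analogue of those two
changes. It is not the Arrow C++ code: is_binary_like and as_byte_view are
hypothetical names invented for this sketch, and it uses only the standard
library. It also shows why going through bytes() costs a copy while a
memoryview shares the underlying buffer.

```python
# Hypothetical stand-ins for the two C++ changes; neither function is
# part of pyarrow.

def is_binary_like(obj):
    """Type check that accepts memoryview alongside bytes and bytearray."""
    return isinstance(obj, (bytes, bytearray, memoryview))


def as_byte_view(obj):
    """Return a view over the object's bytes without copying them."""
    if not is_binary_like(obj):
        raise TypeError(
            f"Expected a bytes-like object, got a {type(obj).__name__!r} object"
        )
    # memoryview() over bytes, bytearray, or another memoryview shares the
    # underlying buffer instead of duplicating it.
    return memoryview(obj)


source = bytearray(b"this")
view = as_byte_view(memoryview(source))  # shares memory with `source`
copied = bytes(view)                     # what map(bytes, ...) does: a copy

source[0] = ord("T")                     # mutate the underlying buffer

print(view.tobytes())  # the view sees the change
print(copied)          # the earlier copy does not
```

The view reflects the mutation because it never owned a separate copy of the
data, which is the property a memoryview-aware conversion path could exploit.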


On Sat, Mar 7, 2020 at 9:35 AM Daniel Nugent <[email protected]> wrote:
>
> Great!
>
> If you could provide a smidgen of guidance about where to start making this 
> change, I would be happy to give it a shot.
>
> Thanks,
>
> -Dan Nugent
> On Mar 7, 2020, 09:18 -0500, Wes McKinney <[email protected]>, wrote:
>
> hi Dan,
>
> Yes, we should support constructing StringArray directly from
> memoryview as we do with bytes and unicode -- you're the first person
> to ask about this so far. I opened
> https://issues.apache.org/jira/browse/ARROW-8026. This should not be a
> huge amount of work, so it would be a good first contribution to the
> project.
>
> Thanks
>
> Wes
>
> On Fri, Mar 6, 2020 at 8:29 PM Nugent, Daniel <[email protected]> wrote:
>
> Hi,
>
> I have a short program and I’m wondering whether what it does is sensible.
> Could anyone let me know if this is reasonable or not:
>
> import pyarrow as pa, third_party_library
> memory_views = third_party_library.get_strings()
> memory_views
> [<memory at 0x7f1745cc0870>, <memory at 0x7f1745cc0940>,
>  <memory at 0x7f1745cc0a10>, <memory at 0x7f1745cc0ae0>]
>
> pa.array(memory_views, pa.string())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/array.pxi", line 269, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 38, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
> pyarrow.lib.ArrowTypeError: Expected a string or bytes object, got a 'memoryview' object
>
> pa.array(map(bytes, memory_views), pa.string())
> <pyarrow.lib.StringArray object at 0x7f1745cbdd00>
> [
>   "this",
>   "is",
>   "a",
>   "sample"
> ]
>
> I have a big list of byte sequences being provided to me as memoryviews from 
> a third party library. I’d like to create an Arrow StringArray from them as 
> efficiently as possible. Having to map and consequently copy them through a 
> bytes constructor seems not great (and the memoryview tobytes function 
> appears to just call the bytes constructor, afaict).
>
> To me, it seemed like pa.array should be able to use the memoryview objects
> directly in order to construct the StringArray, but it seems like Arrow wants
> them copied into fresh bytes objects first. I don’t know if I understand why
> and was ultimately wondering if it’s a reasonable thing to desire.
>
> Thanks in advance,
>
> -Dan Nugent
