The one you just opened seems like a good first issue https://issues.apache.org/jira/browse/ARROW-8070
If you follow the instructions in
https://github.com/apache/arrow/blob/master/docs/source/developers/python.rst
and can't get things to build, please let us know the details so we can help you.

On Wed, Mar 11, 2020 at 6:06 PM Nugent, Daniel <[email protected]> wrote:
>
> Thanks for closing this out!
>
> Sorry I didn't get around to working on this before you ended up putting it
> in. I had some difficulty getting the dev environment set up and limited time
> to work on it.
>
> Is there a list of good first issues to take a crack at? I've really
> appreciated the project overall and would like to help out in the time I can.
>
> -Dan Nugent
>
> -----Original Message-----
> From: Wes McKinney <[email protected]>
> Sent: Saturday, March 7, 2020 10:55 AM
> To: [email protected]
> Subject: [EXTERNAL] Re: Question about memoryviews and array construction
>
> There's a couple places to start
>
> * Add PyMemoryView type check to internal::IsPyBinary
>   https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/helpers.h#L80.
>   I think this is all that's needed to take care of type inference
> * Make sure PyMemoryView is handled in the PyBytesView helper in
>   https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/common.h#L193
>
> On Sat, Mar 7, 2020 at 9:35 AM Daniel Nugent <[email protected]> wrote:
> >
> > Great!
> >
> > If you could provide a smidgen of guidance about where to start making this
> > change, I would be happy to give it a shot.
> >
> > Thanks,
> >
> > -Dan Nugent
> > On Mar 7, 2020, 09:18 -0500, Wes McKinney <[email protected]>, wrote:
> >
> > hi Dan,
> >
> > Yes, we should support constructing StringArray directly from
> > memoryview as we do with bytes and unicode -- you're the first person
> > to ask about this so far. I opened
> > https://issues.apache.org/jira/browse/ARROW-8026.
> > This should not be a huge amount of work so would be a good first
> > contribution to the project
> >
> > Thanks
> >
> > Wes
> >
> > On Fri, Mar 6, 2020 at 8:29 PM Nugent, Daniel <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > I have a short program which I’m wondering about the sensibility of. Could
> > > anyone let me know if this is reasonable or not:
> > >
> > > import pyarrow as pa, third_party_library
> > >
> > > memory_views = third_party_library.get_strings()
> > >
> > > memory_views
> > >
> > > [<memory at 0x7f1745cc0870>, <memory at 0x7f1745cc0940>, <memory at
> > > 0x7f1745cc0a10>, <memory at 0x7f1745cc0ae0>]
> > >
> > > pa.array(memory_views,pa.string())
> > >
> > > Traceback (most recent call last):
> > >   File "<stdin>", line 1, in <module>
> > >   File "pyarrow/array.pxi", line 269, in pyarrow.lib.array
> > >   File "pyarrow/array.pxi", line 38, in pyarrow.lib._sequence_to_array
> > >   File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
> > > pyarrow.lib.ArrowTypeError: Expected a string or bytes object, got a 'memoryview' object
> > >
> > > pa.array(map(bytes,memory_views),pa.string())
> > >
> > > <pyarrow.lib.StringArray object at 0x7f1745cbdd00>
> > > [
> > >   "this",
> > >   "is",
> > >   "a",
> > >   "sample"
> > > ]
> > >
> > > I have a big list of byte sequences being provided to me as memoryviews
> > > from a third party library. I’d like to create an Arrow StringArray from
> > > them as efficiently as possible. Having to map and consequently copy them
> > > through a bytes constructor seems not great (and the memoryview tobytes
> > > function appears to just call the bytes constructor, afaict).
> > >
> > > To me, it seemed like pa.array should be able to use the memoryview objects
> > > directly in order to construct the StringArray, but it seems like Arrow
> > > wants them copied into fresh byte objects first.
> > > I don’t know if I understand why and was ultimately wondering if it’s a
> > > reasonable thing to desire.
> > >
> > > Thanks in advance,
> > >
> > > -Dan Nugent
> > >
> > > ######################################################################
> > > The information contained in this communication is confidential and
> > > may contain information that is privileged or exempt from disclosure
> > > under applicable law. If you are not a named addressee, please notify
> > > the sender immediately and delete this email from your system.
> > > If you have received this communication, and are not a named
> > > recipient, you are hereby notified that any dissemination,
> > > distribution or copying of this communication is strictly prohibited.
> > > ######################################################################
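
For reference, the copy that the original question worries about can be seen with the standard library alone. The snippet below is only a sketch: the sample buffer and the variable names are made up for illustration and stand in for whatever the third-party library returns, and it does not touch pyarrow. It shows that both bytes(mv) and mv.tobytes() produce independent copies of the data, while the memoryview objects themselves keep sharing the backing buffer.

```python
# Stdlib-only sketch; the sample data is hypothetical and stands in for the
# memoryviews returned by the third-party library in the thread.
backing = bytearray(b"thisisasample")

# Zero-copy views over the shared buffer, one per word.
bounds = [(0, 4), (4, 6), (6, 7), (7, 13)]
memory_views = [memoryview(backing)[start:stop] for start, stop in bounds]

# Both conversions allocate a new bytes object for every element.
via_constructor = [bytes(mv) for mv in memory_views]
via_tobytes = [mv.tobytes() for mv in memory_views]

# Overwrite part of the backing buffer in place (same length, so no resize).
backing[0:4] = b"THIS"

# The views see the change; the earlier copies do not.
views_after = [bytes(mv) for mv in memory_views]
print(via_constructor[0], views_after[0])
```

The pa.array(map(bytes, memory_views), pa.string()) workaround from the thread pays for exactly this per-element copy before Arrow sees the data.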
