Uniform types in Arrow table columns (pyarrow.array) and the case of python dictionaries

2018-01-21 Thread simba nyatsanga
Hi Everyone, I've got two questions that I'd like help with: 1. Pandas and numpy arrays can handle multiple types in a sequence, e.g. a float and a string, by using dtype=object. From what I gather, Arrow arrays enforce a uniform type depending on the type of the first encountered element in a

Re: Exploring the possibility of creating a persistent cache by arrow/plasma

2018-01-21 Thread Philipp Moritz
Note that for the Python bindings, the reference counting is done automatically, see https://github.com/apache/arrow/blob/master/python/pyarrow/plasma.pyx#L182 which is e.g. used as the base object for numpy arrays whose memory is backed by the object store. On Sun, Jan 21, 2018 at 4:21 PM,

Re: Exploring the possibility of creating a persistent cache by arrow/plasma

2018-01-21 Thread Robert Nishihara
Evicted objects are gone for good, although it would certainly be possible to add the ability to persist them to disk. The Plasma store does reference counting to figure out which clients are using which objects. Clients can "release" objects through the client API to decrement the reference
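
The behaviour described in this reply (per-object reference counts, a client-side "release", and eviction limited to objects no client is using) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Plasma's implementation or API: the class `ToyObjectStore`, its methods, and the oldest-first eviction order are all hypothetical names and choices made for illustration.

```python
from collections import OrderedDict


class ToyObjectStore:
    """Toy in-memory store (hypothetical): reference counts plus eviction."""

    def __init__(self, capacity):
        self.capacity = capacity        # maximum number of objects held
        self.objects = OrderedDict()    # object_id -> payload, oldest first
        self.refcounts = {}             # object_id -> number of clients using it

    def put(self, object_id, payload):
        # Make room first by evicting objects that no client is using.
        while len(self.objects) >= self.capacity:
            if not self._evict_one():
                raise MemoryError("store full and every object is in use")
        self.objects[object_id] = payload
        self.refcounts[object_id] = 0

    def get(self, object_id):
        # A get marks the object as in use by one more client.
        payload = self.objects[object_id]   # raises KeyError if evicted
        self.refcounts[object_id] += 1
        return payload

    def release(self, object_id):
        # A release decrements the count; at zero the object is evictable.
        if self.refcounts[object_id] > 0:
            self.refcounts[object_id] -= 1

    def _evict_one(self):
        # Assumption: evict the oldest object with a zero reference count.
        for object_id in list(self.objects):
            if self.refcounts[object_id] == 0:
                del self.objects[object_id]   # gone for good, nothing persisted
                del self.refcounts[object_id]
                return True
        return False


store = ToyObjectStore(capacity=2)
store.put("a", b"first")
store.put("b", b"second")
store.get("a")                  # "a" is now in use by one client
store.put("c", b"third")        # evicts "b", the oldest unreferenced object
print(sorted(store.objects))    # ['a', 'c']
```

In this sketch an evicted object is simply deleted, matching the "gone for good" answer; persisting evicted objects to disk would be an extra step inside `_evict_one`.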

Re: Exploring the possibility of creating a persistent cache by arrow/plasma

2018-01-21 Thread Mike Sam
Great, thank you very much. What happens to the evicted objects? Are they gone for good or are they persisted locally? Also, what defines "objects that are not currently in use by any client"? Reference counting? On Sat, Jan 20, 2018 at 1:53 PM, Robert Nishihara

[jira] [Created] (ARROW-2016) [Python] Fix up ASV benchmarking setup and document procedure for use

2018-01-21 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-2016:
Summary: [Python] Fix up ASV benchmarking setup and document procedure for use
Key: ARROW-2016
URL: https://issues.apache.org/jira/browse/ARROW-2016
Project: Apache