Indexing, encoding, transformations and processing with PyArrow - GitHub 6284

Athanassios I. Hatzis Mon, 27 Jan 2020 06:56:19 -0800

Hi, I have recently started experimenting with PyArrow for the needs of my
TRIADB project. Kudos to Wes and his team for leading one of the best
open-source IT projects in data engineering. It was definitely a wise decision
to continue the success story of Pandas on the right track!


At this stage I am trying to make a new release of TRIADB that will handle
metadata management and fast in-memory ingestion of data for transformations
and basic query operations.

Secondary indexes, dictionary encoding and adjacency lists are a core part of
the TRIADB project, which is why I posted the issue about the
Array.dictionary_encode method (https://github.com/apache/arrow/issues/6284).
Aren't my example and description clear? What exactly would you like me to
elaborate on?
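
For illustration only, here is a minimal pure-Python sketch of the two ideas
named above: dictionary encoding (unique values plus one integer code per row)
and a secondary index (code to row positions). The helper name
dictionary_encode_with_index and the sample values are hypothetical; this is
not TRIADB code, and it does not reproduce the example from the GitHub issue.

```python
from collections import defaultdict


def dictionary_encode_with_index(values):
    """Hypothetical helper: return (dictionary, indices, secondary_index)."""
    dictionary = []                      # unique values, in order of first appearance
    code_of = {}                         # value -> integer code
    indices = []                         # one integer code per input row
    secondary_index = defaultdict(list)  # code -> list of row positions

    for row, value in enumerate(values):
        code = code_of.get(value)
        if code is None:
            # First time this value is seen: assign the next free code.
            code = len(dictionary)
            code_of[value] = code
            dictionary.append(value)
        indices.append(code)
        secondary_index[code].append(row)

    return dictionary, indices, dict(secondary_index)


# Illustrative input only.
dictionary, indices, secondary_index = dictionary_encode_with_index(
    ["red", "green", "red", "blue", "green"]
)
```

The dictionary and indices pair corresponds conceptually to what a
dictionary-encoded Arrow array stores; the secondary index is an extra
structure built alongside it in this sketch.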

I also noticed that there is NumPy integration: converting from NumPy to Arrow
is easy, but the reverse direction has several limitations. For example, I
cannot create a view for a StringArray (NotImplementedError: NumPy array view
is only supported for primitive types), yet string() (utf8) is in the list of
your primitive types. Are there any plans to support this type with NumPy
soon?

Kind regards
Athanassios
