Re: [Rust] Dictionary encoding and Flight

2020-04-09 Thread Wes McKinney
Well, luckily we have some newly spruced up documentation about how integration testing works (thanks Neal!) https://github.com/apache/arrow/blob/master/docs/source/format/Integration.rst The main task is writing a parser for the JSON format used for integration testing. The JSON is used to commu

Re: [Rust] Dictionary encoding and Flight

2020-04-09 Thread Paul Dix
I'd be happy to pitch in on getting the integration tests developed. It would certainly beat my current method of building and running my test project and switching over to a Jupyter notebook to manually check it. Is there any prior work in the Rust project that I could basically copy from? Or per

Re: [Rust] Dictionary encoding and Flight

2020-04-09 Thread Wes McKinney
hi Paul, Dictionary-encoded is not a nested type, so there shouldn't be any children -- the IPC layout of a dictionary encoded field is the same as the type of the indices (probably want to change the terminology in the Rust library from "keys" to "indices" which is what's used in the specificati
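
To illustrate the point about indices, here is a minimal sketch in plain Rust (standard library only, not the arrow crate's API). The function name `dictionary_encode` and the choice of `i32` indices are assumptions made for this example. It splits a string column into a list of distinct values plus one flat index per row, which is why the encoded field has no children: what remains in the column is just an integer array.

```rust
use std::collections::HashMap;

/// Hypothetical helper: dictionary-encode a string column.
/// Returns the distinct values (the dictionary) and one index per row.
fn dictionary_encode(column: &[&str]) -> (Vec<String>, Vec<i32>) {
    let mut values: Vec<String> = Vec::new();
    let mut lookup: HashMap<&str, i32> = HashMap::new();
    let mut indices = Vec::with_capacity(column.len());

    for &s in column {
        // Reuse the index of a value seen before, otherwise append it
        // to the dictionary and hand out the next position.
        let idx = *lookup.entry(s).or_insert_with(|| {
            values.push(s.to_string());
            (values.len() - 1) as i32
        });
        indices.push(idx);
    }
    (values, indices)
}

fn main() {
    let (values, indices) = dictionary_encode(&["east", "west", "east"]);
    println!("dictionary = {:?}, indices = {:?}", values, indices);
}
```

In this sketch the dictionary values would travel separately from the indices, and only the index array has the layout of the field itself.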

Re: [Rust] Dictionary encoding and Flight

2020-04-09 Thread Paul Dix
I managed to get something up and running. I ended up creating a dictionary_batch.rs and adding that to convert.rs to translate dictionary fields in a schema over to the correct fb thing. I also added a method to writer.rs to convert that to bytes so it can be sent via ipc. However, when writing ou

Re: [Rust] Dictionary encoding and Flight

2020-04-07 Thread Wes McKinney
As another item for consideration -- in C++ at least, the dictionary id is dealt with as an internal detail of the IPC message production process. When serializing the Schema, ids are assigned to each dictionary-encoded field in the DictionaryMemo object, see https://github.com/apache/arrow/blob/
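
The id assignment described above might be sketched in Rust roughly as follows. This is an illustration only: the `DictionaryMemo` struct here is a hypothetical stand-in written for this example, keyed by field name, and it mirrors neither the C++ class nor anything in the Rust arrow crate. The idea it shows is that ids are handed out while walking the schema, so callers never pick them by hand.

```rust
use std::collections::HashMap;

/// Hypothetical memo that assigns a stable id to each
/// dictionary-encoded field the first time it is seen.
#[derive(Default)]
struct DictionaryMemo {
    ids: HashMap<String, i64>,
    next_id: i64,
}

impl DictionaryMemo {
    /// Return the id already assigned to `field`, or assign the next free one.
    fn id_for(&mut self, field: &str) -> i64 {
        if let Some(&id) = self.ids.get(field) {
            return id;
        }
        let id = self.next_id;
        self.ids.insert(field.to_string(), id);
        self.next_id += 1;
        id
    }
}

fn main() {
    let mut memo = DictionaryMemo::default();
    // Field names are made up for the example.
    for field in ["host", "region", "host"] {
        println!("{} -> dictionary id {}", field, memo.id_for(field));
    }
}
```

Asking for the same field twice returns the same id, so the schema and any later dictionary messages can refer to a dictionary consistently.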

Re: [Rust] Dictionary encoding and Flight

2020-04-07 Thread Wes McKinney
hey Paul, Take a look at how dictionaries work in the IPC protocol https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#serialization-and-interprocess-communication-ipc Dictionaries are sent as separate messages. When a field is tagged as dictionary encoded in the schema,

[Rust] Dictionary encoding and Flight

2020-04-07 Thread Paul Dix
Hello, I'm trying to build a Rust-based Flight server and I'd like to use Dictionary encoding for a number of string columns in my data. I've seen that StringDictionary was recently added to Rust here: https://github.com/apache/arrow/commit/c7a7d2dcc46ed06593b994cb54c5eaf9ccd1d21d#diff-72812e308734