Hi Yakov,

You might try e-mailing the dev@ mailing list to see if anyone responds there. I'm not sure how many JavaScript devs are subscribed here.

Cheers,
Micah

On Thu, Jul 16, 2020 at 6:48 PM Yakov Galka <[email protected]> wrote:

> Hi All,
>
> I have code that creates a table with string columns as follows:
>
> for(/* each column */) {
>     // ...
>     column_vectors.push(Vector.new(Data.Utf8(new Utf8(), 0, element_count,
>         null_count, nullmap_buffer, offsets_buffer, data_buffer)));
> }
> const arrow_table = Table.new(column_vectors, column_names);
> const data = arrow_table.serialize('binary', false).buffer;
> const arrow_table2 = Table.from([new Uint8Array(data)]);
>
> Here offsets_buffer is an Int32Array with the offsets and data_buffer is a
> Uint8Array with the strings, in accordance with the Arrow format described in
> https://arrow.apache.org/docs/format/Columnar.html.
>
> I am trying to change this to use a dictionary encoding instead. I changed
> the producer of the data to return only the unique strings in data_buffer
> and offsets_buffer, and additionally produce an interned_buffer
> (Int32Array) with the indices of the strings. However, I couldn't find how
> to initialize the column in JavaScript.
>
> Shooting in the dark, I tried:
>
> for(/* each column */) {
>     // ...
>     const dictionary = Vector.new(Data.Utf8(new Utf8(), 0,
>         offsets_buffer.length - 1, 0, 0, offsets_buffer, data_buffer));
>     column_vectors.push(Vector.new(Data.Dictionary(new Dictionary(new
>         Utf8(), new Int32()), 0, element_count, null_count, nullmap_buffer, 0,
>         interned_buffer, dictionary)));
> }
> // ...
>
> However, this causes the deserialization (Table.from) to fail with:
>
> TypeError: undefined has no properties
>     visitUtf8
>     visit
>     visit
>     visitMany
>     map
>     visitMany
>     _loadVectors
>     _loadDictionaryBatch
>     _readDictionaryBatch
>     open
>     open
>     from
>
> What's the correct way of creating a dictionary encoded column?
>
> Yakov Galka
> http://stannum.io/
>
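
For reference, the buffer layout described in the quoted message (unique strings in data_buffer and offsets_buffer, plus an Int32Array of indices) can be sketched in plain JavaScript. This is only an illustration of the layout and makes no Arrow calls, so it does not answer how to construct the column with the library. The function name dictionaryEncode is hypothetical, and null handling (nullmap_buffer, null_count) is left out.

```javascript
// Hypothetical helper (not part of Arrow): dictionary-encode an array of
// strings into the three buffers named in the quoted message.
function dictionaryEncode(strings) {
  const encoder = new TextEncoder();
  const indexOf = new Map();   // string -> position in the dictionary
  const chunks = [];           // UTF-8 bytes of each unique string
  const offsets = [0];         // byte offsets; length = unique count + 1
  const interned_buffer = new Int32Array(strings.length);

  strings.forEach((s, i) => {
    let idx = indexOf.get(s);
    if (idx === undefined) {
      // First occurrence: append the string to the dictionary.
      idx = indexOf.size;
      indexOf.set(s, idx);
      const bytes = encoder.encode(s);
      chunks.push(bytes);
      offsets.push(offsets[offsets.length - 1] + bytes.length);
    }
    interned_buffer[i] = idx;  // each row stores only the dictionary index
  });

  // Concatenate the unique strings into one contiguous byte buffer.
  const data_buffer = new Uint8Array(offsets[offsets.length - 1]);
  chunks.forEach((bytes, k) => data_buffer.set(bytes, offsets[k]));

  return {
    offsets_buffer: Int32Array.from(offsets),
    data_buffer,
    interned_buffer,
  };
}
```

With this layout, the dictionary holds offsets_buffer.length - 1 entries, while the column itself has interned_buffer.length elements.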
