GitHub user metalmatze edited a discussion: [JS] Decoding `Dictionary<Uint32, Utf8>` incorrectly.

Hey everyone! :wave: 

We have a problem with decoding a `Dictionary<Uint32, Utf8>` in arrow-js. 

We expect roughly this array: 
```js 
[
        "gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz",
        "gke-europe-west3-0-preemptible-t2d-st-add19435-w74v",
        "gke-europe-west3-0-preemptible-t2d-st-717888db-nrfr"
]
```
but we get this: 
```js
[
        "gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz", 
        "gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz",
        null
]
```

It seems like we read the same string twice, and then in the second batch we read `null`, or something along those lines.
Either decoding the dictionary batches is broken, or reading the decoded dictionary back into JS is.
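For context, here is a purely illustrative sketch (plain TypeScript, not arrow-js internals; the index values are made up) of how a dictionary-encoded column resolves, and how a later dictionary batch being applied as a replacement instead of a delta could leave indices dangling and produce nulls:

```typescript
// Illustration only: a dictionary-encoded column is a vector of integer
// indices plus a value table. The indices below are hypothetical.
const fullDictionary = [
    "gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz",
    "gke-europe-west3-0-preemptible-t2d-st-add19435-w74v",
    "gke-europe-west3-0-preemptible-t2d-st-717888db-nrfr",
];
const indices = [0, 1, 2];

// Correct decode: every index resolves against the accumulated dictionary.
const decoded = indices.map((i) => fullDictionary[i] ?? null);
// → all three distinct strings

// If a later dictionary batch were applied as a replacement rather than
// appended as a delta, the value table would be shorter than the indices
// assume, and out-of-range indices would decode to null:
const replaced = fullDictionary.slice(0, 1);
const broken = indices.map((i) => replaced[i] ?? null);
// → [ "gke-…-ec27d3db-pwwz", null, null ]
```

This is only a guess at the failure mode, but it matches the shape of the symptom: nulls where the second dictionary batch's entries should be.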

## Reproducing

For us, there's a stream with incoming Arrow Flight Data chunks. 
We have validated that the backend Go code is returning the correct payload. 

The Flight Data is transformed into chunks using similar code to this: 
https://github.com/lancedb/flight-sql-js-client/blob/decae7347da105554957edac5dc53a7f192c06dc/src/arrow_util.ts#L28-L53
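For reference, the essence of that transformation, as a sketch based on reading the linked code (not a verbatim copy), is: prefix each FlightData header with its little-endian length, then append the body:

```typescript
// Sketch: turn a FlightData message (Flatbuffer header + body) into an
// Arrow IPC stream chunk: 4-byte little-endian header length, header, body.
// Caveat: the IPC spec pads the metadata to an 8-byte boundary; that
// padding is omitted here for brevity.
function flightDataToIpcChunk(dataHeader: Uint8Array, dataBody: Uint8Array): Uint8Array {
    const chunk = new Uint8Array(4 + dataHeader.length + dataBody.length);
    new DataView(chunk.buffer).setUint32(0, dataHeader.length, true); // LE length prefix
    chunk.set(dataHeader, 4);
    chunk.set(dataBody, 4 + dataHeader.length);
    return chunk;
}
```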

The chunks are base64 encoded for reproducibility in the unit test. 
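As an aside, a quick structural sanity check is possible on these chunks, assuming the legacy encapsulated-message framing (no `0xFFFFFFFF` continuation marker): the first four bytes of each chunk should be the little-endian Flatbuffer metadata length. For the first chunk in the test below, that works out:

```typescript
// The first base64 chunk in the test below starts with "3AAAABAAAAAA",
// which decodes to bytes DC 00 00 00 10 00 00 00 … — i.e. a little-endian
// metadata length of 220, followed by the Flatbuffer message header.
const prefix = Buffer.from("3AAAABAAAAAA", "base64");
const metadataLength = prefix.readUInt32LE(0);
console.log(metadataLength); // 220
```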

```ts 
import { tableFromIPC, StructRow } from "apache-arrow";

describe('Arrow Reader', () => {
    const chunksDistinct = [
        
'3AAAABAAAAAAAAoADAAKAAkABAAKAAAAEAAAAAABBAAIAAgAAAAEAAgAAAAEAAAAAQAAABQAAAAQABQAEAAOAA8ABAAAAAgAEAAAABgAAAAMAAAAAAABDXAAAAABAAAAGAAAALD///8QABgAFAAOAA8ABAAQAAgAEAAAADwAAAAwAAAAAAABBRAAAAAwAAAACAAKAAAABAAIAAAADAAAAAAABgAIAAQABgAAACAAAAAAAAAABAAEAAQAAAAEAAAAbm9kZQAAAAATAAAAYXR0cmlidXRlc19yZXNvdXJjZQA=',
        
'qAAAABAAAAAMABgAFgAVAAQACAAMAAAAHAAAAMAAAAAAAAAAAAAAAAACBAAIAAoAAAAEAAgAAAAQAAAAAAAKABgADAAIAAQACgAAACwAAAAQAAAAAQAAAAAAAAAAAAAAAQAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAQAAAAAAAABAAAAAAAAAAAgAAAAAAAAAgAAAAAAAAAAzAAAAAAAAAP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAZ2tlLWV1cm9wZS13ZXN0My0wLXByZWVtcHRpYmxlLXQyZC1zdC1lYzI3ZDNkYi1wd3d6AAAAAAAAAAAAAAAAAA==',
        
'qAAAABAAAAAMABoAGAAXAAQACAAMAAAAIAAAAMAAAAAAAAAAAAAAAAAAAAMEAAoAGAAMAAgABAAKAAAAPAAAABAAAAABAAAAAAAAAAAAAAACAAAAAQAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAQAAAAAAAABAAAAAAAAAAAEAAAAAAAAAgAAAAAAAAAAEAAAAAAAAAP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==',
        
'qAAAABAAAAAMABgAFgAVAAQACAAMAAAAHAAAAAABAAAAAAAAAAAAAAACBAAIAAoAAAAEAAgAAAAQAAAAAAAKABgADAAIAAQACgAAACwAAAAQAAAAAgAAAAAAAAAAAAAAAQAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAQAAAAAAAABAAAAAAAAAAAwAAAAAAAAAgAAAAAAAAABmAAAAAAAAAP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMwAAAGYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAZ2tlLWV1cm9wZS13ZXN0My0wLXByZWVtcHRpYmxlLXQyZC1zdC1hZGQxOTQzNS13NzR2Z2tlLWV1cm9wZS13ZXN0My0wLXByZWVtcHRpYmxlLXQyZC1zdC03MTc4ODhkYi1ucmZyAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=',
        
'qAAAABAAAAAMABoAGAAXAAQACAAMAAAAIAAAAMAAAAAAAAAAAAAAAAAAAAMEAAoAGAAMAAgABAAKAAAAPAAAABAAAAACAAAAAAAAAAAAAAACAAAAAgAAAAAAAAAAAAAAAAAAAAIAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAAAAAAAAAAAAQAAAAAAAABAAAAAAAAAAAEAAAAAAAAAgAAAAAAAAAAIAAAAAAAAAP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==',
    ];

    describe('decode SELECT DISTINCT', () => {
        const chunks: Buffer[] = [];
        const chunksString: string[] = [];
        chunksDistinct.forEach((b64) => {
            const decoded = Buffer.from(b64, 'base64');
            chunks.push(decoded);
            // for debugging
            chunksString.push(new TextDecoder().decode(decoded));
        });

        console.log(chunksString);

        // Using RecordBatchReader makes no difference
        // const reader = RecordBatchReader.from(chunks)
        // for (const batch of reader) {
        //     console.log(batch.length);
        //
        // }


        const table = tableFromIPC(chunks);

        it('should have the correct data', () => {
            expect(table.numRows).toBe(3);
            expect(table.numCols).toBe(1);
            expect(table.schema.fields[0].toString()).toBe('attributes_resource: Struct<{node:Dictionary<Uint32, Utf8>}>');

            const nodes: string[] = [];
            table.toArray().forEach((row: StructRow) => {
                nodes.push(row['attributes_resource']['node']);
            });

            expect(nodes).toContain("gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz");
            expect(nodes).toContain("gke-europe-west3-0-preemptible-t2d-st-add19435-w74v");
            expect(nodes).toContain("gke-europe-west3-0-preemptible-t2d-st-717888db-nrfr");
        });
    });
});
```

Using my debugger, I can introspect these chunks. 

![image](https://github.com/user-attachments/assets/97c831f8-83c8-4ff6-ab1b-c8fbc9d5e664)

The expected strings are inside these chunks. 

_Just a thought: Maybe we should ignore chunks at index 2 and 4?_

The unit test fails:
```
Error: expect(received).toContain(expected) // indexOf

Expected value: "gke-europe-west3-0-preemptible-t2d-st-add19435-w74v"
Received array: ["gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz", "gke-europe-west3-0-preemptible-t2d-st-ec27d3db-pwwz", null]

    at Object.<anonymous> (arrow.test.ts:45:27)
```

I hope the code above runs on your machines.
Please let me know if you have any follow-up questions! 

Does arrow-js have a Slack or Discord? I wasn't able to find anything.

Thank you!

GitHub link: https://github.com/apache/arrow/discussions/46100
