[ https://issues.apache.org/jira/browse/ARROW-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-10121.
----------------------------------
    Resolution: Fixed

Issue resolved by pull request 8302
[https://github.com/apache/arrow/pull/8302]

> [C++][Python] Variable dictionaries do not survive roundtrip to IPC stream
> --------------------------------------------------------------------------
>
>                 Key: ARROW-10121
>                 URL: https://issues.apache.org/jira/browse/ARROW-10121
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Wes McKinney
>            Assignee: Antoine Pitrou
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Failing test case (from the dev@ mailing list thread
> https://lists.apache.org/thread.html/r338942b4e9f9316b48e87aab41ac49c7ffedd45733d4a6349523b7eb%40%3Cdev.arrow.apache.org%3E):
> {code}
> import pyarrow as pa
> from io import BytesIO
>
> print(pa.__version__)
>
> schema = pa.schema([pa.field('foo', pa.int32()),
>                     pa.field('bar', pa.dictionary(pa.int32(), pa.string()))])
>
> # Two batches whose dictionary-encoded column 'bar' has different dictionaries
> r1 = pa.record_batch(
>     [
>         [1, 2, 3, 4, 5],
>         pa.array(["a", "b", "c", "d", "e"]).dictionary_encode()
>     ],
>     schema=schema
> )
> r1.validate()
>
> r2 = pa.record_batch(
>     [
>         [1, 2, 3, 4, 5],
>         pa.array(["c", "c", "e", "f", "g"]).dictionary_encode()
>     ],
>     schema=schema
> )
> r2.validate()
>
> assert r1.column(1).dictionary != r2.column(1).dictionary
>
> # Write both batches to an in-memory IPC stream
> sink = pa.BufferOutputStream()
> writer = pa.RecordBatchStreamWriter(sink, schema)
> writer.write(r1)
> writer.write(r2)
> writer.close()  # write the end-of-stream marker
>
> # Read the stream back batch by batch
> serialized = BytesIO(sink.getvalue().to_pybytes())
> stream = pa.ipc.open_stream(serialized)
> deserialized = []
> while True:
>     try:
>         deserialized.append(stream.read_next_batch())
>     except StopIteration:
>         break
>
> # Round-trip check for the second batch's dictionary-encoded column
> assert deserialized[1][1].to_pylist() == r2[1].to_pylist()
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)