Kyle Kavanagh created ARROW-12676:
-------------------------------------

             Summary: RecordBatchBuilder with uint dictionary creates signed int Batch
                 Key: ARROW-12676
                 URL: https://issues.apache.org/jira/browse/ARROW-12676
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
    Affects Versions: 3.0.0
            Reporter: Kyle Kavanagh

When a RecordBatchBuilder whose schema contains a dictionary type with a uint32 index is flushed to a batch, the resulting batch contains an int32 index instead:

{code:java}
BatchBuilder schema after flush:
Symbol: dictionary<values=string, indices=int16, ordered=0>
Status: dictionary<values=string, indices=uint32, ordered=0>
{code}

{code:java}
Batch schema after flush:
Symbol: dictionary<values=string, indices=int16, ordered=0>
Status: dictionary<values=string, indices=int32, ordered=0>
{code}

from:

{code:java}
std::shared_ptr<arrow::RecordBatch> batch;
auto status = batchBuilder_->Flush(&batch);
std::cout<<"BatchBuilder schema after flush: "<<batchBuilder_->schema()->ToString()<<std::endl;
std::cout<<"Batch schema after flush: "<<batch->schema()->ToString()<<std::endl;
if(!status.ok()) {
    throw Exception("Arrow batch flush failed: {}", status);
}
{code}

This results in a failure to write: "Invalid: Tried to write record batch with different schema"

I believe this is related to https://issues.apache.org/jira/browse/ARROW-9969 and, in particular, this bit: [https://github.com/apache/arrow/blob/master/cpp/src/arrow/table_builder.cc#L72]

Is the dictionary->Equals comparison checking the signedness of the indices?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
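
To make the reported mismatch concrete, here is a small stand-alone model of why the write fails, under the assumption that the writer compares the two schemas strictly, including the signedness of the dictionary index type. Everything in it (`IndexType`, `DictionaryType`, `Field`, `Schema`, `CheckWritable`, `BuilderSchema`, `FlushedBatchSchema`) is a hypothetical stand-in written for this illustration; none of it is Arrow's API, and it does not reproduce what table_builder.cc actually does.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT Arrow classes.
enum class IndexType { Int16, Int32, UInt32 };

struct DictionaryType {
  IndexType index_type;    // signedness is part of the index type
  std::string value_type;  // e.g. "string"
  bool ordered;
};

// Strict equality: index type, value type and ordering must all match.
inline bool operator==(const DictionaryType& a, const DictionaryType& b) {
  return a.index_type == b.index_type && a.value_type == b.value_type &&
         a.ordered == b.ordered;
}

struct Field {
  std::string name;
  DictionaryType type;
};

inline bool operator==(const Field& a, const Field& b) {
  return a.name == b.name && a.type == b.type;
}

using Schema = std::vector<Field>;

// Assumed model of the writer's check: returns an empty string when the
// batch schema matches the writer schema, otherwise the error from the report.
inline std::string CheckWritable(const Schema& writer_schema,
                                 const Schema& batch_schema) {
  if (writer_schema == batch_schema) return "";
  return "Invalid: Tried to write record batch with different schema";
}

// Schema printed for the builder in the report.
inline Schema BuilderSchema() {
  return {{"Symbol", {IndexType::Int16, "string", false}},
          {"Status", {IndexType::UInt32, "string", false}}};
}

// Schema printed for the flushed batch: Status now has an int32 index.
inline Schema FlushedBatchSchema() {
  return {{"Symbol", {IndexType::Int16, "string", false}},
          {"Status", {IndexType::Int32, "string", false}}};
}
```

In this model, `CheckWritable(BuilderSchema(), FlushedBatchSchema())` returns the error string because the `Status` fields differ only in index signedness, while comparing `BuilderSchema()` with itself returns an empty string. Whether Arrow's own comparison behaves this way is exactly the open question raised in the report.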