Andrew Lamb created ARROW-12411:
-----------------------------------

Summary: [Rust] Add Builder interface for adding Arrays to record batches
Key: ARROW-12411
URL: https://issues.apache.org/jira/browse/ARROW-12411
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Andrew Lamb
Assignee: Andrew Lamb
Use case: While writing tests (both in IOx and in DataFusion) where I need a single `RecordBatch`, I often find myself doing something like this:

```
use std::sync::Arc;

use arrow::array::{ArrayRef, Float64Array, Int64Array};
use arrow::datatypes::{DataType as ArrowDataType, Field as ArrowField, Schema};
use arrow::record_batch::RecordBatch;

// The column types are declared once in the schema...
let schema = Arc::new(Schema::new(vec![
    ArrowField::new("float_field", ArrowDataType::Float64, true),
    ArrowField::new("time", ArrowDataType::Int64, true),
]));

// ...and then again, implicitly, by the concrete array types.
let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
    .expect("created new record batch");
```

This is annoying because the information that `float_field` is a float is encoded both in the `Schema` and in the `Float64Array`. I would much rather be able to construct `RecordBatch`es in a builder style, avoiding the redundancy and reducing the amount of typing:

```
let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));

// Proposed API: each field's type is taken from the array itself.
let batch = RecordBatch::empty()
    .append("float_field", float_array).unwrap()
    .append("time", timestamp_array).unwrap();
```

The proposal is to add a method to `RecordBatch` like:

```
impl RecordBatch {
    ...
    fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self>
}
```

This would add a field named `field_name`, with the data type of `field_values`, to the current schema (and `field_values` as the corresponding column), returning an error if `field_name` is already present. The nullability of the field would be set based on the actual null count of `field_values`.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
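A minimal sketch of the proposed `append` semantics follows. To keep it self-contained and independent of any particular arrow-rs version, it does not use the arrow crate at all: `DataType`, `Column`, `Field`, `Batch`, and `BatchError` are hypothetical stand-ins for the Arrow `DataType`, `ArrayRef`, `Field`, `RecordBatch`, and the crate's error type. It also assumes one rule the proposal does not spell out: every appended column must match the existing row count, as `RecordBatch::try_new` already requires.

```rust
/// Hypothetical stand-in for an Arrow `DataType` (only two variants modeled).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Float64,
    Int64,
}

/// Hypothetical stand-in for an `ArrayRef`: a typed column of nullable values.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
}

/// Counts the `None` (null) entries in a column's values.
fn count_nulls<T>(values: &[Option<T>]) -> usize {
    values.iter().filter(|v| v.is_none()).count()
}

impl Column {
    pub fn data_type(&self) -> DataType {
        match self {
            Column::Float64(_) => DataType::Float64,
            Column::Int64(_) => DataType::Int64,
        }
    }

    pub fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Int64(v) => v.len(),
        }
    }

    pub fn null_count(&self) -> usize {
        match self {
            Column::Float64(v) => count_nulls(v),
            Column::Int64(v) => count_nulls(v),
        }
    }
}

/// Hypothetical stand-in for an Arrow `Field`.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

#[derive(Debug, PartialEq)]
pub enum BatchError {
    DuplicateField(String),
    LengthMismatch { expected: usize, actual: usize },
}

/// Hypothetical stand-in for a `RecordBatch`: a schema plus one column per field.
#[derive(Debug, Default)]
pub struct Batch {
    pub fields: Vec<Field>,
    pub columns: Vec<Column>,
}

impl Batch {
    /// Starts with no fields and no columns.
    pub fn empty() -> Self {
        Self::default()
    }

    /// Row count of the first column, or 0 if there are no columns yet.
    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, Column::len)
    }

    /// Appends `values` as a new column named `field_name`. The field's type
    /// is taken from the column, and it is marked nullable only if the column
    /// actually contains nulls.
    pub fn append(mut self, field_name: &str, values: Column) -> Result<Self, BatchError> {
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(BatchError::DuplicateField(field_name.to_string()));
        }
        // Assumption: as with RecordBatch::try_new, all columns must be the same length.
        if !self.columns.is_empty() && self.num_rows() != values.len() {
            return Err(BatchError::LengthMismatch {
                expected: self.num_rows(),
                actual: values.len(),
            });
        }
        self.fields.push(Field {
            name: field_name.to_string(),
            data_type: values.data_type(),
            nullable: values.null_count() > 0,
        });
        self.columns.push(values);
        Ok(self)
    }
}
```

Taking `self` by value and returning `Result<Self, _>` is what lets the calls chain with `.unwrap()` or `?`, as in the proposal. One consequence of inferring nullability from the null count is that two batches with the same column names and types can end up with different schemas if only one of them happens to contain nulls.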