mathyingzhou opened a new pull request #8648:
URL: https://github.com/apache/arrow/pull/8648
This pull request tracks the progress on adding ORC write support. The functionality is not complete yet. However, for most types, populating an ORC ColumnVectorBatch with data from an Arrow Array already works.

Arrow data types (arrow::Type::type) I currently support:
- Boolean: BOOL
- Numerical: INT8, INT16, INT32, INT64, FLOAT, DOUBLE
- Time-related: DATE32
- Binary: BINARY, STRING, LARGE_BINARY, LARGE_STRING, FIXED_SIZE_BINARY
- Nested: LIST, LARGE_LIST, FIXED_SIZE_LIST, STRUCT, MAP, DENSE_UNION, SPARSE_UNION

Arrow data types I plan to support:
- Numerical: DECIMAL128
- Time-related: DATE64, TIMESTAMP
- Dictionary: DICTIONARY

Arrow data types I currently do NOT plan to support:
- Numerical: UINT8, UINT16, UINT32, UINT64, HALF_FLOAT, DECIMAL256 (ORC has no corresponding types. All of these except DECIMAL256 can always be cast to larger types, but I think users should probably do that explicitly.)
- Time-related: TIME32, TIME64, INTERVAL_MONTHS, INTERVAL_DAY_TIME, DURATION (ORC has no corresponding types, and these cannot be cast to ORC types without losing time-related information.)
- Extension: EXTENSION
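To make the support matrix above easier to check at a glance, here is a minimal sketch that encodes it as a lookup table. The names `SUPPORTED`, `PLANNED`, `UNSUPPORTED` and `classify` are hypothetical and do not come from the PR. The Arrow type ids are spelled as in `arrow::Type::type`, and the target names are ORC type kinds (BOOLEAN, BYTE, SHORT, ...), but which ORC kind each Arrow type maps to is our assumption of the natural correspondence, not something taken from the PR's code.

```python
from typing import Optional, Tuple

# Hypothetical summary of the support matrix. The Arrow -> ORC kind mapping
# below is an assumed natural correspondence, not the PR's actual code.
SUPPORTED = {
    "BOOL": "BOOLEAN",
    "INT8": "BYTE",
    "INT16": "SHORT",
    "INT32": "INT",
    "INT64": "LONG",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "DATE32": "DATE",
    "BINARY": "BINARY",
    "STRING": "STRING",
    "LARGE_BINARY": "BINARY",
    "LARGE_STRING": "STRING",
    "FIXED_SIZE_BINARY": "BINARY",
    "LIST": "LIST",
    "LARGE_LIST": "LIST",
    "FIXED_SIZE_LIST": "LIST",
    "STRUCT": "STRUCT",
    "MAP": "MAP",
    "DENSE_UNION": "UNION",
    "SPARSE_UNION": "UNION",
}

# Types the PR plans to support; no target kind is recorded because the PR
# does not say which ORC kind each one will map to.
PLANNED = {"DECIMAL128", "DATE64", "TIMESTAMP", "DICTIONARY"}

# Types the PR does not plan to support, with the reason it gives.
_CAST = "no ORC counterpart; cast to a larger type explicitly"
_TIME = "no ORC counterpart; casting would lose time information"
UNSUPPORTED = {t: _CAST for t in ("UINT8", "UINT16", "UINT32", "UINT64", "HALF_FLOAT")}
UNSUPPORTED["DECIMAL256"] = "no ORC counterpart"
UNSUPPORTED.update(
    {t: _TIME for t in ("TIME32", "TIME64", "INTERVAL_MONTHS", "INTERVAL_DAY_TIME", "DURATION")}
)
UNSUPPORTED["EXTENSION"] = "not planned"


def classify(arrow_type: str) -> Tuple[str, Optional[str]]:
    """Return (status, detail): the assumed ORC kind, None, or a reason."""
    if arrow_type in SUPPORTED:
        return ("supported", SUPPORTED[arrow_type])
    if arrow_type in PLANNED:
        return ("planned", None)
    if arrow_type in UNSUPPORTED:
        return ("unsupported", UNSUPPORTED[arrow_type])
    # Types such as NA are not mentioned in the PR description at all.
    raise ValueError(f"{arrow_type} is not covered by the PR description")


if __name__ == "__main__":
    print(classify("LARGE_STRING"))  # ('supported', 'STRING')
    print(classify("DATE64"))        # ('planned', None)
```

Keeping the three categories disjoint mirrors the three lists in the description, so a type cannot be reported as both supported and planned.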