mathyingzhou opened a new pull request #8648:
URL: https://github.com/apache/arrow/pull/8648


   This pull request tracks the progress on adding ORC write support. The 
functionality is not complete yet. However, for most types the process of 
populating a ColumnVectorBatch in ORC using data from an Arrow Array is 
already implemented.
   
   Arrow data types (arrow::Type::type) I do support:
   Boolean: BOOL
   Numerical: INT8, INT16, INT32, INT64, FLOAT, DOUBLE
   Time-related: DATE32
   Binary: BINARY, STRING, LARGE_BINARY, LARGE_STRING, FIXED_SIZE_BINARY
   Nested: LIST, LARGE_LIST, FIXED_SIZE_LIST, STRUCT, MAP, DENSE_UNION, 
SPARSE_UNION
   
   Arrow data types I plan to support:
   Numerical: DECIMAL128
   Time-related: DATE64, TIMESTAMP
   Dictionary: DICTIONARY
   
   Arrow data types I currently do NOT plan to support:
   Numerical: UINT8, UINT16, UINT32, UINT64, HALF_FLOAT, DECIMAL256 (There are 
no corresponding types in ORC. Except for DECIMAL256, these can always be 
cast to larger types, but I think users should probably do that explicitly.)
   Time-related: TIME32, TIME64, INTERVAL_MONTHS, INTERVAL_DAY_TIME, DURATION 
(There are no corresponding types in ORC and it is impossible to cast them into 
ORC types without losing time-related information)
   Extension: EXTENSION 
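
   The three lists above can be read as a lookup from Arrow type id to write 
support status. The following is only a minimal sketch of that lookup, not 
code from this pull request: ArrowTypeId, OrcWriteSupport and 
ClassifyOrcWriteSupport are hypothetical names, and the enum is a local 
stand-in for arrow::Type::type so that the snippet compiles without Arrow.

```cpp
// Hypothetical local mirror of the arrow::Type::type names listed above.
// Real code would switch on arrow::Type::type instead.
enum class ArrowTypeId {
  BOOL, INT8, INT16, INT32, INT64, UINT8, UINT16, UINT32, UINT64,
  HALF_FLOAT, FLOAT, DOUBLE, DECIMAL128, DECIMAL256,
  DATE32, DATE64, TIMESTAMP, TIME32, TIME64,
  INTERVAL_MONTHS, INTERVAL_DAY_TIME, DURATION,
  BINARY, STRING, LARGE_BINARY, LARGE_STRING, FIXED_SIZE_BINARY,
  LIST, LARGE_LIST, FIXED_SIZE_LIST, STRUCT, MAP, DENSE_UNION, SPARSE_UNION,
  DICTIONARY, EXTENSION
};

// Hypothetical status values matching the three lists in this description.
enum class OrcWriteSupport { kSupported, kPlanned, kNotPlanned };

// Returns the write-support status stated in this description for a type id.
constexpr OrcWriteSupport ClassifyOrcWriteSupport(ArrowTypeId id) {
  switch (id) {
    // "Arrow data types I do support"
    case ArrowTypeId::BOOL:
    case ArrowTypeId::INT8:
    case ArrowTypeId::INT16:
    case ArrowTypeId::INT32:
    case ArrowTypeId::INT64:
    case ArrowTypeId::FLOAT:
    case ArrowTypeId::DOUBLE:
    case ArrowTypeId::DATE32:
    case ArrowTypeId::BINARY:
    case ArrowTypeId::STRING:
    case ArrowTypeId::LARGE_BINARY:
    case ArrowTypeId::LARGE_STRING:
    case ArrowTypeId::FIXED_SIZE_BINARY:
    case ArrowTypeId::LIST:
    case ArrowTypeId::LARGE_LIST:
    case ArrowTypeId::FIXED_SIZE_LIST:
    case ArrowTypeId::STRUCT:
    case ArrowTypeId::MAP:
    case ArrowTypeId::DENSE_UNION:
    case ArrowTypeId::SPARSE_UNION:
      return OrcWriteSupport::kSupported;
    // "Arrow data types I plan to support"
    case ArrowTypeId::DECIMAL128:
    case ArrowTypeId::DATE64:
    case ArrowTypeId::TIMESTAMP:
    case ArrowTypeId::DICTIONARY:
      return OrcWriteSupport::kPlanned;
    // "Arrow data types I currently do NOT plan to support"
    case ArrowTypeId::UINT8:
    case ArrowTypeId::UINT16:
    case ArrowTypeId::UINT32:
    case ArrowTypeId::UINT64:
    case ArrowTypeId::HALF_FLOAT:
    case ArrowTypeId::DECIMAL256:
    case ArrowTypeId::TIME32:
    case ArrowTypeId::TIME64:
    case ArrowTypeId::INTERVAL_MONTHS:
    case ArrowTypeId::INTERVAL_DAY_TIME:
    case ArrowTypeId::DURATION:
    case ArrowTypeId::EXTENSION:
      return OrcWriteSupport::kNotPlanned;
  }
  // Unreachable for valid enum values; treat anything else as unsupported.
  return OrcWriteSupport::kNotPlanned;
}
```

   A writer could call such a function once per field before populating any 
ColumnVectorBatch, and reject the schema early if any field is not supported.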


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

