Prasanth Jayachandran created HIVE-19205:
--------------------------------------------

Summary: Hive streaming ingest improvements (v2)
Key: HIVE-19205
URL: https://issues.apache.org/jira/browse/HIVE-19205
Project: Hive
Issue Type: Improvement
Components: Streaming
Affects Versions: 3.0.0, 3.1.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran

This is the umbrella JIRA for tracking Hive streaming ingest improvements. At a high level, the improvements are:
- Support for dynamic partitioning.
- API changes (a simple streaming connection builder).
- Hide transaction batches from clients. A client can tune the transaction batch size but does not have to know about it.
- Support auto rollover to the next transaction batch, so clients don't have to close one transaction batch and open a new one.
- All record writers will be strict, meaning the schema of each record has to match the table schema. This avoids the repeated serialization/deserialization needed to reorder columns when the schemas differ.
- Automatic distribution for non-bucketed tables so that the compactor can have more parallelism.
- Create delta files with all ORC overhead disabled (no compression, no dictionary). The compactor will recreate the ORC files with compression and dictionary encoding.
- Automatic memory management via auto-flushing. This yields smaller stripes in delta files but is more scalable, and clients don't have to worry about distributing the data across writers.
- Support for more writers (Avro specifically; ORC passthrough format?).
- Support for accepting an input stream instead of a record byte[].
- Remove the HCatalog dependency. The old streaming API will stay in the hcatalog package for backward compatibility; the new streaming API will be in its own Hive module.
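
To make the "hide transaction batches" and "auto rollover" items concrete, here is a minimal, self-contained Java sketch of the idea. It is an illustration only and not the Hive API: the class and method names (RolloverSketch, Connection, beginTransaction, and so on) are hypothetical, and an in-memory list stands in for the table. The client only begins, writes, and commits transactions; the connection opens the next batch by itself once the current one is used up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from Hive.
final class RolloverSketch {

    /** Stand-in for a streaming connection that hides transaction batches. */
    static final class Connection {
        private final int batchSize;        // transactions per batch (tunable by the client)
        private int remainingInBatch = 0;   // unused transactions left in the current batch
        private int batchesOpened = 0;      // how many batches have been opened so far
        private boolean inTransaction = false;
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();

        Connection(int batchSize) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.batchSize = batchSize;
        }

        void beginTransaction() {
            if (inTransaction) {
                throw new IllegalStateException("transaction already open");
            }
            if (remainingInBatch == 0) {
                // Auto rollover: the current batch is exhausted (or none exists yet),
                // so open the next one without involving the client.
                remainingInBatch = batchSize;
                batchesOpened++;
            }
            remainingInBatch--;
            inTransaction = true;
        }

        void write(String record) {
            if (!inTransaction) {
                throw new IllegalStateException("no open transaction");
            }
            pending.add(record);
        }

        void commitTransaction() {
            if (!inTransaction) {
                throw new IllegalStateException("no open transaction");
            }
            committed.addAll(pending);
            pending.clear();
            inTransaction = false;
        }

        void abortTransaction() {
            if (!inTransaction) {
                throw new IllegalStateException("no open transaction");
            }
            pending.clear();                // discard uncommitted records
            inTransaction = false;
        }

        int batchesOpened() {
            return batchesOpened;
        }

        List<String> committed() {
            return committed;
        }
    }

    public static void main(String[] args) {
        Connection conn = new Connection(2);    // two transactions per batch
        for (int txn = 0; txn < 5; txn++) {
            conn.beginTransaction();            // may roll over to a new batch internally
            conn.write("row-" + txn);
            conn.commitTransaction();
        }
        // Prints "3 batches, 5 rows"
        System.out.println(conn.batchesOpened() + " batches, " + conn.committed().size() + " rows");
    }
}
```

With a batch size of 2, five transactions use three batches, and the client code never closes or reopens a batch explicitly.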