[ https://issues.apache.org/jira/browse/IMPALA-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zoltán Borók-Nagy resolved IMPALA-9515.
---------------------------------------
    Resolution: Fixed

> Milestone 3: Reading “original files”
> -------------------------------------
>
>                 Key: IMPALA-9515
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9515
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-acid
>
> “Original files” don’t store the special ACID columns, which means we need to
> auto-generate those values. In fact, we only need to auto-generate the record
> ID: (originalTransaction, bucket, rowId).
> * originalTransaction: can be parsed from the containing directory
> ** If it’s the table root directory, then originalTransaction is 0
> * bucket: the bit-packed value of (bucket codec version, bucket id, statement id)
> ** Bucket codec version is 1
> ** Bucket id can be parsed from the filename
> ** Statement id can be parsed from the delta directory name:
> *** delta_<min_writeid>_<max_writeid>_<statement_id>
> *** (min_writeid = max_writeid for original files)
> * rowId: zero-based within each bucket. If a single bucket contains multiple files:
> ** List all the files belonging to the bucket
> ** The first file’s first row ID is 0
> ** The next file’s first row ID is the row count of the first file
> ** And so on: each file’s first row ID is the total row count of all the files before it
> The frontend should generate the base record ID for each file and propagate
> that information to the scanners. This way, the scanners also know whether
> they are scanning files in full ACID format or in raw format. The ORC scanner
> needs to be changed to generate and fill in the ACID columns for original
> files.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
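The record-ID derivation described in the issue can be sketched as follows. This is a minimal Python illustration, not Impala's frontend code. The function names, the `RecordId` type, the original-file naming pattern (`000001_0`, `000001_0_copy_1`), the name-based ordering of files within a bucket, and the exact bit layout (borrowed from Hive's bucket codec version 1: version in bits 29-31, bucket id in bits 16-27, statement id in bits 0-11) are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical constants mirroring the bucket codec version 1 layout used by Hive:
# bits 29-31 hold the codec version, bits 16-27 the bucket id, bits 0-11 the statement id.
CODEC_VERSION = 1
VERSION_SHIFT = 29
BUCKET_SHIFT = 16
MAX_12_BITS = (1 << 12) - 1

# delta_<min_writeid>_<max_writeid>[_<statement_id>]
DELTA_DIR_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")
# Assumed original-file naming, e.g. "000001_0" or "000001_0_copy_1".
BUCKET_FILE_RE = re.compile(r"^(\d+)_")


@dataclass(frozen=True)
class RecordId:
    original_transaction: int
    bucket: int
    row_id: int  # rowId of the file's first row; row i of the file gets row_id + i


def parse_directory(dir_name: str, is_table_root: bool) -> Tuple[int, int]:
    """Returns (originalTransaction, statement id) for the containing directory."""
    if is_table_root:
        return 0, 0
    m = DELTA_DIR_RE.match(dir_name)
    if m is None:
        raise ValueError(f"unsupported directory: {dir_name}")
    # For original files min_writeid == max_writeid, so either one works here.
    write_id = int(m.group(1))
    # Assumption: a delta directory without a statement id suffix means statement 0.
    stmt_id = int(m.group(3)) if m.group(3) is not None else 0
    return write_id, stmt_id


def parse_bucket_id(file_name: str) -> int:
    m = BUCKET_FILE_RE.match(file_name)
    if m is None:
        raise ValueError(f"cannot parse bucket id from: {file_name}")
    return int(m.group(1))


def encode_bucket(bucket_id: int, stmt_id: int) -> int:
    """Bit-packs (codec version, bucket id, statement id) into one int."""
    if not (0 <= bucket_id <= MAX_12_BITS and 0 <= stmt_id <= MAX_12_BITS):
        raise ValueError("bucket id and statement id must fit in 12 bits")
    return (CODEC_VERSION << VERSION_SHIFT) | (bucket_id << BUCKET_SHIFT) | stmt_id


def base_record_ids(dir_name: str, is_table_root: bool,
                    row_counts: Dict[str, int]) -> Dict[str, RecordId]:
    """Maps each original file in one directory to the record ID of its first row."""
    orig_txn, stmt_id = parse_directory(dir_name, is_table_root)
    # Group files by bucket id: bucket id -> {file name -> row count}.
    buckets: Dict[int, Dict[str, int]] = {}
    for name, count in row_counts.items():
        buckets.setdefault(parse_bucket_id(name), {})[name] = count
    result: Dict[str, RecordId] = {}
    for bucket_id, files in buckets.items():
        bucket = encode_bucket(bucket_id, stmt_id)
        next_row_id = 0  # rowId is zero-based within each bucket
        for name in sorted(files):  # assumption: files are ordered by name
            result[name] = RecordId(orig_txn, bucket, next_row_id)
            next_row_id += files[name]
    return result
```

For instance, with files `000000_0` (100 rows), `000000_0_copy_1` (50 rows) and `000001_0` (70 rows) directly in the table root, this sketch assigns first row IDs 0, 100 and 0, and bucket values 536870912 (bucket 0) and 536936448 (bucket 1). The running sum of row counts is what lets each file be scanned independently once the frontend has passed its base record ID along.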