Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/15818 )
Change subject: IMPALA-9512: Full ACID Milestone 2: Validate rows against the valid write id list ...................................................................... Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/15818/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15818/5//COMMIT_MSG@35 PS5, Line 35: but since Hive Streaming doesn't write : statistics we validate the write id per ORC row batch. > I think there is still a way to optimize this without statistics: I added a sentence, but I'm a bit skeptic about the performance gain of the proposed approach. Filling an array with the same values should be pretty fast on modern CPUs. And creating another OrcRowReader, OrcRowReaderOptions, single-element orc batches for each stripes would also have some overhead. So it's hard to tell without measurements, but I wouldn't complicate the code until we run into perf issues. http://gerrit.cloudera.org:8080/#/c/15818/5/be/src/exec/acid-metadata-utils.h File be/src/exec/acid-metadata-utils.h: http://gerrit.cloudera.org:8080/#/c/15818/5/be/src/exec/acid-metadata-utils.h@26 PS5, Line 26: ValidWriteIdList > optional: Some unit tests would be nice for this class as it hides some rea Added unit tests. A thrift ValidWriteIdList needs extra conversions between the thrift structure and the Java ValidWriteIdList. But I agree that it'd make the code a bit safer since we wouldn't depend on a plain string format. I'll add that in my next PS. http://gerrit.cloudera.org:8080/#/c/15818/3/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15818/3/be/src/exec/hdfs-orc-scanner.cc@517 PS3, Line 517: tch* row_batch) { > The handling of this case is still not clear to me. Good catch. Besides the current write id column we also selected the column with the "maximum column id" from the table. It's basically the last column, or the deepest child of the last column if last column is nested. Now we only select the current write id column. -- To view, visit http://gerrit.cloudera.org:8080/15818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5ed74585a2d73ebbcee763b0545be4412926299d Gerrit-Change-Number: 15818 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 07 May 2020 12:51:59 +0000 Gerrit-HasComments: Yes
