Tim Armstrong has posted comments on this change. Change subject: IMPALA-5776: Write partial tuple to the correct mempool ......................................................................
Patch Set 2: (6 comments) This is already easier to understand but I'm hoping we can avoid adding yet another MemPool to the scanners, particularly when the difference from boundary_pool_ isn't clear. http://gerrit.cloudera.org:8080/#/c/7639/2/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 169: row_batch->tuple_data_pool()->AcquireData(boundary_pool_.get(), false); > Actually, after running some tests with ASAN, it turns out that it's not ok It seemed like the way data in boundary_pool_ is handled is inconsistent - the code is inconsistent whether it should be transferred or not, particularly since it makes some effort to copy out data from boundary_column_ already. I made some comments on the latest PS. http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: PS4, Line 317: still need to handl Comment needs update http://gerrit.cloudera.org:8080/#/c/7639/4/be/src/exec/hdfs-text-scanner.h File be/src/exec/hdfs-text-scanner.h: Line 194: /// Mem pool for boundary_row_ and boundary_column_. I still don't understand the memory lifetime for boundary_row_, boundary_col_ and this pool. It seems like it should be the same as partial_tuple_pool_. I.e memory is allocated from it only when processing a row straddling blocks, and the memory can be safely freed after we've finished materializing that row. Line 199: /// the line as erroneous in case of parsing errors. Can you clarify whether this data will be referenced by the output batches. Line 202: /// Helper string for dealing with columns that span file blocks. Same here. It looks like some attempt is made to copy data out of this column in CopyBoundaryField(), maybe it's just missing a case where it isn't copied out. This case looks suspicious: // Missing columns or row delimiter at end of the file is ok, fill the row in. char* col = boundary_column_.buffer(); int num_fields = 0; RETURN_IF_ERROR(delimited_text_parser_->FillColumns<true>(boundary_column_.len(), &col, &num_fields, field_locations_.data())); Line 230: /// for boundary tuples. Can you explain the lifetime of the memory? I.e. we allocate memory from this pool -- To view, visit http://gerrit.cloudera.org:8080/7639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I60ba5c113aefd17f697c1888fd46a237ef396540 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
