Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )
Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq ...................................................................... Patch Set 11: (5 comments) http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner-ir.cc File be/src/exec/hdfs-scanner-ir.cc: http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner-ir.cc@61 PS11, Line 61: return 0; > Is there a reason why we don't treat this as parse error and returns -1 ? No that's a good point. I changed it. http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.h@347 PS11, Line 347: /// Returns -1 if parsing should be aborted due to parse errors. > May help to also add "Returns 0 if copying strings into 'pool' failed'. Clarified that it could be a memory allocation error. Per the other comment, it probably makes most sense to just return -1. http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-sequence-scanner.cc File be/src/exec/hdfs-sequence-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-sequence-scanner.cc@a282 PS11, Line 282: > It may be trivial but does it make sense to add DCHECK(row_batch->num_tuple The assumption of 1 tuple per row for scans is pretty deeply baked into the scanners and planner, so it seemed silly to have this speculative generality here. I added a DCHECK to HdfsScanner to document that the assumption is valid for all HDFS scanners. http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc@849 PS11, Line 849: tuples_returned > There is a subtle behavior here if Tuple::CopyStrings() failed in WriteAlig Yeah I think I just made a mistake returning 0. Returning -1 makes more sense. I manually tested with this query: [localhost:21000] > set num_nodes=1; set disable_codegen=1; set mem_limit=5m; select * from widetable_1000_cols; NUM_NODES set to 1 DISABLE_CODEGEN set to 1 MEM_LIMIT set to 5m Query: select * from widetable_1000_cols Query submitted at: 2017-11-07 11:52:01 (Coordinator: http://tarmstrong-box:25000) Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=9948285e1aa8e172:f73fe97100000000 ERROR: Memory limit exceeded: HdfsScanner::WriteAlignedTuples() failed to allocate 125 bytes for strings. HDFS_SCAN_NODE (id=0) could not allocate 125.00 B without exceeding limit. Error occurred on backend tarmstrong-box:22000 by fragment 9948285e1aa8e172:f73fe97100000000 Memory left in process limit: 8.01 GB Memory left in query limit: 947.59 KB Query(9948285e1aa8e172:f73fe97100000000): Limit=5.00 MB Reservation=0 ReservationLimit=0 OtherMemory=4.07 MB Total=4.07 MB Peak=4.07 MB Fragment 9948285e1aa8e172:f73fe97100000000: Reservation=0 OtherMemory=4.07 MB Total=4.07 MB Peak=4.07 MB HDFS_SCAN_NODE (id=0): Total=4.07 MB Peak=4.07 MB PLAN_ROOT_SINK: Total=0 Peak=0 It would be nice to turn it into an end-to-end test but I cant see how to make it robustly fail in that place. http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc@857 PS11, Line 857: num_tuples > Why is this not tuples_returned here ? Is there any guarantee max_added_tup I think num_fields is tracking the number of fields consumed from the input. -- To view, visit http://gerrit.cloudera.org:8080/8172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941 Gerrit-Change-Number: 8172 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Tue, 07 Nov 2017 19:54:01 +0000 Gerrit-HasComments: Yes