Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 )
Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
......................................................................


Patch Set 5: Code-Review+1

(2 comments)

This is much better than the num_unqueued_files_ code.

http://gerrit.cloudera.org:8080/#/c/12097/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12097/5//COMMIT_MSG@85
PS5, Line 85: adn
Nit: and


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465
PS5, Line 465: remaining_scan_range_issuances_
> remaining_initial_scan_ranges_?
Here is how it works (as I understand it, which might be incomplete):

If we're reading Avro, we issue a scan range for the header. When that comes back, we issue scan ranges for the rest of the file. Each of those scan ranges can be processed by a different scanner thread. So, a 20GB file could be split up and processed in parallel with a single header read (per node).

This is different from Parquet, because a Parquet file is typically a single block and handled by a single scanner thread (and we don't know the column ranges up front). A 20GB Parquet file with many row groups would issue the footer for each split. Each scanner thread would have its own footer and process its chunk of the file.

I'm ok with the current variable name, but here are a couple that might work:
remaining_scan_range_submissions_?
future_scan_range_submissions_?

I would prefer to avoid "initial" in the variable name.
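
To make the naming question concrete, here is a minimal sketch of the kind of counter I have in mind when reading the description above. It is not the code in the patch: the class name ScanRangeIssuanceTracker and its methods are hypothetical, and it assumes the counter is decremented once per pending issuance (for example, once an Avro header has been parsed and the ranges for the rest of the file have been issued).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; this is not Impala's implementation.
// Tracks how many scan range issuances are still expected, so that scanner
// threads can tell "no more ranges will ever arrive" apart from
// "no ranges are queued right now, but more may be issued later".
class ScanRangeIssuanceTracker {
 public:
  // 'expected_issuances' is an assumed starting count, e.g. one per file
  // whose remaining ranges are issued only after its header is read.
  explicit ScanRangeIssuanceTracker(int32_t expected_issuances)
    : remaining_(expected_issuances) {}

  // Records that one pending issuance has happened. Returns true if it was
  // the last one, i.e. no further scan ranges will be issued after this call.
  bool MarkIssued() {
    int32_t prev = remaining_.fetch_sub(1, std::memory_order_acq_rel);
    assert(prev > 0);  // Decrementing past zero would indicate a counting bug.
    return prev == 1;
  }

  // True once every expected issuance has been recorded.
  bool AllIssued() const {
    return remaining_.load(std::memory_order_acquire) == 0;
  }

  // Number of issuances that have not happened yet.
  int32_t remaining() const {
    return remaining_.load(std::memory_order_acquire);
  }

 private:
  std::atomic<int32_t> remaining_;
};
```

With a counter like this, a scanner thread that finds the queue empty would wait only while AllIssued() is false, which is why a name built around "remaining" or "future" issuances reads more accurately to me than one containing "initial".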
-- 
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 29 Jan 2019 18:32:43 +0000
Gerrit-HasComments: Yes
