Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
......................................................................


Patch Set 5: Code-Review+1

(2 comments)

This is much better than the num_unqueued_files_ code.

http://gerrit.cloudera.org:8080/#/c/12097/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12097/5//COMMIT_MSG@85
PS5, Line 85: adn
Nit: and


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465
PS5, Line 465: remaining_scan_range_issuances_
> remaining_initial_scan_ranges_?
Here is how it works (as I understand it, which might be incomplete):
If we're reading Avro, we issue a scan range for the header. When that comes 
back, we issue scan ranges for the rest of the file. Each of those scan ranges 
can be processed by a different scanner thread. So, a 20GB file could be split 
up and processed in parallel with a single header read (per node).
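
A rough sketch of that Avro flow, as a toy model rather than Impala's actual
code. ScanRangeTracker, OnHeaderRead, and AllRangesIssued are hypothetical
names, and the counter only stands in for the variable under discussion. It is
single-threaded for brevity; a real version would need synchronization.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy model: these names are illustrative and are not Impala classes.
struct ScanRange {
  int64_t offset;
  int64_t len;
};

class ScanRangeTracker {
 public:
  // One deferred submission is expected per file: the header read has to come
  // back before the ranges for the rest of the file can be issued.
  explicit ScanRangeTracker(int num_files) : remaining_submissions_(num_files) {}

  // Called once a file's header has been read. Splits the body
  // [header_len, file_len) into ranges of at most split_size bytes, queues
  // them, and records that this file has nothing further to submit.
  void OnHeaderRead(int64_t header_len, int64_t file_len, int64_t split_size) {
    assert(remaining_submissions_ > 0);
    assert(split_size > 0);
    for (int64_t off = header_len; off < file_len; off += split_size) {
      int64_t len = std::min(split_size, file_len - off);
      queued_.push_back({off, len});
    }
    --remaining_submissions_;
  }

  // True once no file can add more ranges, so a scanner thread that finds the
  // queue empty can exit instead of waiting for work that will never arrive.
  bool AllRangesIssued() const { return remaining_submissions_ == 0; }

  const std::vector<ScanRange>& queued() const { return queued_; }

 private:
  int remaining_submissions_;
  std::vector<ScanRange> queued_;
};
```

In this model the counter tracks pending submissions, not pending files or
pending ranges, which is the distinction the naming question is about.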

This is different from Parquet, because a Parquet file is typically a single 
block handled by a single scanner thread (and we don't know the column 
ranges up front). A 20GB Parquet file with many row groups would issue a 
footer read for each split, so each scanner thread would read the footer 
itself and process its own chunk of the file.

I'm ok with the current variable name, but here are a couple that might work:

remaining_scan_range_submissions_?
future_scan_range_submissions_?

I would prefer to avoid "initial" in the variable name.



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 29 Jan 2019 18:32:43 +0000
Gerrit-HasComments: Yes
