Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 )
Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. ...................................................................... Patch Set 3: (12 comments) Thanks for the reviews. Here's another iteration. The LZO stuff also needed handling, it turns out, so that's in progress. http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. > In theory some of the lines in this commit message should be shorter I tried to shorten some stuff. Didn't get all of it. http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@21 PS2, Line 21: base-sequence-scanner.cc. > did you consider an approach like adding something like this to the top lev So, an HdfsScanNode can have multiple files, of different file formats. And then it uses static methods to call IssueInitialRanges(). So I wasn't able to shoe-horn in some object-oriented stuff in there. I didn't want to take on a bigger refactoring. Here's the relevant snippet from hdfs-scan-node-base.cc. for (const auto& entry : matching_per_type_files) { if (entry.second.empty()) continue; switch (entry.first) { case THdfsFileFormat::PARQUET: RETURN_IF_ERROR(HdfsParquetScanner::IssueInitialRanges(this, entry.second)); break; case THdfsFileFormat::TEXT: RETURN_IF_ERROR(HdfsTextScanner::IssueInitialRanges(this, entry.second)); break; case THdfsFileFormat::SEQUENCE_FILE: case THdfsFileFormat::RC_FILE: case THdfsFileFormat::AVRO: RETURN_IF_ERROR(BaseSequenceScanner::IssueInitialRanges(this, entry.second)); break; case THdfsFileFormat::ORC: RETURN_IF_ERROR(HdfsOrcScanner::IssueInitialRanges(this, entry.second)); break; default: DCHECK(false) << "Unexpected file type " << entry.first; } } http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@38 PS2, Line 38: required ~30 threads to be spinning in the kernel. We were able to use > typo Done http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@74 PS2, Line 74: I se > typo Done http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@266 PS2, Line 266: Must n > Must? Done http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@277 PS2, Line 277: threads > impala has plugins? This is related to the Impala lzo plugin. The task here is to change the signature in that repo, and then delete this signature. There's no compatibility guarantee that I have to keep around; I just wanted to avoid having to teach some test infrastructure about multiple branches. Anyway, I didn't see [[deprecated]] in use elsewhere, and this should be short-lived, so I'm going to ignore it at the moment. http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@287 PS2, Line 287: Tuple* InitTemplateTuple(const std::vector<ScalarExprEvaluator*>& value_evals, > can you DCHECK that the result doesn't go below zero? Done http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@471 PS2, Line 471: m a slot's path to its index into ma > yea, I was also confused by the inverted naming here. Sorry about the confusing naming. As you can imagine, I went back and forth between booleans and ints and stuff here. I went with "remaining_scan_range_issuances_". http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.cc@453 PS2, Line 453: > is it important to not decrement this in the error cases below on line 471- It has to be decremented *after* IssueInitialRanges(), since, if you don't, all the scanner threads might exit and you'll hang. I was trying to catch all the error cases and always decrement in those. I think the easiest way to do it is to wrap inside of another function, so I'll do that. http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node.h@139 PS2, Line 139: : /// Number of times scanner thread didn't find work to do. : RuntimeProfile::Counter* scanner_thread_workless_loops_counter_ = nullptr; > hrm, I see the value of this for your test, but not entirely convinced it m I'm curious whether or not this loop is hot in practice at our customers, and this was the easiest way to tell. I've seen this be ~10 in the benchmarking I was doing, and it'll be useful to check up on it in more cases. I considered getting rid of this whole loop and using condition variables, but that's definitely a trickier thing to get right, and this would give us a sense of whether it's worth it. http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node.cc@190 PS2, Line 190: return Status::OK(); > Can we assert in ~HdfsScanNode? Presumably all threads with active referenc Good idea. Trying it out. http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node.cc@199 PS2, Line 199: // Typically, at this point, remaining_scan_range_issuances_ is > line too long (91 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Wed, 23 Jan 2019 18:00:54 +0000 Gerrit-HasComments: Yes