Matthew Jacobs has posted comments on this change.
Change subject: IMPALA-1473: Incorrect cardinality in exec summary for exchange
......................................................................
Patch Set 1:
(3 comments)
Thanks!
While this fixes the summary, I'm still seeing the wrong result in the
profile's coordinator RowsProduced. (I had mentioned that in the JIRA summary).
We should fix up these things at the same time since it's confusing if the info
doesn't make sense.
When I ran the query in the test I found:
Coordinator Fragment F02:(Total: 271.450ms, non-child: 2.134ms, %
non-child: 0.79%)
MemoryUsage(500.000ms): 16.06 KB, 20.08 KB
- AverageThreadTokens: 0.00
- BloomFilterBytes: 0
- PeakMemoryUsage: 20.08 KB (20560)
- PerHostPeakMemUsage: 0
- PrepareTime: 121.367us
- RowsProduced: 0 (0)
RowsProduced should be 5. Not sure why it's 0, but the
PlanFragmentExecutor::GetNext() calls ExchangeNode::GetNext() and then sets
COUNTER_ADD(rows_produced_counter_, row_batch_->num_rows());
Which it looks like isn't coming back or isn't being set properly by the
ExchangeNode for some reason.
http://gerrit.cloudera.org:8080/#/c/4679/1/tests/query_test/test_exec_summary.py
File tests/query_test/test_exec_summary.py:
PS1, Line 27: # Search through the query profile for the exec summary, of the
form:
: # Operator #Hosts Avg Time Max Time #Rows Est.
#Rows ...
: #
------------------------------------------------------------------- ...
: # 05:MERGING-EXCHANGE 1 82.547us 82.547us 5
5 ...
: # ...
: def _get_exec_summary(self, handle):
: profile = self.client.get_runtime_profile(handle)
: summary = ''
: in_summary = False
: for line in profile.split('\n'):
: if in_summary:
: if line.startswith('-----'):
: summary += line + '\n'
: else:
: split = line.split(':')
: if len(split) != 2 or not split[0].isdigit():
: return summary
: summary += line + '\n'
: elif line.startswith('Operator'):
: # Found the beginning of the exec summary.
: summary += line + '\n'
: in_summary = True
: assert False, 'Failed to find exec summary in query profile'
Thanks for adding a test. However, there's already logic to parse the summary
into a structured result, see impala_beeswax.py. Can you change the code to
leverage that?
PS1, Line 57: handle = self.execute_query_async(query, dict())
: self.impalad_test_service.wait_for_query_state(self.client,
handle,
: self.client.QUERY_STATES['FINISHED'], timeout=40)
: self.client.fetch(query, handle)
: self.close_query(handle)
There are 'execute' versions that handle waiting and closing for you. Take a
look in impala_test_suite. You should be able to find a way to call the
client's execute method (not the async version) and get back the result object
that has the parsed summary in it.
PS1, Line 64: for line in self._get_exec_summary(handle).split('\n'):
: if line.startswith('05:MERGING-EXCHANGE'):
: found = True
: columns = re.split(' +', line)
: assert columns[0] == '05:MERGING-EXCHANGE' # Operator
: assert columns[4] == '5' # #Rows
: assert columns[5] == '5' # Est. #Rows
: break
you should be able to avoid the string parsing yourself and use the structured
result from ImpalaBeeswaxClient
--
To view, visit http://gerrit.cloudera.org:8080/4679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I386719370386c9cff09b8b35d15dc712dc6480aa
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-HasComments: Yes