[Impala-ASF-CR] IMPALA-1473: Incorrect cardinality in exec summary for exchange

Matthew Jacobs (Code Review) Tue, 11 Oct 2016 13:43:57 -0700

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-1473: Incorrect cardinality in exec summary for exchange
......................................................................



Patch Set 1:

(3 comments)

Thanks!

While this fixes the summary, I'm still seeing the wrong result in the 
profile's coordinator RowsProduced. (I had mentioned that in the JIRA summary). 
We should fix up these things at the same time since it's confusing if the info 
doesn't make sense.

When I ran the query in the test I found:
    Coordinator Fragment F02:(Total: 271.450ms, non-child: 2.134ms, % 
non-child: 0.79%)
      MemoryUsage(500.000ms): 16.06 KB, 20.08 KB
       - AverageThreadTokens: 0.00 
       - BloomFilterBytes: 0
       - PeakMemoryUsage: 20.08 KB (20560)
       - PerHostPeakMemUsage: 0
       - PrepareTime: 121.367us
       - RowsProduced: 0 (0)


RowsProduced should be 5. Not sure why it's 0, but the 
PlanFragmentExecutor::GetNext() calls ExchangeNode::GetNext() and then sets

  COUNTER_ADD(rows_produced_counter_, row_batch_->num_rows());


Which it looks like isn't coming back or isn't being set properly by the 
ExchangeNode for some reason.

http://gerrit.cloudera.org:8080/#/c/4679/1/tests/query_test/test_exec_summary.py
File tests/query_test/test_exec_summary.py:

PS1, Line 27:   # Search through the query profile for the exec summary, of the 
form:
            :   # Operator            #Hosts  Avg Time  Max Time   #Rows   Est. 
#Rows  ...
            :   # 
-------------------------------------------------------------------  ...
            :   # 05:MERGING-EXCHANGE    1    82.547us  82.547us     5         
5       ...
            :   # ...
            :   def _get_exec_summary(self, handle):
            :     profile = self.client.get_runtime_profile(handle)
            :     summary = ''
            :     in_summary = False
            :     for line in profile.split('\n'):
            :       if in_summary:
            :         if line.startswith('-----'):
            :           summary += line + '\n'
            :         else:
            :           split = line.split(':')
            :           if len(split) != 2 or not split[0].isdigit():
            :             return summary
            :           summary += line + '\n'
            :       elif line.startswith('Operator'):
            :         # Found the beginning of the exec summary.
            :         summary += line + '\n'
            :         in_summary = True
            :     assert False, 'Failed to find exec summary in query profile'
Thanks for adding a test. However, there's already logic to parse the summary 
into a structured result, see impala_beeswax.py. Can you change the code to 
leverage that?


PS1, Line 57:     handle = self.execute_query_async(query, dict())
            :     self.impalad_test_service.wait_for_query_state(self.client, 
handle,
            :         self.client.QUERY_STATES['FINISHED'], timeout=40)
            :     self.client.fetch(query, handle)
            :     self.close_query(handle)
There are 'execute' versions that handle waiting and closing for you. Take a 
look in impala_test_suite. You should be able to find a way to call the 
client's execute method (not the async version) and get back the result object 
that has the parsed summary in it.


PS1, Line 64:     for line in self._get_exec_summary(handle).split('\n'):
            :       if line.startswith('05:MERGING-EXCHANGE'):
            :         found = True
            :         columns = re.split(' +', line)
            :         assert columns[0] == '05:MERGING-EXCHANGE' # Operator
            :         assert columns[4] == '5' # #Rows
            :         assert columns[5] == '5' # Est. #Rows
            :         break
you should be able to avoid the string parsing yourself and use the structured 
result from ImpalaBeeswaxClient


-- 
To view, visit http://gerrit.cloudera.org:8080/4679
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I386719370386c9cff09b8b35d15dc712dc6480aa
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1473: Incorrect cardinality in exec summary for exchange

Reply via email to