Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9635 )

Change subject: IMPALA-6662: Make stress test resilient to hangs due to client 
crashes
......................................................................


Patch Set 2:

(14 comments)

Thanks for the review!

http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@a268
PS2, Line 268:
> Nit: Go ahead and leave extra blank line this in. flake8 prefers 2 spaces b
Done


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@321
PS2, Line 321: acces
> Nit: access
Done


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@339
PS2, Line 339: _record_runner_metrics_before_kill
> Can we have a more descriptive name? This happens after a process failed, b
_record_runner_metrics_before_evict() seems like a much better name. I changed 
it to that.


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@368
PS2, Line 368: @property
> Any reason you use the property decorator here but nowhere else?
As we spoke, we don't require @property for this, it was just moved from the 
base patchset. I removed it now.


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@376
PS2, Line 376: num_running = total_started - total_finished
> Maybe a nit, but can this exist outside the lock? That is, indented two spa
Done


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@382
PS2, Line 382:       total_dequeued = self._past_runners_num_queries_dequeued
             :       for runner in self._query_runners:
             :         total_dequeued += runner._num_queries_dequeued.value
             :     return total_dequeued
> return self._total_num_queries_dequed_no_lock()
Done


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@387
PS2, Line 387:   def _total_num_queries_dequeued_no_lock(self):
             :     """ TODO: Get rid of this function after reformatting how we 
obtain query indices.
             :     """
> I was confused about this function being dangerous until I saw that in its
Done


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@647
PS2, Line 647: impalad
> Cool, this param isn't needed anymore since you bind the query_runner to an
Done


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@850
PS2, Line 850:             if runner.proc.exitcode != 0:
> While I believe queries that fail with error fail with some low positive in
I print every runners exit code at L870. So if it does hit a SIGSEGV, it would 
return -11 on a nix system. Is that what you were looking for?


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@860
PS2, Line 860:                 assert runner._num_queries_started.value - \
             :                     runner._num_queries_finished.value == 1
> I think this assert can be invalid if more than one runner has crashed. Let
These values are on a per query runner basis (no '_total' prefix), so if 2 
runners fail at the same time, each ones start minus finish has to at most be 1.


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@867
PS2, Line 867:             assert runner._num_queries_started.value == 
runner._num_queries_finished.value
> Same with this.
This is the same explanation as my previous comment, so it should be fine.


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@877
PS2, Line 877:  == True:
> Remove. For Booleans, you can just use "if var". PEP-008 says to do it this
Done


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@1078
PS2, Line 1078:   def _update_from_query_report
> Python nit: This is treated as a public method, so it shouldn't have the _
Done


http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@1079
PS2, Line 1079:     LOG.debug("Updating runtime stats")
> Is there a simple way to distinguish this message across different QueryRun
I print the Query Runners PID in some other places, so the PID could serve as a 
unique ID for a runner. So I added that to the print statement now. Let me know 
if you think it should be something else.



--
To view, visit http://gerrit.cloudera.org:8080/9635
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I10c5dc9b8c2fffc471bac2279e348bc1d1fec3b7
Gerrit-Change-Number: 9635
Gerrit-PatchSet: 2
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Comment-Date: Thu, 15 Mar 2018 17:18:10 +0000
Gerrit-HasComments: Yes

Reply via email to