Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9635 )
Change subject: IMPALA-6662: Make stress test resilient to hangs due to client crashes ...................................................................... Patch Set 2: (14 comments) Thanks for the review! http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py File tests/stress/concurrent_select.py: http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@a268 PS2, Line 268: > Nit: Go ahead and leave extra blank line this in. flake8 prefers 2 spaces b Done http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@321 PS2, Line 321: acces > Nit: access Done http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@339 PS2, Line 339: _record_runner_metrics_before_kill > Can we have a more descriptive name? This happens after a process failed, b _record_runner_metrics_before_evict() seems like a much better name. I changed it to that. http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@368 PS2, Line 368: @property > Any reason you use the property decorator here but nowhere else? As we spoke, we don't require @property for this, it was just moved from the base patchset. I removed it now. http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@376 PS2, Line 376: num_running = total_started - total_finished > Maybe a nit, but can this exist outside the lock? That is, indented two spa Done http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@382 PS2, Line 382: total_dequeued = self._past_runners_num_queries_dequeued : for runner in self._query_runners: : total_dequeued += runner._num_queries_dequeued.value : return total_dequeued > return self._total_num_queries_dequed_no_lock() Done http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@387 PS2, Line 387: def _total_num_queries_dequeued_no_lock(self): : """ TODO: Get rid of this function after reformatting how we obtain query indices. : """ > I was confused about this function being dangerous until I saw that in its Done http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@647 PS2, Line 647: impalad > Cool, this param isn't needed anymore since you bind the query_runner to an Done http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@850 PS2, Line 850: if runner.proc.exitcode != 0: > While I believe queries that fail with error fail with some low positive in I print every runners exit code at L870. So if it does hit a SIGSEGV, it would return -11 on a nix system. Is that what you were looking for? http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@860 PS2, Line 860: assert runner._num_queries_started.value - \ : runner._num_queries_finished.value == 1 > I think this assert can be invalid if more than one runner has crashed. Let These values are on a per query runner basis (no '_total' prefix), so if 2 runners fail at the same time, each ones start minus finish has to at most be 1. http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@867 PS2, Line 867: assert runner._num_queries_started.value == runner._num_queries_finished.value > Same with this. This is the same explanation as my previous comment, so it should be fine. http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@877 PS2, Line 877: == True: > Remove. For Booleans, you can just use "if var". PEP-008 says to do it this Done http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@1078 PS2, Line 1078: def _update_from_query_report > Python nit: This is treated as a public method, so it shouldn't have the _ Done http://gerrit.cloudera.org:8080/#/c/9635/2/tests/stress/concurrent_select.py@1079 PS2, Line 1079: LOG.debug("Updating runtime stats") > Is there a simple way to distinguish this message across different QueryRun I print the Query Runners PID in some other places, so the PID could serve as a unique ID for a runner. So I added that to the print statement now. Let me know if you think it should be something else. -- To view, visit http://gerrit.cloudera.org:8080/9635 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I10c5dc9b8c2fffc471bac2279e348bc1d1fec3b7 Gerrit-Change-Number: 9635 Gerrit-PatchSet: 2 Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: David Knupp <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Comment-Date: Thu, 15 Mar 2018 17:18:10 +0000 Gerrit-HasComments: Yes
