Adar Dembo has posted comments on this change.

Change subject: KUDU-1773: remove overly strict DCHECKs
......................................................................


Patch Set 1:

> Can you think of a new stress test we could add that would trigger
> this? It's a shame that we found this issue via the Impala
> end-to-end stress tests and don't have our own coverage of this
> race scenario.

I'm tempted to punt on this.

It would be useful, but after discussion with MJ it seems that the Impala
stress test that uncovered this issue is far more stressful than any of ours.
It uses 8 nodes with an average of 300-600 replicas per node, issues many
queries to multiple tables concurrently, and deals with tables of up to
hundreds of millions of rows. On top of that, the test is already dealing with
some unexplained timeouts pausing reactor threads for dozens of seconds at a
time; if those timeouts are happening client-side, they could certainly explain
the interleaving necessary to tease out this race.

To be clear, it's not impossible to write a stress test for this race. The
client workload would need to perform writes with many retries and do "cold"
tablet lookups often. On top of that, the servers would need to cycle through
replicas quickly, evicting and replacing them as fast as possible. I think
that would be a fair amount of work to implement for relatively little gain
(these crashes were debug-only, after all).

-- 
To view, visit http://gerrit.cloudera.org:8080/5292
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I01c1dde99cce1f43e6a9864c1ff6f7aaad448a77
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: No
