Adar Dembo has posted comments on this change.
Change subject: KUDU-1773: remove overly strict DCHECKs
Patch Set 1:
> Can you think of a new stress test we could add that would trigger
> this? It's a shame that we found this issue via the Impala
> end-to-end stress tests and don't have our own coverage of this
> race scenario.
I'm tempted to punt on this.
It would be useful, but after discussion with MJ it seems like the Impala
stress test that uncovered this issue is far more stressful than any of ours.
It uses 8 nodes with an average of 300-600 replicas per node, issues many
queries to multiple tables concurrently, and deals with tables up to hundreds
of millions of rows in size. On top of that, the test is already hitting
some unexplained timeouts that pause reactor threads for dozens of seconds at a
time; if those timeouts are happening client-side, that could certainly explain
the interleaving necessary to tease out this race.
To be clear, it's not impossible to write a stress test for this race. The
client workload would need to be performing writes with many retries, and be
doing "cold" tablet lookups often. On top of that, the servers would need to be
cycling through replicas quickly, evicting and replacing them as fast as
possible. I think it'd be a fair amount of work to implement that, for
relatively little gain (these crashes were debug-only, after all).
To view, visit http://gerrit.cloudera.org:8080/5292
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>