If you can reproduce the invalid behavior 10% or more of the time, with repro steps that take 5-10 seconds per iteration, that would be extremely useful for getting to the bottom of the invalid shard issue (if that turns out to be the root cause). I would be very interested in seeing your setup, to check whether the behavior can be duplicated.
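
For reference, here is a minimal sketch of the kind of single-threaded check loop described in the quoted message: increment, read back, and stop at the first iteration where the value stops matching. It does not talk to Cassandra at all. The `increment` and `read` callables are hypothetical stand-ins for whatever client calls the real program makes at consistency level ALL, and the two in-memory classes exist only so the sketch runs on its own.

```python
from typing import Callable, Optional


def find_first_divergence(
    increment: Callable[[int], None],
    read: Callable[[], int],
    iterations: int,
    delta: int = 1,
) -> Optional[int]:
    """Return the 0-based iteration at which the counter first disagrees
    with the locally tracked expected value, or None if it never does."""
    expected = read()  # start from whatever the counter currently holds
    for i in range(iterations):
        increment(delta)
        expected += delta
        if read() != expected:
            return i
    return None


class InMemoryCounter:
    """Hypothetical stand-in for a real counter column."""

    def __init__(self) -> None:
        self.value = 0

    def increment(self, delta: int) -> None:
        self.value += delta

    def read(self) -> int:
        return self.value


class DroppingCounter(InMemoryCounter):
    """Stand-in that silently loses one increment, to exercise the check."""

    def __init__(self, drop_on_call: int) -> None:
        super().__init__()
        self.calls = 0
        self.drop_on_call = drop_on_call

    def increment(self, delta: int) -> None:
        self.calls += 1
        if self.calls != self.drop_on_call:
            super().increment(delta)


good = InMemoryCounter()
bad = DroppingCounter(drop_on_call=3)
good_result = find_first_divergence(good.increment, good.read, 1000)
bad_result = find_first_divergence(bad.increment, bad.read, 1000)
```

Recording the iteration at which the first mismatch appears, rather than just pass or fail, should make it easier to tell whether the failure point really moves around between runs.
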

Andrew

On Tue, Jun 25, 2013 at 2:18 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Mon, Jun 24, 2013 at 6:42 PM, Josh Dzielak <j...@keen.io> wrote:
> > There is only 1 thread running this sequence, and consistency levels are set
> > to ALL. The behavior is fairly repeatable - the unexpected mutation will
> > happen at least 10% of the time I run this program, but at different points.
> > When it does not go awry, I can run this loop many thousands of times and
> > keep the counter exact. But if it starts happening to a specific counter,
> > the counter will never "recover" and will continue to maintain its
> > incorrect value even after successful subsequent writes.
>
> Sounds like a corrupt counter shard. Hard to understand how it can
> happen at ALL. If I were you I would file a JIRA including your repro
> path...
>
> =Rob