Cassandra is being used on a large scale at Uber. We usually create
dedicated clusters for each of our internal use cases; however, that is
difficult to scale and manage.
We are investigating the approach of using a single shared cluster with
100s of nodes handling 10s to 100s of different use cases
really... well that's good to know. It still almost never works, though. I
guess every time I've seen it, it must have timed out due to tombstones.
On 17 Feb. 2017 22:06, "Sylvain Lebresne" wrote:
On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves wrote:
> if you want a reliable count, you should use spark. performing a count (*)
> will inevitably fail unless you make your server read timeouts and
> tombstone fail thresholds ridiculous
Typically when I've seen that gossip issue, it requires more than just
restarting the affected node to fix. If you're not getting query-related
errors in the server log, you should start looking at what is being queried.
Are the queries that time out each day the same?
What's the Owns % for the relevant keyspace from nodetool status?
I can't say that I have tried that while the issue is going on, but I have
done such rolling restarts for sure, and the timeouts still occur every
day. What would a rolling restart do to fix the issue?
In fact, as I write this, I am restarting each node one by one in the
eu-west-1 datacenter, and
+1 for using spark for counts.
On Feb 17, 2017 4:25 PM, "kurt greaves" wrote:
> if you want a reliable count, you should use spark. performing a count (*)
> will inevitably fail unless you make your server read timeouts and
> tombstone fail thresholds ridiculous
>
> On 17 Feb. 2017 04:34, "Jan"
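The reason Spark counts reliably where count(*) times out: the connector splits the token ring into many small ranges, counts each range with its own bounded query, and sums the partial counts, so no single read has to scan the whole table. A minimal self-contained Python sketch of that idea (the "table" here is a plain dict and the range split is simulated; it is not the Spark Cassandra connector API):

```python
# Sketch: count rows by splitting the token ring into ranges and
# summing per-range counts, the way the Spark connector does it.
# In real use each range count would be one small CQL query with a
# token() filter instead of a dict scan.

def token_ranges(min_token, max_token, splits):
    """Divide [min_token, max_token) into `splits` contiguous ranges."""
    step = (max_token - min_token) // splits
    bounds = [min_token + i * step for i in range(splits)] + [max_token]
    return list(zip(bounds[:-1], bounds[1:]))

def count_range(rows_by_token, lo, hi):
    """Partial count for one range (stands in for one bounded query)."""
    return sum(1 for t in rows_by_token if lo <= t < hi)

def distributed_count(rows_by_token, splits=8):
    ranges = token_ranges(0, 1 << 16, splits)
    return sum(count_range(rows_by_token, lo, hi) for lo, hi in ranges)

# Simulated table: 1000 rows at pseudo-random token positions.
rows = {(i * 2654435761) % (1 << 16): i for i in range(1000)}
print(distributed_count(rows))  # 1000, same as len(rows)
```

Each partial count touches only a slice of the data, which is why the per-query read timeout and tombstone thresholds are never stressed the way one giant scan stresses them.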
Hi,
We faced this issue too.
You could try a reduced paging size, so that the tombstone threshold isn't
breached.
Try using "PAGING 500" in cqlsh
[ https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshPaging.html ]
Similarly, the paging size can be set in the Java driver as well.
This is a workaround
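The idea behind the smaller-page workaround: instead of one huge read, the client pulls fixed-size pages, and each round trip scans only a bounded slice of rows (and tombstones). A self-contained sketch of paged iteration (plain Python over a list, not the actual driver API; with the real drivers you would set fetch_size / setFetchSize on the statement instead):

```python
# Sketch: client-side paging. fetch_page returns at most `page_size`
# rows plus an opaque paging state (here just an offset), mimicking
# how a driver with fetch_size=500 walks a large result set.

def fetch_page(rows, paging_state, page_size=500):
    """One round trip: a bounded slice of the result set."""
    page = rows[paging_state:paging_state + page_size]
    next_state = paging_state + len(page)
    return page, (next_state if next_state < len(rows) else None)

def iterate_all(rows, page_size=500):
    """Transparent iteration across pages, as driver result sets do."""
    state = 0
    while state is not None:
        page, state = fetch_page(rows, state, page_size)
        yield from page

table = list(range(1234))  # 1234 rows -> 3 round trips at page_size=500
assert list(iterate_all(table)) == table
```

The total work is the same, but it is spread over many small requests, so no single request hits the server read timeout.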
On Fri, Feb 17, 2017 at 11:54 AM, kurt greaves wrote:
> if you want a reliable count, you should use spark. performing a count (*)
> will inevitably fail unless you make your server read timeouts and
> tombstone fail thresholds ridiculous
>
That's just not true. count(*) is paged internally so w
have you tried a rolling restart of the entire DC?
Btw:
They break incremental repair if you use CDC:
https://issues.apache.org/jira/browse/CASSANDRA-12888
Not only when using CDC! You shouldn't use incremental repairs with MVs.
Never (right now).
2017-02-16 17:42 GMT+01:00 Jonathan Haddad :
> My advice to avoid them is based on the issues th
Hi Nate,
See here dstat results:
https://gist.github.com/brstgt/216c662b525a9c5b653bbcd8da5b3fcb
Network volume does not correspond to Disk IO, not even close.
@heterogeneous vnode count:
I did this to test how load behaves on a new server class we ordered for
CS. The new nodes had much faster CPU
if you want a reliable count, you should use spark. performing a count (*)
will inevitably fail unless you make your server read timeouts and
tombstone fail thresholds ridiculous
On 17 Feb. 2017 04:34, "Jan" wrote:
> Hi,
>
> could you post the output of nodetool cfstats for the table?
>
> Cheers
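For context on why the count "inevitably fails": the read path counts every tombstone it has to skip, and once the count crosses tombstone_failure_threshold (100000 by default in cassandra.yaml) the query is aborted with a TombstoneOverwhelmingException. A toy sketch of that guardrail (names and structure simplified; this is not Cassandra's actual code):

```python
# Sketch: why a wide scan dies on tombstones. The read path tallies
# deleted cells it skips; past a threshold it aborts the whole query.

TOMBSTONE_FAILURE_THRESHOLD = 100_000  # cassandra.yaml default

class TombstoneOverwhelming(Exception):
    pass

def scan(cells, threshold=TOMBSTONE_FAILURE_THRESHOLD):
    """Return live cells, aborting if too many tombstones are scanned."""
    live, tombstones = [], 0
    for value in cells:
        if value is None:          # None stands in for a tombstone
            tombstones += 1
            if tombstones > threshold:
                raise TombstoneOverwhelming(
                    f"scanned {tombstones} tombstones")
        else:
            live.append(value)
    return live

# A mostly-deleted partition: 5 live cells buried under 200k tombstones.
cells = [None] * 200_000 + list(range(5))
try:
    scan(cells)
except TombstoneOverwhelming:
    print("query aborted")
```

This is also why nodetool cfstats is worth posting: its tombstone-per-read statistics show whether the timing-out table is in this mostly-deleted state.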