> So userspace throttling is probably the answer?

I believe so.

> Is the normal way of doing this to go through the JMX interface from a
> userspace program, and hold off on inserts until the values fall below
> a given threshold? If so, that's going to be a pain, since most of my
> system is currently using python :)

I don't know what the normal way is or what people have done with
Cassandra in production.

What I have tended to do personally, and in general (not Cassandra
specific), is domain-specific rate limiting as required whenever I run
batch jobs or bulk reads/writes. Regardless of whether your database is
Cassandra, PostgreSQL or anything else - throwing writes (or reads, for
that matter) at the database at the maximum possible speed tends to have
adverse effects on the latency of other, normal traffic. Only during
offline batch operations, where the latency of other traffic is
irrelevant, do I ever go "all in" and throw traffic at a database at
full speed.

That said, simple measures like "write with a single sequential writer
subject to the RTT of RPC requests" are often sufficient to rate limit
pretty well in practice. But of course that depends on the nature of the
writes and how expensive they are relative to the RTT and/or RPC
overhead.

FWIW, whenever I have needed a hard "maximum of X per second" rate limit,
I have implemented or re-used a rate limiter (e.g. a token bucket) for
the language in question and used it in my client code.

-- 
/ Peter Schuller
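
For illustration, here is a minimal sketch of what a client-side token bucket could look like in Python. It is not taken from any Cassandra client library; the class name `TokenBucket` and its parameters (`rate`, `capacity`, `clock`, `sleep`) are assumptions made up for this example, and it is single-threaded (no locking).

```python
import time


class TokenBucket:
    """Allow roughly `rate` operations per second, with bursts up to `capacity`.

    Hypothetical helper for illustration only; not thread-safe.
    """

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum number of stored tokens
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock               # injectable so the limiter can be tested
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self, n=1):
        """Block until `n` tokens are available, then consume them."""
        if n > self.capacity:
            raise ValueError("cannot acquire more tokens than the bucket holds")
        while True:
            self._refill()
            if self.tokens >= n:
                self.tokens -= n
                return
            # Sleep just long enough for the missing tokens to accumulate.
            self.sleep((n - self.tokens) / self.rate)
```

In client code you would call `bucket.acquire()` before each write, for example before each call to whatever insert function your client exposes. With `TokenBucket(rate=100, capacity=10)` that caps the sustained rate at about 100 writes per second while still allowing short bursts of up to 10.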
