Thanks Ryan, could you please share more details: according to what you observed in testing, why was performance worse if you do not do extra buffering?
I was thinking (could be wrong) that without extra buffering, the counter update goes to Memtable.putIfPresent() and CounterColumn.resolve(), which are still in-memory operations, and thus would not be so bad?

Yang

On Mon, May 23, 2011 at 11:54 AM, Ryan King <r...@twitter.com> wrote:
> On Sun, May 22, 2011 at 11:00 AM, Yang <teddyyyy...@gmail.com> wrote:
>> Thanks,
>>
>> I did read through that pdf doc, and went through the counters code in
>> 0.8-rc2, I think I understand the logic in that code.
>>
>> in my hypothetical implementation, I am not suggesting to overstep the
>> complicated logic in counters code, since the extra module will still
>> need to enter the increment through StorageProxy.mutate(
>> My_counter.delta=1 ) , so that the logical clock is still handled by
>> the Counters code.
>>
>> the only difference is, as you said,
>> that rainbird collapses many +1 deltas. but my claim is that in fact
>> this "collapsing" is already done by cassandra since the write always
>> hits the memtable first,
>> so collapsing in Cassandra memtable vs collapsing in rainbird memory
>> takes the same time, while rainbird introduces an extra level of
>> caching (I am strongly suspecting that rainbird is vulnerable to
>> losing up to 1 minute's worth of data, if the rainbird dies before the
>> writes are flushed to cassandra ---- unless it does implement its own
>> commit log, but that is kind of re-implementing many of the wheels in
>> Cassandra ....)
>
> Right, Rainbird buffers for performance and can lose up to 1 minute of data.
>
>> I thought at one time probably the reason was that from one
>> given url, rainbird needs to create writes on many keys, so that the
>> keys need to go to different
>> Cassandra nodes.
>> but later I found that this can also be done in a
>> module on the coordinator, since the client request first hits a
>> coordinator, instead of the data node, in fact, in a multi-insert
>> case, the coordinator already sends the request to multiple data
>> nodes. the extra module I am proposing simply translates a single
>> insert into multi-insert, and then cassandra takes over from there
>>
>>
>> Thanks
>> Yang
>>
>> On Sun, May 22, 2011 at 3:47 AM, aaron morton <aa...@thelastpickle.com>
>> wrote:
>>> The implementation of distributed counters is more complicated than your
>>> example, there is a design doc attached to the ticket
>>> here https://issues.apache.org/jira/browse/CASSANDRA-1072
>>> By collapsing some of those +1 increments together at the application level
>>> there is less work for the cluster to do. This can be important when the
>>> numbers are big http://blog.twitter.com/2011/03/numbers.html
>>> Cheers
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> On 21 May 2011, at 09:04, Yang wrote:
>>>
>>> (sorry if Rainbird is not a topic relevant enough, I'd appreciate if
>>> someone could point me to a more appropriate venue in that case)
>>>
>>>
>>> Rainbird buffers up 1 minute worth of events first before writing to
>>> Cassandra.
>>>
>>> it seems that this extra layer of buffering is repetitive, and could
>>> be avoided : Cassandra's memtable already does buffering, whose
>>> internal implementation is just
>>> Map.put(key, CF ) , I guess rainbird does similar things :
>>> column_to_count = map.get(key); column_to_count++ ; map.put(key,
>>> column_to_count) ??
>>> the "++" part is probably already done by the Distributed Counters in
>>> Cassandra.
>>> then I guess the Rainbird layer exists because it needs to parse an
>>> incoming event into various attributes that it is interested in: for
>>> example from a url, we bump up the counts of
>>> FQDN, domain, path etc. Rainbird does the transformation from
>>> url--->3 attrs.
>>>
>>> but I guess that transformation might as well be done in the cassandra
>>> JVM itself, if we could provide some hooks, so that a module
>>> translates an incoming request into
>>> multiple keys, and bumps up their counts. that way we avoid the
>>> intermediate communication from clients to rainbird, and rainbird to
>>> Cassandra. are there some points I'm missing?
>>>
>>> Thanks
>>> Yang
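
To make the two ideas discussed in this thread concrete (fanning one url out to several counter keys, and collapsing many +1 deltas before they are written), here is a minimal Python sketch. It is not Rainbird's or Cassandra's code: keys_for_url, BufferedCounter and the sink callback are hypothetical names invented for illustration, the sink only stands in for a counter write, and the "last two labels" domain rule is a deliberate simplification.

```python
from collections import Counter
from urllib.parse import urlsplit


def keys_for_url(url):
    """Hypothetical fan-out: one event url -> counter keys for FQDN, domain, path."""
    parts = urlsplit(url)
    fqdn = parts.hostname or ""
    # Naive assumption: the domain is the last two labels of the host name.
    domain = ".".join(fqdn.split(".")[-2:])
    return [("fqdn", fqdn), ("domain", domain), ("path", fqdn + parts.path)]


class BufferedCounter:
    """Collapses +1 deltas in memory; flush() emits one delta per key."""

    def __init__(self, sink):
        # sink is a callable(key, delta) standing in for a counter write.
        self.sink = sink
        self.pending = Counter()

    def record(self, url):
        # Each event bumps every key derived from its url by one.
        for key in keys_for_url(url):
            self.pending[key] += 1

    def flush(self):
        # One write per distinct key, carrying the summed delta.
        flushed = len(self.pending)
        for key, delta in self.pending.items():
            self.sink(key, delta)
        # Anything recorded but not yet flushed lives only in this process.
        self.pending.clear()
        return flushed


writes = []
buf = BufferedCounter(lambda key, delta: writes.append((key, delta)))
for _ in range(1000):
    buf.record("http://www.example.com/a/b")
buf.flush()
```

In this sketch 1000 events produce 3 writes rather than 3000, which is the "less work for the cluster" point made above. The pending map is also exactly the data that would be lost if the process died before flush(), which matches the up-to-1-minute loss Ryan confirms. Whether the per-increment cost on the Cassandra side is small enough to make this buffering unnecessary is the open question in the thread, and the sketch does not settle it.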