Data aggregation -- help me design a solution

2012-08-21 Thread Oleg Dulin
Here are my requirements. We use Cassandra. I get millions of invoice line items into the system. As I load them, I need to build up some data structures:
* Invoice line items by invoice id (each line item has an invoice id on it), with total dollar value
* Invoice line items by customer
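
For illustration only, the two groupings described above might be sketched in memory like this. It is not a Cassandra schema, and the field names `invoice_id`, `customer_id`, and `amount` are assumptions, not names from the original post.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate(line_items):
    """Group line items by invoice (with a running total) and by customer.

    Each line item is assumed to be a dict with the hypothetical keys
    'invoice_id', 'customer_id', and 'amount' (a Decimal dollar value).
    """
    by_invoice = defaultdict(lambda: {"items": [], "total": Decimal("0")})
    by_customer = defaultdict(list)
    for item in line_items:
        invoice = by_invoice[item["invoice_id"]]
        invoice["items"].append(item)
        invoice["total"] += item["amount"]  # running dollar total per invoice
        by_customer[item["customer_id"]].append(item)
    return by_invoice, by_customer


# Made-up sample data, purely to show the shape of the result.
sample_items = [
    {"invoice_id": "inv-1", "customer_id": "cust-a", "amount": Decimal("10.00")},
    {"invoice_id": "inv-1", "customer_id": "cust-a", "amount": Decimal("5.50")},
    {"invoice_id": "inv-2", "customer_id": "cust-b", "amount": Decimal("7.25")},
]
by_invoice, by_customer = aggregate(sample_items)
```

`Decimal` is used rather than `float` so that dollar totals do not pick up binary rounding error. At the scale of millions of items, the same grouping would have to live in the datastore rather than in process memory.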

Re: Data aggregation -- help me design a solution

2012-08-21 Thread Milind Parikh
1. Assuming that the majority of the line items are new, and
2. The lookup of an existing line item will dictate the performance of the system, because reads are slower than writes in C*.
3. Assuming that you are using counters in C*.
Therefore, eliminate that problem by implementing a bloom

Re: Data aggregation -- help me design a solution

2012-08-21 Thread Guillermo Winkler
Oleg,
If you have the aggregates in counters, you only need to read the current counter when adding/removing invoice lines. In this situation you only need to be sure that this sequence:
+ Read current counter value
+ Update current value according to newly created/updated lines
is done safely to
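
As a rough, single-process sketch of keeping the read-then-update sequence above safe, the two steps can be placed under one lock. `GuardedCounter` and `run_concurrent` are hypothetical names, the counter is an in-memory stand-in rather than a Cassandra counter, and amounts are assumed to be integer cents.

```python
import threading


class GuardedCounter:
    """In-memory stand-in for an aggregate; not a Cassandra API."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def apply_delta(self, delta):
        # Read and update happen under the same lock, so no other
        # thread can interleave between the two steps.
        with self._lock:
            current = self._value            # read current counter value
            self._value = current + delta    # update according to new lines
            return self._value

    @property
    def value(self):
        with self._lock:
            return self._value


def run_concurrent(counter, workers, updates_per_worker, delta):
    """Apply the same delta from several threads and wait for all of them."""
    def work():
        for _ in range(updates_per_worker):
            counter.apply_delta(delta)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

A lock like this only protects a single process. With several loaders writing to the same aggregate, some coordination outside the process would be needed instead.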