Though DSE Cassandra comes with Hadoop integration, this is clearly a use case 
for Hadoop. 
Any reason why Cassandra is your first choice?



> On 23 Jul 2015, at 6:12 a.m., Pierre Devops <pierredev...@gmail.com> wrote:
> 
> Cassandra is not very good at massive/bulk reads if you need to retrieve 
> and compute a large amount of data across multiple machines using something 
> like Spark or Hadoop (unless you process the SSTables directly, which is not 
> natively supported, so you would have to hack your own way around it).
> 
> However, it's very good for storing and retrieving data once it has been 
> processed and sorted. That's why I would opt for solution 2), or for another 
> solution that processes the data before inserting it into Cassandra and 
> doesn't use Cassandra as a temporary store.
> 
> 2015-07-23 2:04 GMT+02:00 Renato Perini <renato.per...@gmail.com>:
>> Problem: Log analytics.
>> 
>> Solutions:
>>        1) Aggregating logs using Flume and storing the aggregations into 
>> Cassandra. Spark reads the data from Cassandra, makes some computations
>> and writes the results to distinct tables, still in Cassandra.
>>        2) Aggregating logs using Flume to a sink, streaming the data directly 
>> into Spark. Spark makes some computations and stores the results in Cassandra.
>>        3) *** your solution ***
>> 
>> Which is the best workflow for this task?
>> I would like to set up something flexible enough to allow me to use batch 
>> processing and realtime streaming without major fuss.
>> 
>> Thank you in advance.
> 
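For reference, option 1's read path with the DataStax spark-cassandra-connector looks 
roughly like the sketch below. The keyspace, table and column names (logs.raw_logs, 
logs.errors_by_day) and the contact point are made up for illustration; the point is 
that sc.cassandraTable has to scan the whole raw table, which is exactly the massive 
read Pierre is warning about.

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    object BatchRollup {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("cassandra-batch-rollup")
          .set("spark.cassandra.connection.host", "127.0.0.1")  // assumed Cassandra contact point
        val sc = new SparkContext(conf)

        // Full scan of the (hypothetical) raw log table -- every partition is
        // streamed out of Cassandra into the Spark executors before any filtering.
        // Assumed schema: raw_logs(day text, ts timeuuid, level text, msg text,
        //                          PRIMARY KEY (day, ts))
        val raw = sc.cassandraTable("logs", "raw_logs")

        val errorsPerDay = raw
          .filter(_.getString("level") == "ERROR")
          .map(row => (row.getString("day"), 1L))
          .reduceByKey(_ + _)

        // Assumed results table: errors_by_day(day text PRIMARY KEY, errors bigint)
        errorsPerDay.saveToCassandra("logs", "errors_by_day", SomeColumns("day", "errors"))

        sc.stop()
      }
    }

Every batch run pays for pulling the full raw table out of Cassandra, even though only 
a fraction of the rows survive the filter.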

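Option 2 avoids that by computing first and writing only the aggregated rows, which is 
the access pattern Cassandra handles well. A minimal sketch, assuming a push-based Flume 
Avro sink pointed at the Spark receiver; the host/port, batch interval and the 
logs.counts_by_minute table are again invented:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils
    import com.datastax.spark.connector._
    import com.datastax.spark.connector.streaming._

    object FlumeToCassandra {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("flume-to-cassandra")
          .set("spark.cassandra.connection.host", "127.0.0.1")  // assumed Cassandra contact point
        val ssc = new StreamingContext(conf, Seconds(60))       // 1-minute micro-batches

        // Push-based receiver: the Flume agent's Avro sink sends to this host/port.
        val events = FlumeUtils.createStream(ssc, "0.0.0.0", 4545)

        // Toy aggregation: per micro-batch, count events by log level
        // (assumes the level is the first whitespace-separated token of each line).
        val counts = events
          .map(e => new String(e.event.getBody.array(), "UTF-8"))
          .map(line => (line.split("\\s+").headOption.getOrElse("UNKNOWN"), 1L))
          .reduceByKey(_ + _)
          .map { case (level, hits) =>
            (System.currentTimeMillis() / 60000 * 60000, level, hits)  // minute bucket
          }

        // Assumed table: counts_by_minute(minute bigint, level text, hits bigint,
        //                                 PRIMARY KEY (minute, level))
        counts.saveToCassandra("logs", "counts_by_minute", SomeColumns("minute", "level", "hits"))

        ssc.start()
        ssc.awaitTermination()
      }
    }

The same job structure works on the batch side too (swap the DStream for an RDD), which 
keeps the door open for the "batch plus realtime" flexibility Renato asked about.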