MapReduce, and Hadoop generally, is pluggable, so you can run queries over HDFS, 
over HBase, or over Cassandra. Cassandra has good Hadoop support, as outlined 
here:  If you're looking for a 
simpler solution, there is DataStax's enterprise product, which makes 
configuration and operations much easier for running both realtime and analytic 
queries -
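To make the "split the query across hosts" idea concrete, here is a minimal sketch in plain Python. It is not Hadoop or Cassandra code. The lists in `partitions` are a hypothetical stand-in for data spread over cluster nodes, `map_partition` filters each part concurrently (the map step), and `scatter_gather` merges the partial results (the reduce step). A real job over Cassandra would go through Hadoop's job APIs and Cassandra's Hadoop input format instead.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from typing import List, Tuple

# One normalized login: (username, login time, IP address, protocol).
Record = Tuple[str, datetime, str, str]


def map_partition(partition: List[Record], user: str,
                  start: datetime, end: datetime) -> List[Record]:
    """'Map' step: scan one partition and keep this user's logins in [start, end)."""
    return [r for r in partition if r[0] == user and start <= r[1] < end]


def scatter_gather(partitions: List[List[Record]], user: str,
                   start: datetime, end: datetime) -> List[Record]:
    """Run the map step on every partition concurrently, then 'reduce' by
    merging the partial results into one list ordered by login time."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda p: map_partition(p, user, start, end),
                                 partitions))
    merged = [r for part in partials for r in part]
    return sorted(merged, key=lambda r: r[1])


# Hypothetical data: two "nodes", each holding part of the log.
partitions = [
    [("alice", datetime(2012, 2, 1, 8, 0), "10.0.0.1", "imap"),
     ("bob", datetime(2012, 2, 1, 9, 0), "10.0.0.2", "pop3")],
    [("alice", datetime(2012, 2, 3, 7, 30), "10.0.0.3", "imap"),
     ("alice", datetime(2012, 3, 1, 7, 30), "10.0.0.4", "imap")],
]
hits = scatter_gather(partitions, "alice", datetime(2012, 2, 1), datetime(2012, 3, 1))
for user, when, ip, proto in hits:
    print(user, when.isoformat(), ip, proto)
```

Threads are used here only to show the fan-out and merge shape. In a cluster, each map step would run on the node that holds the data.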

We currently use CDH3 with Cassandra 0.8 and it works fine.

Good luck,


On Feb 17, 2012, at 6:42 AM, Alessio Cecchi wrote:

> Hi,
> we have developed software that stores logs from mail servers in MySQL, but 
> for huge environments we are developing a version that stores this data in 
> HBase. Once a day, the raw logs are first normalized, so the output looks like this:
> username,date of login, IP Address, protocol
> username,date of login, IP Address, protocol
> username,date of login, IP Address, protocol
> [...]
> and then inserted into the database.
> As I was saying, for huge installations (1 to 10 million logins per 
> day, kept for 12 months) we are working with HBase, but I would also consider 
> Cassandra.
> The advantage of HBase is MapReduce, which makes searching the logs very fast 
> by splitting the "query" so that it runs concurrently on multiple hosts.
> Queries will be launched from a web interface (only a few requests per day), 
> and the search keys are user and time range.
> But Cassandra seems less complex to manage and simpler to run, so I want to 
> evaluate it instead of HBase.
> My question is: can Cassandra also split a "query" over the cluster like 
> MapReduce? From what I've read online, Cassandra seems fast at inserting data but 
> slower than HBase at "querying". Is that really so?
> We would prefer not to install Hadoop on top of Cassandra.
> Any suggestion is welcome :-)
> -- 
> Alessio Cecchi is:
> @ ILS ->
> on LinkedIn ->
> GNU/Linux Systems Support ->
> @ PLUG -> former President, now senator for life,
> @ LOLUG -> Member
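The normalized log format and the user + time-range lookup described in the quoted message can be illustrated with a small sketch. It is not from the thread and is not Cassandra or HBase code. It assumes the "date of login" field is written as "YYYY-MM-DD HH:MM:SS" (the message does not say). The per-user, time-ordered grouping in the hypothetical `build_index` is an illustrative layout, chosen so that a user + time-range query becomes a contiguous slice located by binary search rather than a full scan.

```python
import bisect
from collections import defaultdict
from datetime import datetime

# ASSUMPTION: "date of login" is written as "YYYY-MM-DD HH:MM:SS"; the
# thread does not say which date format the normalizer emits.
DATE_FMT = "%Y-%m-%d %H:%M:%S"


def parse_line(line):
    """Split one normalized line: username,date of login,IP address,protocol."""
    user, date, ip, proto = (field.strip() for field in line.split(","))
    return user, datetime.strptime(date, DATE_FMT), ip, proto


def build_index(lines):
    """Group logins by user and sort each user's entries by login time
    (hypothetical layout: one 'row' per user, entries ordered by time)."""
    index = defaultdict(list)
    for line in lines:
        user, when, ip, proto = parse_line(line)
        index[user].append((when, ip, proto))
    for entries in index.values():
        entries.sort()
    return index


def logins_between(index, user, start, end):
    """Return the user's (time, ip, protocol) entries with start <= time < end."""
    entries = index.get(user, [])
    times = [when for when, _, _ in entries]
    # Binary search finds the slice boundaries in the time-ordered entries.
    lo = bisect.bisect_left(times, start)
    hi = bisect.bisect_left(times, end)
    return entries[lo:hi]


log = [
    "alice,2012-02-01 08:00:00, 10.0.0.1, imap",
    "bob,2012-02-01 09:00:00, 10.0.0.2, pop3",
    "alice,2012-02-10 07:30:00, 10.0.0.3, imap",
    "alice,2012-01-15 12:00:00, 10.0.0.4, smtp",
]
idx = build_index(log)
feb = logins_between(idx, "alice", datetime(2012, 2, 1), datetime(2012, 3, 1))
print(feb)
```

In this sketch a lookup reads only one user's entries. Whether a real cluster behaves similarly depends on how the data is actually keyed there.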
