They can be, but I would assume that if your Cassandra data model is
inefficient for the kind of queries you want to run, Spark won't magically
take that inefficiency away.

For example, say you have a users table. Each user has a country, which
isn't part of the partition key and isn't a clustering key.

If you wanted to calculate the number of users from a particular country,
there's no way to do that with this data model other than to do a
full table scan and count the users from that country.
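
To make the cost concrete, here is a rough sketch in plain Python (not real Cassandra code). The dict stands in for a users table keyed by user id, and the names users and count_users_in_country are made up for illustration. The point is only that every row gets read, no matter how few match:

```python
# Hypothetical in-memory stand-in for a users table keyed by user id.
# country is an ordinary column here, not part of the key.
users = {
    1: {"name": "alice", "country": "PK"},
    2: {"name": "bob", "country": "ID"},
    3: {"name": "carol", "country": "PK"},
}


def count_users_in_country(table, country):
    """Count matching users by visiting every row (a full table scan)."""
    rows_read = 0
    matches = 0
    for row in table.values():
        rows_read += 1  # each row is read, whether or not it matches
        if row["country"] == country:
            matches += 1
    return matches, rows_read


matches, rows_read = count_users_in_country(users, "PK")
print(f"{matches} matching users, {rows_read} rows read")
```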

Spark can do this full table scan for you and return the number of records.
It may be able to spread the work across multiple servers, but it can't
reduce the amount of work that has to be done.
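
As a rough illustration of that point (plain Python threads standing in for Spark workers; this is not the Spark API, and rows, scan_chunk and parallel_count are names invented for the sketch), splitting the scan into chunks changes who reads each row, but the total number of rows read stays the same:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table contents: 1000 users, a quarter of them in "PK".
rows = [{"user_id": i, "country": "PK" if i % 4 == 0 else "ID"}
        for i in range(1000)]


def scan_chunk(chunk, country):
    """Scan one chunk; return (matches, rows read in this chunk)."""
    matches = sum(1 for row in chunk if row["country"] == country)
    return matches, len(chunk)


def parallel_count(rows, country, workers=4):
    """Split the scan across workers and add up the partial results."""
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: scan_chunk(c, country), chunks))
    total_matches = sum(m for m, _ in results)
    total_rows_read = sum(n for _, n in results)
    return total_matches, total_rows_read


matches, rows_read = parallel_count(rows, "PK")
print(f"{matches} matching users, {rows_read} rows read in total")
```

The work is divided, so it may finish sooner, but every row is still read by some worker.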

On the other hand, if you were okay with creating a new table in which
country is part of the primary key (ideally the partition key), and you
wrote a record to this user_by_country table for each user that signed up,
then looking up the users in a particular country would be a very fast
query, since the lookup is then by key rather than by scanning.
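
A rough sketch of that idea, again in plain Python rather than real Cassandra code. The two dicts stand in for the users table and the user_by_country table, and sign_up and users_in_country are hypothetical helpers. Each signup does an extra write, and the lookup then touches only the one country's records:

```python
# Hypothetical stand-ins for the two tables.
users = {}             # keyed by user id
user_by_country = {}   # keyed by country, then by user id


def sign_up(user_id, name, country):
    """Write the user to both tables: one extra write per signup."""
    users[user_id] = {"name": name, "country": country}
    user_by_country.setdefault(country, {})[user_id] = {"name": name}


def users_in_country(country):
    """Look up one country's users directly by key; no scan of users."""
    return user_by_country.get(country, {})


sign_up(1, "alice", "PK")
sign_up(2, "bob", "ID")
sign_up(3, "carol", "PK")

print(sorted(users_in_country("PK")))
print(len(users_in_country("PK")))
```

The trade-off is the one described above: duplicated data and extra writes in exchange for a read that does not depend on the size of the whole users table.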



On Sun, Oct 23, 2016 at 2:18 PM, Welly Tambunan <if05...@gmail.com> wrote:

> I like multi data centre resilience in Cassandra.
>
> I think that's plus one for Cassandra.
>
> Ali, complex analytics can be done in Spark, right?
>
> On 23 Oct 2016 4:08 p.m., "Ali Akhtar" <ali.rac...@gmail.com> wrote:
>
> >
> > I would say it depends on your use case.
> >
> > If you need a lot of queries that require joins, or complex analytics of
> > the kind that Cassandra isn't suited for, then HDFS / HBase may be better.
> >
> > If you can work with the Cassandra way of doing things (creating new
> > tables for each query you'll need to do, duplicating data, doing extra
> > writes for faster reads), then Cassandra should work for you. It is easier
> > to set up and do dev ops with, in my experience.
> >
> > On Sun, Oct 23, 2016 at 2:05 PM, Welly Tambunan <if05...@gmail.com> wrote:
> >>
> >> I mean. HDFS and HBase.
> >>
> >> On Sun, Oct 23, 2016 at 4:00 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:
> >>>
> >>> By Hadoop do you mean HDFS?
> >>>
> >>>
> >>>
> >>> On Sun, Oct 23, 2016 at 1:56 PM, Welly Tambunan <if05...@gmail.com> wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> I read the following comparison between Hadoop and Cassandra. The
> >>>> conclusion seems to be that we use Hadoop for the data lake (cold data)
> >>>> and Cassandra for hot data (real-time data).
> >>>>
> >>>> http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
> >>>>
> >>>> My question is, can we just use Cassandra to rule them all?
> >>>>
> >>>> What we are trying to achieve is to minimize the moving parts in our
> >>>> system.
> >>>>
> >>>> Any response would be really appreciated.
> >>>>
> >>>>
> >>>> Cheers
> >>>>
> >>>> --
> >>>> Welly Tambunan
> >>>> Triplelands
> >>>>
> >>>> http://weltam.wordpress.com <http://weltam.wordpress.com>
> >>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
