The main problem is that this sweet spot is very narrow. We can't have lots
of CFs, we can't have long rows, and we end up with an enormous number of
huge composite row keys plus the stored metadata about those keys (keep in
mind the overhead of such a scheme, though it looks like nobody really cares
about it).
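To make the per-key cost concrete: Cassandra's CompositeType of this era frames every key component with a 2-byte length prefix and a 1-byte end-of-component marker, so each component carries 3 bytes of framing on top of its payload. The class and numbers below are an illustrative sketch of that framing cost, not code from Cassandra itself.

```java
// Rough illustration of the per-key overhead being complained about.
// CompositeType frames each component as: 2-byte length, the component
// bytes, then a 1-byte end-of-component marker (3 bytes of framing each).
public class CompositeKeyOverhead {
    public static int encodedSize(byte[]... components) {
        int size = 0;
        for (byte[] c : components) {
            size += 2 + c.length + 1; // length prefix + payload + end marker
        }
        return size;
    }

    public static void main(String[] args) {
        // A 3-component key with 8+4+4 = 16 payload bytes is stored as
        // 16 + 3*3 = 25 bytes per key, before any other row metadata.
        System.out.println(encodedSize(new byte[8], new byte[4], new byte[4]));
    }
}
```

Multiply that by millions of rows and the framing alone becomes a visible fraction of the data.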
Date: Wednesday, October 10, 2012 3:37 AM
To: user@cassandra.apache.org
Subject: Re: 1000's of CF's.
I'm not a Cassandra dev, so take what I say with a lot of salt, but
AFAICT, there is a certain amount of overhead in maintaining a CF, so
when you have large numbers of CFs, this adds up. From a layperson's
perspective, this observation sounds reasonable, since zero-cost CFs
would be tantamount to
So what should the Cassandra architecture be when we need to run
Hadoop M/R jobs and not be restricted by the number of CFs?
What we have now is a fair number of CFs (2K), and this number is slowly
growing, so we are already planning to merge partitioned CFs. But our next
goal is to run Hadoop
Okay, so it only took me two solid days, not a week. PlayOrm in the master branch
now supports virtual CF's, i.e. virtual tables in ONE CF, so you can have 1000's
or millions of virtual CF's in one CF now. It works with all the Scalable-SQL,
works with the joins, and works with the PlayOrm command
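The usual way to fold many virtual tables into one physical CF is to prefix every row key with the virtual table's name. The sketch below shows that general technique; the `:` delimiter and method names are illustrative assumptions, not PlayOrm's actual wire format.

```java
// Sketch of the "virtual CF" technique: many logical tables share one
// physical column family by prefixing each row key with the virtual
// table name. Delimiter and names are illustrative, not PlayOrm's.
public class VirtualCfKeys {
    private static final char SEP = ':'; // assumed delimiter

    // Build the physical row key stored in the single shared CF.
    public static String toPhysicalKey(String virtualCf, String rowKey) {
        return virtualCf + SEP + rowKey;
    }

    // Recover the virtual table name from a physical key.
    public static String virtualCfOf(String physicalKey) {
        return physicalKey.substring(0, physicalKey.indexOf(SEP));
    }

    // Recover the original row key.
    public static String rowKeyOf(String physicalKey) {
        return physicalKey.substring(physicalKey.indexOf(SEP) + 1);
    }

    public static void main(String[] args) {
        String k = toPhysicalKey("orders", "cust42");
        System.out.println(k);              // orders:cust42
        System.out.println(virtualCfOf(k)); // orders
        System.out.println(rowKeyOf(k));    // cust42
    }
}
```

Because keys for one virtual table share a common prefix, queries scoped to that table can be answered without consulting rows from the others, provided the access path can seek by prefix.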
So basically, with moving towards the 1000's of CFs all being put in one
CF, our performance is going to tank on map/reduce, correct? I mean, from
what I remember we could do map/reduce on a single CF, but by stuffing
1000's of virtual CF's into one CF, our map/reduce will have to read in
all 999 virtual CF's rows that we don't want just to map/reduce
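The concern above can be made concrete: a map/reduce job scanning the single physical CF still reads every row and must discard the ones belonging to other virtual tables in the mapper. This is a self-contained simulation of that filtering cost, with a hypothetical `"vcf:rowkey"` key format, not actual Hadoop or PlayOrm code.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the cost being discussed: a full scan of the shared CF sees
// every virtual CF's rows, and rows from other virtual tables are read
// and then thrown away. Key format "vcf:rowkey" is assumed.
public class VirtualCfScan {
    public static List<String> scan(List<String> physicalKeys, String targetVcf) {
        List<String> kept = new ArrayList<>();
        int discarded = 0;
        for (String key : physicalKeys) {         // every row is still read...
            if (key.startsWith(targetVcf + ":")) {
                kept.add(key);                    // ...but only these are mapped
            } else {
                discarded++;                      // wasted I/O for other vcfs
            }
        }
        System.out.println("discarded " + discarded + " rows from other virtual CFs");
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("users:a", "orders:b", "logs:c", "users:d");
        System.out.println(scan(rows, "users")); // [users:a, users:d]
    }
}
```

With 1000 virtual CFs of similar size, roughly 99.9% of the rows read by such a scan would be discarded, which is the performance worry raised here.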
node doing a findAll query into the
partitions it is responsible for. In this way, I think we can have 1000's of
virtual CF's inside ONE CF, and then PlayOrm does its query and retrieves
the rows for that partition of one virtual CF.
Anyone know of a computer grid we can dish out work to? That would be my
only missing piece (well
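The distribution idea in the message above, each grid node querying only the partitions it owns, can be sketched as simple modulo assignment of partition ids to nodes. The model below is hypothetical, not PlayOrm's actual scheduler; it only shows how each node's findAll would be scoped to its own slice of one virtual CF.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the work-distribution idea: partitions of a virtual CF are
// assigned to grid nodes by modulo hashing, and each node runs its
// findAll-style query only over its own partitions. Hypothetical model.
public class PartitionRouting {
    // Decide which grid node owns a given partition id.
    public static int ownerNode(int partitionId, int numNodes) {
        return Math.floorMod(partitionId, numNodes);
    }

    // Partitions a given node should query (its share of one virtual CF).
    public static List<Integer> partitionsFor(int node, int numNodes, int totalPartitions) {
        List<Integer> mine = new ArrayList<>();
        for (int p = 0; p < totalPartitions; p++) {
            if (ownerNode(p, numNodes) == node) mine.add(p);
        }
        return mine;
    }

    public static void main(String[] args) {
        // 3 nodes split 10 partitions; node 1 queries only its own slice.
        System.out.println(partitionsFor(1, 3, 10)); // [1, 4, 7]
    }
}
```

Under this model no node ever touches another virtual CF's rows, which is what makes the one-physical-CF layout workable for distributed queries.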