Hector also offers support for 'Virtual Keyspaces' which you might want to look at.

On Thu, Sep 27, 2012 at 1:10 PM, Aaron Turner <synfina...@gmail.com> wrote:
> On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
>> We have 1000's of different building devices and we stream data from these
>> devices. The format and data from each one vary, so one device has
>> temperature at timeX with some other variables, another device has CO2
>> percentage and other variables. Every device is unique and streams its own
>> data. We dynamically discover devices and register them. Basically, one CF
>> or table per thing really makes sense in this environment. While we could
>> try to find out which devices "are" similar, this would really be a pain and
>> some devices add some new variable into the equation. NOT only that but
>> researchers can register new datasets and upload them as well, and each
>> dataset they have they do NOT necessarily want to share with other
>> researchers, so we have security groups and each CF belongs to security
>> groups. We dynamically create CF's on the fly as people register new
>> datasets.
>>
>> On top of that, when the data sets get too large, we probably want to
>> partition a single CF into time partitions. We could create one CF and put
>> all the data and have a partition per device, but then a time partition will
>> contain "multiple" devices of data, meaning we need to shrink our time
>> partition size, whereas if we have a CF per device, the time partition can be
>> larger as it is only for that one device.
>>
>> THEN, on top of that, we have a meta CF for these devices, so some people
>> want to query for streams that match criteria AND which returns a CF name
>> and they query that CF name, so we almost need a query with variables like
>> select cfName from Meta where x = y and then select * from cfName where
>> xxxxx. Which we can do today.
>
> How strict are your security requirements? If it wasn't for that,
> you'd be much better off storing data on a per-statistic basis than
> per-device. Hell, you could store everything in a single CF by using
> a composite row key:
>
> <devicename>|<stat type>|<instance>
>
> But yeah, there isn't a hard limit for the number of CF's, but there
> is overhead associated with each one and so I wouldn't consider your
> design as scalable. Generally speaking, hundreds are ok, but
> thousands is pushing it.
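
To make the composite row key idea from the quoted reply concrete, here is a minimal sketch in Python of building and splitting a key of the form <devicename>|<stat type>|<instance>. The function names, the sample device name, and the choice to reject components that contain the separator are all assumptions for illustration; nothing here comes from Hector or from the posters' actual code.

```python
from typing import Tuple

SEP = "|"  # separator taken from the <devicename>|<stat type>|<instance> layout


def make_row_key(device: str, stat_type: str, instance: str) -> str:
    """Join the three components into one row key (hypothetical helper)."""
    parts = (device, stat_type, instance)
    for part in parts:
        # Assumption: components may not contain the separator, otherwise
        # the key could not be split back unambiguously.
        if SEP in part:
            raise ValueError("component contains separator: %r" % part)
    return SEP.join(parts)


def parse_row_key(key: str) -> Tuple[str, str, str]:
    """Split a row key back into (device, stat_type, instance)."""
    device, stat_type, instance = key.split(SEP)
    return device, stat_type, instance
```

For example, make_row_key("ahu-7", "temperature", "0") returns "ahu-7|temperature|0", and parse_row_key on that string gives back the three components. With keys like this, every device and statistic lands in the same CF, so per-dataset access control would have to be enforced somewhere other than at the CF level.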