On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
> We have 1000s of different building devices and we stream data from these
> devices. The format and data from each one varies, so one device has
> temperature at timeX with some other variables, another device has CO2
> percentage and other variables. Every device is unique and streams its own
> data. We dynamically discover devices and register them. Basically, one CF
> or table per thing really makes sense in this environment. While we could
> try to find out which devices "are" similar, this would really be a pain, and
> some devices add some new variable into the equation. NOT only that, but
> researchers can register new datasets and upload them as well, and they do
> NOT necessarily want to share each dataset with other researchers, so we
> have security groups and each CF belongs to security groups. We
> dynamically create CFs on the fly as people register new datasets.
>
> On top of that, when the data sets get too large, we probably want to
> partition a single CF into time partitions. We could create one CF, put
> all the data in it, and have a partition per device, but then a time
> partition will contain data from "multiple" devices, meaning we need to
> shrink our time partition size, whereas if we have a CF per device, the time
> partition can be larger as it is only for that one device.
>
> THEN, on top of that, we have a meta CF for these devices, so some people
> want to query for streams that match criteria, which returns a CF name, and
> then they query that CF name. So we almost need a query with variables, like
> select cfName from Meta where x = y and then select * from cfName where
> xxxxx. Which we can do today.
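For reference, the two-step lookup described in the quoted message (resolve a CF name from the meta CF, then query that CF) might look roughly like the sketch below. It uses plain Python dictionaries in place of Cassandra, and every name in it (`meta_rows`, `column_families`, `find_cf_names`, `query_cf`, the `building` and `measures` fields, the sample CF names) is made up for illustration; none of it comes from the poster's actual system.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical stand-in for the meta CF: one row per registered stream.
meta_rows: List[Row] = [
    {"cfName": "device_0001", "building": "A", "measures": "temperature"},
    {"cfName": "device_0002", "building": "A", "measures": "co2"},
    {"cfName": "dataset_0001", "building": "B", "measures": "temperature"},
]

# Hypothetical stand-in for the per-device CFs, keyed by CF name.
column_families: Dict[str, List[Row]] = {
    "device_0001": [
        {"time": 100, "temperature": 21.5},
        {"time": 200, "temperature": 22.0},
    ],
    "device_0002": [{"time": 100, "co2": 0.04}],
    "dataset_0001": [{"time": 150, "temperature": 19.0}],
}


def find_cf_names(meta: List[Row], **criteria: Any) -> List[str]:
    """Step 1: roughly 'select cfName from Meta where x = y'."""
    return [
        row["cfName"]
        for row in meta
        if all(row.get(key) == value for key, value in criteria.items())
    ]


def query_cf(
    cfs: Dict[str, List[Row]], cf_name: str, predicate: Callable[[Row], bool]
) -> List[Row]:
    """Step 2: roughly 'select * from cfName where <predicate>'."""
    return [row for row in cfs.get(cf_name, []) if predicate(row)]


def query_matching_streams(
    meta: List[Row],
    cfs: Dict[str, List[Row]],
    predicate: Callable[[Row], bool],
    **criteria: Any,
) -> Dict[str, List[Row]]:
    """Run step 1, then step 2 against every CF name that step 1 returned."""
    return {
        name: query_cf(cfs, name, predicate)
        for name in find_cf_names(meta, **criteria)
    }


if __name__ == "__main__":
    result = query_matching_streams(
        meta_rows,
        column_families,
        lambda row: row["time"] >= 150,
        measures="temperature",
    )
    for cf_name, rows in result.items():
        print(cf_name, rows)
```

The point of the sketch is only that the second query's target is data returned by the first, which is why the number of CFs grows with the number of registered streams.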
How strict are your security requirements? If it weren't for that, you'd be
much better off storing data on a per-statistic basis than per-device.
Hell, you could store everything in a single CF by using a composite row key:

<devicename>|<stat type>|<instance>

But yeah, there isn't a hard limit on the number of CFs, but there is
overhead associated with each one, so I wouldn't consider your design
scalable. Generally speaking, hundreds are OK, but thousands is pushing it.

-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
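A rough sketch of the composite row key suggested in the reply above, in Python. The `|` separator and the three components come from the message; the helper names (`make_row_key`, `parse_row_key`, `record`), the sample device and statistic names, and the in-memory dictionary standing in for a single CF are all assumptions made for illustration, not anything a Cassandra client provides.

```python
from typing import Dict, Tuple

SEPARATOR = "|"

# Hypothetical stand-in for one shared CF: row key -> {timestamp: value}.
Store = Dict[str, Dict[int, float]]


def make_row_key(device_name: str, stat_type: str, instance: str) -> str:
    """Build a '<devicename>|<stat type>|<instance>' row key."""
    parts = (device_name, stat_type, instance)
    for part in parts:
        # A separator inside a component would make the key ambiguous to parse.
        if SEPARATOR in part:
            raise ValueError(f"component {part!r} contains {SEPARATOR!r}")
    return SEPARATOR.join(parts)


def parse_row_key(row_key: str) -> Tuple[str, str, str]:
    """Split a composite row key back into its three components."""
    parts = row_key.split(SEPARATOR)
    if len(parts) != 3:
        raise ValueError(f"expected 3 components, got {len(parts)}: {row_key!r}")
    return parts[0], parts[1], parts[2]


def record(store: Store, device_name: str, stat_type: str, instance: str,
           timestamp: int, value: float) -> None:
    """Write one sample into the shared store under its composite row key."""
    key = make_row_key(device_name, stat_type, instance)
    store.setdefault(key, {})[timestamp] = value


if __name__ == "__main__":
    store: Store = {}
    record(store, "ahu-7", "temperature", "0", 1348758660, 21.5)
    record(store, "ahu-7", "co2", "0", 1348758660, 0.04)
    record(store, "chiller-2", "temperature", "1", 1348758660, 6.8)
    for key in sorted(store):
        print(key, parse_row_key(key), store[key])
```

With keys like these, every device and statistic lives in one CF, so the per-CF overhead stays constant as devices are added. The trade-off raised in the thread still applies: per-CF security groups would have to be enforced some other way.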