On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
> We have 1000's of different building devices and we stream data from these 
> devices.  The format and data from each one varies so one device has 
> temperature at timeX with some other variables, another device has CO2 
> percentage and other variables.  Every device is unique and streams its own 
> data.  We dynamically discover devices and register them.  Basically, one CF 
> or table per thing really makes sense in this environment.  While we could 
> try to find out which devices "are" similar, this would really be a pain and 
> some devices add some new variable into the equation.  NOT only that but 
> researchers can register new datasets and upload them as well and each 
> dataset they have they do NOT want to share with other researchers necessarily 
> so we have security groups and each CF belongs to security groups.  We 
> dynamically create CF's on the fly as people register new datasets.
>
> On top of that, when the data sets get too large, we probably want to 
> partition a single CF into time partitions.  We could create one CF and put 
> all the data and have a partition per device, but then a time partition will 
> contain "multiple" devices of data meaning we need to shrink our time 
> partition size where if we have CF per device, the time partition can be 
> larger as it is only for that one device.
>
> THEN, on top of that, we have a meta CF for these devices so some people want 
> to query for streams that match criteria AND which returns a CF name and they 
> query that CF name so we almost need a query with variables like select 
> cfName from Meta where x = y and then select * from cfName where xxxxx. Which 
> we can do today.

How strict are your security requirements?  If it weren't for those,
you'd be much better off storing data on a per-statistic basis rather
than per-device.  Hell, you could store everything in a single CF by
using a composite row key:

<devicename>|<stat type>|<instance>
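
To make that concrete, here is a rough Python sketch of building and
splitting such a key.  The function names, the sample values and the
rule that no component may contain the "|" separator are my own
assumptions for illustration, not anything Cassandra requires:

```python
from typing import Tuple

SEP = "|"  # separator from the key layout above


def make_row_key(device_name: str, stat_type: str, instance: str) -> str:
    """Join the three components into <devicename>|<stat type>|<instance>."""
    parts = (device_name, stat_type, instance)
    for part in parts:
        # Assumption: components never contain the separator, so the key
        # can be split again without ambiguity.
        if SEP in part:
            raise ValueError("key component contains separator: %r" % part)
    return SEP.join(parts)


def parse_row_key(row_key: str) -> Tuple[str, str, str]:
    """Split a composite row key back into its three components."""
    device_name, stat_type, instance = row_key.split(SEP)
    return device_name, stat_type, instance


# Hypothetical device and statistic names.
key = make_row_key("hvac-17", "temperature", "0")
print(key)
print(parse_row_key(key))
```

With keys like this, every device and statistic lives in one CF, and
the per-dataset security check would have to happen in your
application layer rather than per CF.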

But yeah, there isn't a hard limit on the number of CFs, but there
is overhead associated with each one, so I wouldn't consider your
design scalable.  Generally speaking, hundreds are OK, but
thousands is pushing it.



-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
