Hector also offers support for 'Virtual Keyspaces', which you might
want to look at.


On Thu, Sep 27, 2012 at 1:10 PM, Aaron Turner <synfina...@gmail.com> wrote:
> On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
>> We have thousands of different building devices, and we stream data from
>> these devices.  The format and data from each one vary: one device reports
>> temperature at timeX along with some other variables, another reports CO2
>> percentage and other variables.  Every device is unique and streams its own
>> data.  We dynamically discover devices and register them.  Basically, one CF
>> (table) per thing really makes sense in this environment.  While we could
>> try to figure out which devices "are" similar, this would be a real pain, and
>> some devices add new variables into the equation.  Not only that, but
>> researchers can register and upload new datasets, and they do NOT
>> necessarily want to share those datasets with other researchers, so we have
>> security groups and each CF belongs to security groups.  We dynamically
>> create CFs on the fly as people register new datasets.
>>
>> On top of that, when the datasets get too large, we probably want to
>> partition a single CF into time partitions.  We could create one CF for all
>> the data with a partition per device, but then a time partition would
>> contain data from "multiple" devices, so we would need to shrink our time
>> partition size.  With one CF per device, the time partition can be larger
>> because it only holds that one device's data.
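The time partitioning described above can be sketched as deriving a partition key from a device ID plus a fixed-width time bucket. This is a hypothetical illustration, not the poster's scheme: the `<device>|<bucket start date>` format, the `partition_key` helper, and measuring bucket width in whole days aligned to the Unix epoch are all assumptions. Bucketing per device keeps each partition's size bounded by that one device's data rate, which is what allows a wider bucket.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def partition_key(device_id: str, ts: datetime, bucket_days: int) -> str:
    """Return a hypothetical '<device_id>|<YYYY-MM-DD>' key for ts's bucket.

    Buckets are bucket_days wide and aligned to the Unix epoch (1970-01-01),
    so every timestamp that falls in the same bucket maps to the same key.
    """
    if bucket_days < 1:
        raise ValueError("bucket_days must be >= 1")
    if ts.tzinfo is None:
        raise ValueError("use timezone-aware timestamps")
    # Whole days since the epoch, then round down to the bucket boundary.
    epoch_day = int(ts.timestamp() // SECONDS_PER_DAY)
    bucket_start_day = epoch_day - epoch_day % bucket_days
    start = datetime.fromtimestamp(bucket_start_day * SECONDS_PER_DAY, tz=timezone.utc)
    return f"{device_id}|{start:%Y-%m-%d}"


if __name__ == "__main__":
    t = datetime(2012, 9, 27, 13, 10, tzinfo=timezone.utc)
    print(partition_key("ahu-3", t, 1))   # one partition per day
    print(partition_key("ahu-3", t, 30))  # wider 30-day buckets
```

Growing `bucket_days` trades fewer, larger partitions for coarser time ranges per read.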
>>
>> THEN, on top of that, we have a meta CF for these devices, so some people
>> want to query for streams that match some criteria, get back a CF name,
>> and then query that CF.  So we almost need a query with variables, like
>> select cfName from Meta where x = y and then select * from cfName where
>> xxxxx.  We can do that today.
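Because the second query's table name comes from data, it cannot be passed as a CQL bind variable (bind markers only stand in for values), so it has to be spliced into the statement text. A minimal sketch of building the follow-up statements safely, assuming CF names follow CQL's unquoted-identifier rule (a letter, then letters, digits or underscores) and, as a further assumption, capping them at 48 characters. `data_queries`, `META_QUERY` and the `ts` column are hypothetical names, and executing the statements is left to whichever client you use.

```python
import re

# Assumption: CF names are unquoted CQL identifiers (letter first, then
# letters/digits/underscores), capped here at 48 characters.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,47}")

# Step 1: an ordinary parameterized query against the Meta CF.
META_QUERY = "SELECT cfName FROM Meta WHERE x = ?"


def _checked(name: str) -> str:
    """Return name unchanged if it is a safe identifier, else raise."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"refusing unsafe identifier: {name!r}")
    return name


def data_queries(cf_names: list[str], column: str) -> list[str]:
    """Step 2: build one SELECT per CF name returned by META_QUERY.

    Only the identifiers are spliced into the text; the value being
    filtered on stays a '?' bind marker.
    """
    col = _checked(column)
    return [f"SELECT * FROM {_checked(cf)} WHERE {col} = ?" for cf in cf_names]


if __name__ == "__main__":
    rows = ["temp_dev_42", "co2_dev_7"]  # pretend result of META_QUERY
    for q in data_queries(rows, "ts"):
        print(q)
```

Validating against a strict pattern, rather than trying to escape arbitrary names, keeps the dynamic table name from ever carrying extra CQL.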
>
> How strict are your security requirements?  If it wasn't for that,
> you'd be much better off storing data on a per-statistic basis than
> per-device.  Hell, you could store everything in a single CF by using
> a composite row key:
>
> <devicename>|<stat type>|<instance>
>
> But yeah, there isn't a hard limit on the number of CFs, but each
> one carries some overhead, so I wouldn't consider your design
> scalable.  Generally speaking, hundreds are OK, but thousands is
> pushing it.
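The composite row key suggested above, `<devicename>|<stat type>|<instance>`, can be sketched as a pair of helpers that build and split such strings. This is a hypothetical illustration: `make_row_key`, `parse_row_key` and `RowKey` are invented names, and rejecting empty components or components containing `|` is an assumption made so that parsing stays unambiguous. Cassandra's CompositeType is an alternative to a delimiter-joined string.

```python
from typing import NamedTuple

SEP = "|"


class RowKey(NamedTuple):
    device: str
    stat_type: str
    instance: str


def make_row_key(device: str, stat_type: str, instance: str) -> str:
    """Join the three parts into '<devicename>|<stat type>|<instance>'."""
    parts = (device, stat_type, instance)
    for part in parts:
        # A separator inside a component would make the key unparseable.
        if not part or SEP in part:
            raise ValueError(f"invalid key component: {part!r}")
    return SEP.join(parts)


def parse_row_key(key: str) -> RowKey:
    """Split a key built by make_row_key back into its three parts."""
    parts = key.split(SEP)
    if len(parts) != 3:
        raise ValueError(f"expected 3 components, got {len(parts)}: {key!r}")
    return RowKey(*parts)


if __name__ == "__main__":
    key = make_row_key("ahu-3", "co2_pct", "0")
    print(key, parse_row_key(key))
```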
>
>
>
> --
> Aaron Turner
> http://synfin.net/         Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows