Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, to address your question, read my last post, but to summarize: yes, there is less memory overhead in prefixing keys than in managing multiple CFs, EXCEPT when doing map/reduce. When doing map/reduce, you will now have HUGE overhead in reading a whole slew of rows you don't care about as you can't
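To illustrate the trade-off described above, here is a minimal in-memory sketch of key prefixing. It is not Cassandra or Hector code: the dict stands in for a single physical column family, and the separator, function names, and device names are hypothetical. The point is only that a full scan for one virtual CF still has to visit every row.

```python
SEP = ":"  # hypothetical separator between virtual CF name and row key


def prefixed_key(virtual_cf, row_key):
    """Build the physical row key for a row in a virtual column family."""
    return f"{virtual_cf}{SEP}{row_key}"


def scan_virtual_cf(store, virtual_cf):
    """Full scan, as a map/reduce job over one physical CF would do.

    Every row is visited; only rows with the wanted prefix are kept.
    Returns the matching rows and the number of rows visited.
    """
    wanted = virtual_cf + SEP
    visited = 0
    matched = {}
    for key, columns in store.items():
        visited += 1
        if key.startswith(wanted):
            matched[key[len(wanted):]] = columns
    return matched, visited


# One physical "column family" holding three virtual ones.
store = {}
for cf in ("deviceA", "deviceB", "deviceC"):
    for i in range(4):
        store[prefixed_key(cf, f"row{i}")] = {"value": i}

rows, visited = scan_virtual_cf(store, "deviceB")
print(len(rows), "of", visited, "rows were relevant")
```

With 1000's of virtual CFs in one physical CF, the share of relevant rows per job shrinks accordingly, which is the overhead being described.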

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Without putting too much thought into it... Given the underlying architecture, I think you could/would have to write your own partitioner, which would partition based on the prefix/virtual keyspace. -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Thanks for the idea but…(but please keep thinking on it)... 100% what we don't want since partitioned data resides on the same node. I want to map/reduce the column families and leverage the parallel disks :( :( I am sure others would want to do the same…..We almost need a feature of virtual

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Agreed. Do we know yet what the overhead is for each column family? What is the limit? If you have a SINGLE keyspace w/ 2+ CF's, what happens? Anyone know? -brian --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Brian, On Tue, Oct 2, 2012 at 2:20 PM, Brian O'Neill boneil...@gmail.com wrote: Without putting too much thought into it... Given the underlying architecture, I think you could/would have to write your own partitioner, which would partition based on the prefix/virtual keyspace. I might be

Re: 1000's of column families

2012-10-02 Thread Brian O'Neill
Exactly. --- Brian O'Neill Lead Architect, Software Development Health Market Science The Science of Better Results 2700 Horizon Drive • King of Prussia, PA • 19406 M: 215.588.6024 • @boneill42 http://www.twitter.com/boneill42 • healthmarketscience.com This information transmitted in this

Re: 1000's of column families

2012-10-02 Thread Ben Hood
On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill boneil...@gmail.com wrote: Exactly. So you're back to the deliberation between using multiple CFs (potentially with some known working upper bound*) or feeding your map reduce in some other way (as you decided to do with Storm). In my particular

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, Brian, By the way, PlayOrm offers a NoSqlTypedSession that is different from the ORM half of PlayOrm, dealing in raw stuff that does indexing (so you can do Scalable SQL on data that has no ORM on top of it). That is what we use for our 1000's of CF's as we don't know the format of any of

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job. What you might do is add a field to the column family that represents which virtual column family that it is part of. Then when doing mapreduce jobs,
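A rough sketch of the idea above, assuming an in-memory model and not the actual Cassandra or Hadoop input-format API: each row carries a field naming its virtual column family, and an index on that field is used to select only the rows a job cares about. The class name, the `virtual_cf` field name, and the device names are all hypothetical.

```python
from collections import defaultdict


class TaggedStore:
    """Toy model of one column family with an index on a 'virtual_cf' field."""

    def __init__(self):
        self.rows = {}                 # row key -> columns
        self.index = defaultdict(set)  # virtual_cf value -> set of row keys

    def put(self, row_key, virtual_cf, columns):
        # Keep the index consistent if the row is rewritten with a new tag.
        old = self.rows.get(row_key)
        if old is not None:
            self.index[old["virtual_cf"]].discard(row_key)
        self.rows[row_key] = {"virtual_cf": virtual_cf, **columns}
        self.index[virtual_cf].add(row_key)

    def select(self, virtual_cf):
        """Return only the rows tagged with the given virtual CF."""
        keys = sorted(self.index.get(virtual_cf, ()))
        return {key: self.rows[key] for key in keys}


tagged = TaggedStore()
for i in range(6):
    tagged.put(f"row{i}", "deviceA" if i % 2 == 0 else "deviceB", {"value": i})

job_input = tagged.select("deviceB")
print(list(job_input))
```

Note that `select` fetches each matching row by key, one at a time; the index narrows the input but does not make the rows sit next to each other.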

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Jeremy, On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote: Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job. What you might do is add a field to the column family that represents which

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
@cassandra.apache.org user@cassandra.apache.org Date: Tuesday, October 2, 2012 11:18 AM To: user@cassandra.apache.org Subject: Re: 1000's of column families Jeremy, On Tuesday, October

Re: 1000's of column families

2012-10-02 Thread Ben Hood
Dean, On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote: Because the data for an index is not all together (i.e., you need a multi-get to get the data). It is not contiguous. With the prefix in a partition, they keep the data together, so all data for a prefix, from what I understand, is contiguous.
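The contrast drawn above can be sketched with plain sorted lists. This is only a model and assumes keys are stored in sorted order (it says nothing about how a given partitioner actually places rows); the device names and key formats are hypothetical. With prefixed keys, one virtual CF is a single contiguous slice; with an index on a tag, the matching rows are scattered and need one lookup each.

```python
import bisect

cfs = ["deviceA", "deviceB", "deviceC"]

# Layout 1: prefixed keys kept in sorted order.
prefixed = sorted(f"{cf}:row{i}" for cf in cfs for i in range(4))


def prefix_slice(sorted_keys, prefix):
    """Return (lo, hi) so that sorted_keys[lo:hi] holds every key with the prefix.

    Assumes no key contains a character above U+FFFF right after the prefix.
    """
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return lo, hi


lo, hi = prefix_slice(prefixed, "deviceB:")
print("prefix layout: one contiguous slice", (lo, hi))

# Layout 2: plain keys; the virtual CF is a tag found through an index.
plain = sorted(f"row{i:02d}" for i in range(12))
tag_of = {key: cfs[i % 3] for i, key in enumerate(plain)}
index_hits = [key for key in plain if tag_of[key] == "deviceB"]
positions = [plain.index(key) for key in index_hits]
print("index layout: scattered positions", positions)
```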

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
@cassandra.apache.org user@cassandra.apache.org Subject: Re: 1000's of column families Dean, On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote: Because the data for an index is not all together (i.e., you need a multi-get to get the data). It is not contiguous

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
@cassandra.apache.org user@cassandra.apache.org Subject: Re: 1000's of column families Dean, On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote: Because the data for an index is not all together (i.e., you need a multi-get to get the data). It is not contiguous

Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
@cassandra.apache.org user@cassandra.apache.org Date: Thursday, September 27, 2012 8:01 AM To: user@cassandra.apache.org Subject: Re: 1000's of column families Out

Re: 1000's of column families

2012-10-01 Thread Hiller, Dean
: 1000's of column families Out of curiosity, is it really necessary to have that amount of CFs? I am probably still used to relational databases, where you would use a new table just in case you need to store different kinds of data. As Cassandra stores anything in each CF, it might probably make

Re: 1000's of column families

2012-10-01 Thread Ben Hood
Brian, On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill b...@alumni.brown.edu wrote: We haven't committed either way yet, but given Ed Anuff's presentation on virtual keyspaces, we were leaning towards a single column family approach:

Re: 1000's of column families

2012-10-01 Thread Brian O'Neill
It's just a convenient way of prefixing: http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html -brian On Mon, Oct 1, 2012 at 4:22 PM, Ben Hood 0x6e6...@gmail.com wrote: Brian, On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill b...@alumni.brown.edu wrote: We haven't

Re: 1000's of column families

2012-10-01 Thread Ben Hood
On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill b...@alumni.brown.edu wrote: It's just a convenient way of prefixing: http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html So given that it is possible to use a CF per tenant, should we assume that there are sufficient

Re: 1000's of column families

2012-09-28 Thread Hiller, Dean
@cassandra.apache.org Subject: Re: 1000's of column families so if you add up all the applications which would be huge and then all the tables which is large, it just keeps growing. It is a very nice concept (all data in one location), though we will see how implementing it goes

Re: 1000's of column families

2012-09-28 Thread Robin Verlangen
@cassandra.apache.org Date: Thursday, September 27, 2012 11:52 PM To: user@cassandra.apache.org Subject: Re: 1000's of column families so if you add up all the applications which

Re: 1000's of column families

2012-09-28 Thread Aaron Turner
@cassandra.apache.org Date: Thursday, September 27, 2012 11:52 PM To: user@cassandra.apache.org Subject: Re: 1000's of column families so if you add up all the applications which would be huge and then all the tables

Re: 1000's of column families

2012-09-28 Thread Flavio Baronti
@cassandra.apache.org user@cassandra.apache.org Subject: Re: 1000's of column families Out of curiosity, is it really necessary to have that amount of CFs? I am probably still used to relational databases, where you would use a new table just in case you need

Re: 1000's of column families

2012-09-27 Thread Sylvain Lebresne
On Thu, Sep 27, 2012 at 12:13 AM, Hiller, Dean dean.hil...@nrel.gov wrote: We are streaming data with 1 stream per 1 CF and we have 1000's of CF. When using the tools they are all geared to analyzing ONE column family at a time :(. If I remember correctly, Cassandra supports as many CF's as

Re: 1000's of column families

2012-09-27 Thread Robin Verlangen
Every CF adds some overhead (in memory) to each node. This is something you should really keep in mind. Best regards, Robin Verlangen *Software engineer* * * W http://www.robinverlangen.nl E ro...@us2.nl http://goo.gl/Lt7BC Disclaimer: The information contained in this message and attachments

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
Is there a non rhetorical question in there? Maybe is that a feature request in disguise? The question was basically, Is Cassandra ok with as many CF's as you want? It sounds like it is not based on the email that every CF causes a bit more RAM to be used though. So if cassandra is not ok with

Re: 1000's of column families

2012-09-27 Thread Marcelo Elias Del Valle
Out of curiosity, is it really necessary to have that amount of CFs? I am probably still used to relational databases, where you would use a new table just in case you need to store different kinds of data. As Cassandra stores anything in each CF, it might probably make sense to have a lot of CFs

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
@cassandra.apache.org Date: Thursday, September 27, 2012 8:01 AM To: user@cassandra.apache.org Subject: Re: 1000's of column families Out of curiosity, is it really necessary to have

Re: 1000's of column families

2012-09-27 Thread Marcelo Elias Del Valle
@cassandra.apache.org Subject: Re: 1000's of column families Out of curiosity, is it really necessary to have that amount of CFs? I am probably still used to relational databases, where you would use a new table just in case you need to store different kinds of data. As Cassandra

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
: 1000's of column families Dean, In the relational world, I was used to using Hibernate and O/R mapping. There were times when I used 3 classes (2 inheriting from 1 other) and mapped all of them to 1 table. The common part was in the superclass and each subclass had its own columns

Re: 1000's of column families

2012-09-27 Thread Edward Capriolo
Hector also offers support for 'Virtual Keyspaces' which you might want to look at. On Thu, Sep 27, 2012 at 1:10 PM, Aaron Turner synfina...@gmail.com wrote: On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean dean.hil...@nrel.gov wrote: We have 1000's of different building devices and we stream

Re: 1000's of column families

2012-09-27 Thread Aaron Turner
On Thu, Sep 27, 2012 at 7:35 PM, Marcelo Elias Del Valle mvall...@gmail.com wrote: 2012/9/27 Aaron Turner synfina...@gmail.com How strict are your security requirements? If it wasn't for that, you'd be much better off storing data on a per-statistic basis then per-device. Hell, you could

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
Unfortunately, the security aspect is very strict. Some make their data public, but there are many projects where, due to client contracts, they cannot make their data public within our company (i.e., other groups in our company are not allowed to see the data). Also, currently, we have researchers

Re: 1000's of column families

2012-09-27 Thread Robin Verlangen
so if you add up all the applications which would be huge and then all the tables which is large, it just keeps growing. It is a very nice concept (all data in one location), though we will see how implementing it goes. This shouldn't be a real problem for Cassandra. Just add more nodes and ever

1000's of column families

2012-09-26 Thread Hiller, Dean
We are streaming data with 1 stream per 1 CF and we have 1000's of CFs. The tools are all geared to analyzing ONE column family at a time :(. If I remember correctly, Cassandra supports as many CF's as you want, correct? Even though I am going to have tons of fun with