Ben,
To address your question, read my last post, but to summarize: yes, there
is less memory overhead in prefixing keys than in managing multiple CFs, EXCEPT
when doing map/reduce. With map/reduce, you now have HUGE overhead
in reading a whole slew of rows you don't care about, as you can't
Without putting too much thought into it...
Given the underlying architecture, I think you could/would have to write
your own partitioner, which would partition based on the prefix/virtual
keyspace.
-brian
---
Brian O'Neill
Lead Architect, Software Development
Health Market Science
The Science of Better Results
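Brian's prefix-partitioner idea can be illustrated with a small Python model. Everything in it is an assumption for illustration: the four-node ring, the `tenant:row` key format with `:` as the separator, and the helper names (`token`, `owner`, `full_key_token`, `prefix_token`). It is not Cassandra's partitioner interface, which is Java. It only models what hashing on the prefix alone does to where rows land.

```python
import hashlib
from bisect import bisect_left

# Toy four-node ring (an assumption): node i owns the token range ending at
# NODE_TOKENS[i], and tokens live in [0, 2**127).
RING_SIZE = 2**127
NODE_NAMES = ["node-a", "node-b", "node-c", "node-d"]
NODE_TOKENS = [RING_SIZE // 4 * (i + 1) for i in range(4)]


def token(data: str) -> int:
    """MD5-based token, loosely modeled on RandomPartitioner."""
    return int(hashlib.md5(data.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def owner(tok: int) -> str:
    """Node whose range contains tok; wraps past the last token to node 0."""
    i = bisect_left(NODE_TOKENS, tok)
    return NODE_NAMES[i % len(NODE_NAMES)]


def full_key_token(row_key: str) -> int:
    # Whole-key hashing (what RandomPartitioner does): rows scatter over the ring.
    return token(row_key)


def prefix_token(row_key: str, sep: str = ":") -> int:
    # Hypothetical prefix partitioner: hash only the virtual-keyspace prefix,
    # so every row of one tenant gets the same token and lands on one node.
    return token(row_key.split(sep, 1)[0])


keys = [f"tenantA:row{i}" for i in range(100)]
print(sorted({owner(full_key_token(k)) for k in keys}))  # typically more than one node
print(sorted({owner(prefix_token(k)) for k in keys}))    # exactly one node
```

Hashing only the prefix keeps a virtual keyspace together, but it also puts all of one tenant's rows on one node, so a map/reduce over that tenant cannot spread its reads across the cluster's disks.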
Thanks for the idea, but... (please keep thinking on it)...
That is 100% what we don't want, since partitioned data resides on the same node.
I want to map/reduce the column families and leverage the parallel disks.
:( :(
I am sure others would want to do the same... We almost need a feature of
virtual
Agreed.
Do we know yet what the overhead is for each column family? What is the
limit?
If you have a SINGLE keyspace w/ 2+ CFs, what happens? Anyone know?
-brian
Brian,
On Tue, Oct 2, 2012 at 2:20 PM, Brian O'Neill boneil...@gmail.com wrote:
Without putting too much thought into it...
Given the underlying architecture, I think you could/would have to write
your own partitioner, which would partition based on the prefix/virtual
keyspace.
I might be
Exactly.
On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill boneil...@gmail.com wrote:
Exactly.
So you're back to the deliberation between using multiple CFs
(potentially with some known working upper bound*) or feeding your map/reduce
in some other way (as you decided to do with Storm). In my
particular
Ben, Brian,
By the way, PlayOrm offers a NoSqlTypedSession, separate from the ORM half of
PlayOrm, that deals in raw data and still does indexing (so you can do
Scalable SQL on data that has no ORM on top of it). That is what we use for
our 1000's of CFs, as we don't know the format of any of
Another option that may or may not work for you is the support in Cassandra
1.1+ to use a secondary index as an input to your mapreduce job. What you
might do is add a field to the column family that represents which virtual
column family it is part of. Then, when doing mapreduce jobs,
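Jeremy's suggestion can be modeled in memory as a rough sketch. The `vcf` column name, the `ToyColumnFamily` class and its methods are hypothetical, and the index is a plain dictionary standing in for a Cassandra secondary index. How a real index is handed to a Cassandra 1.1+ mapreduce job is not shown.

```python
from collections import defaultdict


class ToyColumnFamily:
    """One physical CF holding rows from many virtual CFs (in-memory model).

    Each row gets a hypothetical 'vcf' column naming its virtual CF, and a
    dict-based index maps each 'vcf' value to the matching row keys, standing
    in for a secondary index on that column.
    """

    def __init__(self):
        self.rows = {}                     # row key -> {column: value}
        self.vcf_index = defaultdict(set)  # 'vcf' value -> set of row keys

    def insert(self, row_key, vcf, columns):
        old = self.rows.get(row_key)
        if old is not None:
            # Keep the index consistent if a row moves to another virtual CF.
            self.vcf_index[old["vcf"]].discard(row_key)
        self.rows[row_key] = dict(columns, vcf=vcf)
        self.vcf_index[vcf].add(row_key)

    def rows_for(self, vcf):
        # The index yields only row keys; the rows are then fetched key by key,
        # modeling the multi-get an index-driven read needs.
        keys = sorted(self.vcf_index.get(vcf, ()))
        return {k: self.rows[k] for k in keys}


cf = ToyColumnFamily()
cf.insert("r1", "deviceA", {"temp": 21})
cf.insert("r2", "deviceB", {"temp": 19})
cf.insert("r3", "deviceA", {"temp": 22})
print(list(cf.rows_for("deviceA")))  # ['r1', 'r3']
```

The index returns only row keys, and the rows are then read one key at a time. That is the "not contiguous" multi-get cost Dean raises about index-driven reads.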
Jeremy,
On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote:
Another option that may or may not work for you is the support in Cassandra
1.1+ to use a secondary index as an input to your mapreduce job. What you
might do is add a field to the column family that represents which
Date: Tuesday, October 2, 2012 11:18 AM
To: user@cassandra.apache.org
Subject: Re: 1000's of column families
Dean,
On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:
Because the data for an index is not all together (i.e., you need a multi-get
to get the data). It is not contiguous.
With the prefix in a partition, they keep the data together, so all data for
a prefix, from what I understand, is contiguous.
Date: Thursday, September 27, 2012 8:01 AM
To: user@cassandra.apache.org
Subject: Re: 1000's of column families
Out of curiosity, is it really necessary to have that many CFs?
I am probably still used to relational databases, where you would use a new
table just in case you need to store different kinds of data. As Cassandra
stores anything in each CF, it might make sense to have a lot of CFs...
Brian,
On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill b...@alumni.brown.edu wrote:
We haven't committed either way yet, but given Ed Anuff's presentation
on virtual keyspaces, we were leaning towards a single column family
approach:
It's just a convenient way of prefixing:
http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
-brian
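The prefixing idea behind virtual keyspaces can be sketched as a thin wrapper over a shared store. This is a hypothetical illustration, not Hector's API: the class name, the `:` separator and the dict standing in for the shared column family are all assumptions.

```python
class PrefixedKeyspace:
    """Hypothetical 'virtual keyspace': one tenant's view of a shared store.

    Every row key is written as '<prefix><sep><key>'; reads and key listings
    strip the prefix again. Illustrative only, not Hector's API.
    """

    def __init__(self, store, prefix, sep=":"):
        if sep in prefix:
            # A separator inside the prefix could let one tenant see another's keys.
            raise ValueError("prefix must not contain the separator")
        self.store = store            # shared dict: physical key -> value
        self.prefix = prefix + sep

    def put(self, key, value):
        self.store[self.prefix + key] = value

    def get(self, key, default=None):
        return self.store.get(self.prefix + key, default)

    def keys(self):
        # Only this tenant's physical keys, returned without the prefix.
        n = len(self.prefix)
        return sorted(k[n:] for k in self.store if k.startswith(self.prefix))


shared = {}
a = PrefixedKeyspace(shared, "tenantA")
b = PrefixedKeyspace(shared, "tenantB")
a.put("row1", "x")
b.put("row1", "y")
print(sorted(shared))           # ['tenantA:row1', 'tenantB:row1']
print(a.get("row1"), b.keys())  # x ['row1']
```

Rejecting a separator inside the prefix is a design choice of this sketch. Without it, a tenant named `a` could list keys written by a tenant named `a:b`.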
On Mon, Oct 1, 2012 at 4:22 PM, Ben Hood 0x6e6...@gmail.com wrote:
On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill b...@alumni.brown.edu wrote:
It's just a convenient way of prefixing:
http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
So given that it is possible to use a CF per tenant, should we assume
that there are sufficient
So if you add up all the applications,
which would be huge, and then all the tables, which is large, it just keeps
growing. It is a very nice concept (all data in one location), though we
will see how implementing it goes.
Date: Thursday, September 27, 2012 11:52 PM
To: user@cassandra.apache.org
Subject: Re: 1000's of column families
On Thu, Sep 27, 2012 at 12:13 AM, Hiller, Dean dean.hil...@nrel.gov wrote:
We are streaming data with 1 stream per 1 CF and we have 1000's of CFs. When
using the tools, they are all geared to analyzing ONE column family at a time
:(. If I remember correctly, Cassandra supports as many CFs as
Every CF adds some overhead (in memory) to each node. This is something you
should really keep in mind.
Best regards,
Robin Verlangen
*Software engineer*
W http://www.robinverlangen.nl
E ro...@us2.nl
http://goo.gl/Lt7BC
Is there a non-rhetorical question in there? Maybe it is a feature
request in disguise?
The question was basically: is Cassandra OK with as many CFs as you want?
Based on the email saying that every CF causes a bit more RAM to be used,
it sounds like it is not. So if Cassandra is not OK with
Dean,
In the relational world, I was used to using Hibernate and O/R mapping.
There were times when I used 3 classes (2 of them inheriting from the third) and mapped
all of them to 1 table. The common part was in the superclass and each subclass
had its own columns
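The single-table mapping Marcelo describes can be sketched in Python, with dataclasses standing in for Hibernate entities and dicts for table rows. The class names, the `type` discriminator column and the `to_row`/`from_row` helpers are hypothetical. In Hibernate/JPA the same pattern is configured declaratively with the `SINGLE_TABLE` inheritance strategy and a discriminator column.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Device:
    # Common part, kept in the superclass (hypothetical example entity).
    device_id: str
    building: str


@dataclass
class Meter(Device):
    # Subclass-specific column.
    kwh: float = 0.0


@dataclass
class Thermostat(Device):
    # Another subclass with its own column.
    setpoint_c: float = 0.0


# Discriminator value -> class, used when reading rows back.
CLASSES = {cls.__name__: cls for cls in (Meter, Thermostat)}


def to_row(obj):
    """Flatten any subclass into one shared 'table' row plus a type column."""
    row = asdict(obj)
    row["type"] = type(obj).__name__
    return row


def from_row(row):
    """Rebuild the right subclass, keeping only the columns it declares."""
    cls = CLASSES[row["type"]]
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in row.items() if k in names})


row = to_row(Meter("m1", "B17", kwh=3.5))
print(row)            # {'device_id': 'm1', 'building': 'B17', 'kwh': 3.5, 'type': 'Meter'}
print(from_row(row))  # Meter(device_id='m1', building='B17', kwh=3.5)
```

Every subclass shares one table and only the discriminator tells the rows apart, much like tagging rows with a virtual-CF marker inside a single column family.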
Hector also offers support for 'Virtual Keyspaces', which you might
want to look at.
On Thu, Sep 27, 2012 at 1:10 PM, Aaron Turner synfina...@gmail.com wrote:
On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
We have 1000's of different building devices and we stream
On Thu, Sep 27, 2012 at 7:35 PM, Marcelo Elias Del Valle
mvall...@gmail.com wrote:
2012/9/27 Aaron Turner synfina...@gmail.com
How strict are your security requirements? If it wasn't for that,
you'd be much better off storing data on a per-statistic basis than
per-device. Hell, you could
Unfortunately, the security aspect is very strict. Some make their data
public, but there are many projects where, due to client contracts, they
cannot make their data public within our company (i.e., other groups in our
company are not allowed to see the data).
Also, currently, we have researchers
So if you add up all the applications,
which would be huge, and then all the tables, which is large, it just keeps
growing. It is a very nice concept (all data in one location), though we
will see how implementing it goes.
This shouldn't be a real problem for Cassandra. Just add more nodes and
ever
We are streaming data with 1 stream per 1 CF and we have 1000's of CFs. When
using the tools, they are all geared to analyzing ONE column family at a time
:(. If I remember correctly, Cassandra supports as many CFs as you want,
correct? Even though I am going to have tons of fun with