I would say that it's mostly a performance issue, tied to memory
management, but the main problem is that a large number of tables invites a
whole host of cluster management difficulties that require... expert
attention, which then means you need an expert to maintain and enhance it.
Cassandra sc
Hi Jack,
By entries, I meant rows. Each column family has about 200 columns.
> Disabling of slab allocation is an expert-only feature - its use is
generally an anti-pattern, not recommended.
I understand this and have seen this recommendation at several places. I
want to understand the c
By entries, do you mean rows or columns? Please clarify how many columns
each of your tables has, and how many rows you are populating for each
table.
In case I didn't make it clear earlier, limit yourself to "low hundreds"
(like 250) of tables and you should be fine. Thousands of tables is a clear anti-pattern.
Any ideas or advice?
On Mon, Jun 22, 2015 at 10:55 AM, Arun Chaitanya
wrote:
> Hello All,
>
> Now we settled on the following approach. I want to know if there are any
> problems that you foresee in the production environment.
>
> Our Approach: Use Off Heap Memory
>
> Modifications to def
Hello All,
Now we settled on the following approach. I want to know if there are any
problems that you foresee in the production environment.
Our Approach: Use Off Heap Memory
Modifications to default cassandra.yaml and cassandra-env.sh
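For context, the off-heap memtable settings under discussion would be set roughly like this (values are illustrative assumptions, not recommendations; check them against your Cassandra version's defaults):

```yaml
# cassandra.yaml -- illustrative off-heap memtable settings
# heap_buffers (default) | offheap_buffers | offheap_objects
memtable_allocation_type: offheap_objects
memtable_offheap_space_in_mb: 2048
```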
> > I strongly advise against this approach.
> Jon, I think so too. But so you actually foresee any problems with this
> approach?
> I can think of a few. [I want to evaluate if we can live with this problem]
Just to be clear, I’m not saying this is a great approach, I AM saying that it
may be be
> Sorry for this naive question but how important is this tuning? Can this
have a huge impact in production?
Massive. Here's a graph from when we did some JVM tuning at my previous
company:
http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png
About an or
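The sort of JVM tuning referred to above lives in cassandra-env.sh. A minimal sketch, assuming G1 and an 8 GB heap purely for illustration (the right values depend entirely on your hardware and workload):

```shell
# cassandra-env.sh -- illustrative JVM settings, not a recommendation
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="2G"

# G1 with a pause-time target instead of the default CMS settings
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
```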
Thanks Jon and Jack,
> I strongly advise against this approach.
Jon, I think so too. But do you actually foresee any problems with this
approach?
I can think of a few. [I want to evaluate if we can live with this problem]
- No more CQL.
- No data types, everything needs to be a blob.
- L
How big is each of the tables - are they all fairly small or fairly large?
Small as in no more than thousands of rows or large as in tens of millions
or hundreds of millions of rows?
Small tables are not ideal for a Cassandra cluster since the rows would
be spread out across the nodes, even th
While Graham's suggestion will let you collapse a bunch of tables into a
single one, it'll likely result in so many other problems it won't be worth
the effort. I strongly advise against this approach.
First off, different workloads need different tuning. Compaction
strategies, gc_grace_seconds,
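Both of the per-workload knobs named above are per-table properties in CQL. A sketch, with a hypothetical keyspace and table name:

```cql
-- Hypothetical table; values are illustrative, not recommendations
ALTER TABLE myks.events
  WITH compaction = {'class': 'LeveledCompactionStrategy'}
  AND gc_grace_seconds = 86400;
```

Collapsing many tables into one forces a single choice of these settings on workloads that may need different ones.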
Depending on your use case and data types (for example, if you can have a
minimally nested JSON representation of the objects), you could go with a
common map representation where keys are top-level object fields and values
are valid JSON literals as strings; e.g. unquoted primitives, quoted strings.
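The common-map idea sketched above might look like this in CQL (all names are hypothetical; the point is one generic table holding many logical DTO types):

```cql
-- Hypothetical generic table collapsing many per-DTO column families
CREATE TABLE myks.dto_store (
    dto_type text,               -- which logical "table" the row belongs to
    id       text,
    fields   map<text, text>,    -- top-level field name -> JSON literal as a string
    PRIMARY KEY ((dto_type, id))
);

INSERT INTO myks.dto_store (dto_type, id, fields)
VALUES ('user', '42', {'age': '30', 'name': '"arun"', 'tags': '["a","b"]'});
```

Note the trade-off raised later in the thread: with everything stored as strings you lose CQL data types and per-field indexing.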
Hello Jack,
> Column families? As opposed to tables? Are you using Thrift instead of
CQL3? You should be focusing on the latter, not the former.
We have an ORM developed in our company, which maps each DTO to a column
family. So, we have many column families. We are using CQL3.
> But either way,
Scalability of Cassandra refers primarily to number of rows and number of
nodes - to add more data, add more nodes.
Column families? As opposed to tables? Are you using Thrift instead of
CQL3? You should be focusing on the latter, not the former.
But either way, the general guidance is that there
Hello Graham,
> Are the CFs different, or all the same schema?
The column families are different. Maybe with better data modelling, we
can combine a few of them.
> Are you contractually obligated to actually separate data into separate
CFs?
No. It's just that we have several subsystems (around 10
Are the CFs different, or all the same schema? Are you contractually obligated
to actually separate data into separate CFs? It seems like you’d have a lot
simpler time if you could use the part of the partition key to separate data.
Note also, I don’t know what disks you are using, but disk cach
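Using part of the partition key to separate data, as suggested above, might look like this (names are hypothetical):

```cql
-- One shared table instead of one table per subsystem
CREATE TABLE myks.events_by_subsystem (
    subsystem text,
    event_id  timeuuid,
    payload   text,
    PRIMARY KEY ((subsystem), event_id)
);

-- Each subsystem's data lives in its own partitions:
SELECT * FROM myks.events_by_subsystem WHERE subsystem = 'billing';
```

This keeps the cluster's table count flat while still isolating each subsystem's rows behind its partition key.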
Good Day Everyone,
I am very happy with the (almost) linear scalability offered by C*. We had
a lot of problems with RDBMS.
But I heard that C* has a limit on the number of column families that can
be created in a single cluster.
The reason being each CF stores 1-2 MB on the JVM heap.
In our use ca