Currently, I'm not sure you can really reduce those dependencies, but we do
plan on reducing them ultimately. Basically, the reason we have anything
Thrift-related in there is that so far we depend on the full Cassandra
jar. However, we'll pull out the classes used by the native transport in
their
> From what I know having too much data on one node is bad, not really sure
> why, but I think that performance will go down due to the size of indexes
> and bloom filters (I may be wrong on the reasons but I'm quite sure you can't
> store too much data per node).
If you have many hundreds of
What version are you on?
> but we are finding a secondary index is performing slow
Not sure what you mean here.
> Are secondary indexes concurrent or single threaded?
Rebuilding a secondary index (via nodetool) is a single-threaded operation,
but *all* indexes specified on the command lin
Hi Alexandru,
"We are running a 3 node Cassandra 1.1.5 cluster with a 3TB Raid 0 disk per
node for the data dir and separate disk for the commitlog, 12 cores, 24 GB
RAM"
I think you should tune your architecture in a very different way. From
what I know, having too much data on one node is bad, no
We are importing data from one column family into a second column family via
"nodetool refresh", but we are finding that a secondary index is performing
slowly and the machine's CPU is pretty much idle. We are trying to bulk load
data as fast as possible.
Are secondary indexes concurrent or single thread
Hello everyone,
We are running a 3 node Cassandra 1.1.5 cluster with a 3TB Raid 0 disk per
node for the data dir and separate disk for the commitlog, 12 cores, 24 GB
RAM (12GB to Cassandra heap).
We now have 1.1 TB worth of data per node (RF = 2).
Our data input is between 20 and 30 GB per day, d