Cassandra users,

I have a 4-node Cassandra cluster set up.  All nodes are in a single rack
and data center.  I have a loader program which loads 40 million
rows into a table in a keyspace with a replication factor of 3.
Immediately after inserting the rows (after the loader program finishes),
if I SELECT count(*) from the table, the result is less than 40 million.
If I run our dumper program to retrieve all rows, it returns fewer than 40
million.  However, if I wait roughly 20 minutes, the count eventually
reaches 40 million rows and the dumper program returns all 40 million.

If I do the same thing in a keyspace where the replication factor is 1, I
don't have any "stabilization" time and the 40 million rows are immediately
available.

I've modified the loading and dumping programs to use the Thrift Java
driver and, separately, the CQL Java driver; neither makes a difference.

I'm very new to Cassandra.  My questions are: what may be causing this
delay before all rows are available, and how might I lessen or eliminate
it?

Thanks,
Greg
