Any better solution to avoid TombstoneOverwhelmingException?

2014-06-30 Thread Jason Tang
Our application will use Cassandra to persist asynchronous tasks, so in
one time period lots of records (more than 10M) will be created in
Cassandra. Later they will be executed.

Due to disk space limitations, the executed records will be deleted.
After gc_grace_seconds, they are expected to be automatically removed from disk.

So for the next round of execution, the deleted records should not be
returned by queries.

This traffic pattern will generate lots of tombstones.

To avoid TombstoneOverwhelmingException, one way is to raise
tombstone_failure_threshold, but does that have any impact on the system's
performance under my traffic model, and is there any better solution for this
traffic?
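For reference, both tombstone thresholds live in cassandra.yaml (assuming a
2.0-era config; the values below are the shipped defaults):

    # scanning this many tombstones in one query logs a warning
    tombstone_warn_threshold: 1000
    # scanning this many aborts the query with TombstoneOverwhelmingException
    tombstone_failure_threshold: 100000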


BRs
//Tang


Re: Any better solution to avoid TombstoneOverwhelmingException?

2014-06-30 Thread Jason Tang
The traffic is continuous, which means that while new records are inserted,
old records are executed (deleted) at the same time.

And execution is based on a time condition, so some stored records are
executed (deleted) now, and some will be executed in the next round.

As for a given TTL, it is the same as a delete: it will also generate a tombstone.


2014-06-30 15:58 GMT+08:00 DuyHai Doan doanduy...@gmail.com:

 Why don't you store all the current data in one partition and, for the next
 round of execution, switch to a new partition? This way you don't even
 need to remove data (if you insert with a given TTL)
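A minimal sketch of that bucketing idea, assuming a Hector-style client; the
keyspace, column family and round length are illustrative, not from the thread:

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class BucketedTaskWriter {
        private static final long ROUND_MILLIS = 3600 * 1000L; // one execution round

        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("cluster", "127.0.0.1:9160");
            Keyspace ksp = HFactory.createKeyspace("tasks", cluster);
            StringSerializer se = StringSerializer.get();

            // Row key = current round, so each round lands in its own partition
            // and the next round never scans the previous round's tombstones.
            String bucket = "round-" + (System.currentTimeMillis() / ROUND_MILLIS);

            HColumn<String, String> col =
                HFactory.createColumn("task-0001", "payload", se, se);
            col.setTtl((int) (2 * ROUND_MILLIS / 1000)); // expire instead of deleting

            Mutator<String> m = HFactory.createMutator(ksp, se);
            m.insert(bucket, "queue", col);
        }
    }

The expired columns still turn into tombstones eventually, but queries for the
next round only touch the new partition, so they never have to skip them.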


 On Mon, Jun 30, 2014 at 8:43 AM, Jason Tang ares.t...@gmail.com wrote:

 Our application will use Cassandra to persist asynchronous tasks, so in
 one time period lots of records (more than 10M) will be created in
 Cassandra. Later they will be executed.

 Due to disk space limitations, the executed records will be deleted.
 After gc_grace_seconds, they are expected to be automatically removed from disk.

 So for the next round of execution, the deleted records should not be
 returned by queries.

 This traffic pattern will generate lots of tombstones.

 To avoid TombstoneOverwhelmingException, one way is to raise
 tombstone_failure_threshold, but does that have any impact on the system's
 performance under my traffic model, and is there any better solution for this
 traffic?


 BRs
 //Tang





Re: heap issues - looking for advices on gc tuning

2013-10-30 Thread Jason Tang
What is the configuration of the following parameters?
memtable_flush_queue_size:
concurrent_compactors:


2013/10/30 Piavlo lolitus...@gmail.com

 Hi,

 Below I try to give a full picture to the problem I'm facing.

 This is a 12 node cluster, running on ec2 with m2.xlarge instances (17G
 ram, 2 cpus).
 Cassandra version is 1.0.8.
 The cluster normally has between 1500 - 3000 reads per second (depending on
 the time of day) and 800 - 1700 writes per second, according to OpsCenter.
 RF=3; no row caches are used.

 Memory relevant  configs from cassandra.yaml:
 flush_largest_memtables_at: 0.85
 reduce_cache_sizes_at: 0.90
 reduce_cache_capacity_to: 0.75
 commitlog_total_space_in_mb: 4096

 relevant JVM options used are:
 -Xms8000M -Xmx8000M -Xmn400M
 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
 -XX:MaxTenuringThreshold=1
 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly

 Now what happens is that with these settings, after a cassandra process
 restart the GC works fine at the beginning, and heap usage looks like a
 saw with perfect teeth; eventually the teeth start to shrink until they are
 no longer noticeable, and then cassandra starts to spend lots of CPU time
 doing gc. Such a cycle takes about 2 weeks, and then I need to restart the
 cassandra process to restore performance.
 During all this time there are no memory-related messages in the cassandra
 system.log, except a GC for ParNew of a little above 200ms once in a while.

 Things I've already done trying to reduce this eventual heap pressure:
 1) reducing bloom_filter_fp_chance, resulting in a reduction from ~700MB to
 ~280MB total per node, based on all Filter.db files on the node.
 2) reducing key cache sizes, and dropping key caches for CFs which do not
 have many reads
 3) increasing the heap size from 7000M to 8000M
 None of these really helped; the increase from 7000M to 8000M only
 lengthened the cycle before excessive gc from ~9 days to ~14 days.

 I've tried to graph, over time, the data that is supposed to be in the heap
 vs the actual heap size, by summing all CFs' bloom filter sizes + all CFs'
 key cache capacities multiplied by the average key size + all CFs' reported
 memtable data sizes (I've overestimated the data size a bit on purpose, to
 be on the safe side).
 Here is a link to a graph showing the last 2 days of metrics for a node
 which could not effectively do GC, until the cassandra process was restarted.
 http://awesomescreenshot.com/0401w5y534
 You can clearly see that before and after the restart, the size of the data
 that is supposed to be in the heap is pretty much the same,
 which makes me think that what I really need is GC tuning.

 Also, I suppose this is not due to the total number of keys each node has,
 which is between 200 - 300 million keys summing all CF key estimates on a node.
 The nodes have data sizes between 45G and 75G, in line with the millions of
 keys. And all nodes start to come under heavy GC load after about 14 days.
 Also, the excessive GC and heap usage are not affected by load, which varies
 depending on the time of day (see the read/write rates at the beginning of
 the mail).
 So again, based on this, I assume this is not due to a large number of keys
 or too much load on the cluster, but due to a pure GC misconfiguration
 issue.

 Things I remember that I've tried for GC tuning:
 1) Changing -XX:MaxTenuringThreshold=1 to values like 8 - did not help.
 2) Adding -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
 -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10
 -XX:ParallelGCThreads=2 -XX:ParallelCMSThreads=1
 - this actually made things worse.
 3) Adding -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=8 - did not help.

 Also, since it takes about 2 weeks to verify that a GC setting change did
 not help, trying all the possibilities is painfully slow :)
 I'd highly appreciate any help and hints on GC tuning.

 tnx
 Alex









Re: Side effects of hinted handoff lead to consistency problem

2013-10-14 Thread Jason Tang
After checking the logs and configuration, I found it was caused by two things.

 1. GC grace seconds
I use the Hector client to connect to Cassandra, and the default value of GC
grace seconds for each column family it creates is zero! So when hinted handoff
replays the temporary value, the tombstones on the other two nodes have already
been removed by compaction, and the client then gets the temporary value back.
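A hedged sketch of pinning gc_grace_seconds explicitly when creating the
column family through Hector, so the client default never applies (keyspace
and CF names are illustrative):

    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
    import me.prettyprint.hector.api.factory.HFactory;

    public class CreateCfWithGcGrace {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("cluster", "127.0.0.1:9160");
            ColumnFamilyDefinition cfDef =
                HFactory.createColumnFamilyDefinition("drc", "queue");
            // Don't rely on the client library's default; 10 days matches the fix below.
            cfDef.setGcGraceSeconds(10 * 24 * 3600);
            cluster.addColumnFamily(cfDef);
        }
    }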

 2. Secondary index
Even after fixing the first problem, I could still get temporary results from
the cassandra client. I used a command like get my_cf where
column_one='value' to query the data, and the temporary value showed up
again. But when I queried the same record by its raw key, it was gone.
From the application we always use the row key to get the data, and that
way I never saw the temporary value.

So it seems the secondary index is not restricted by the consistency
configuration.

After I changed GC grace seconds to 10 days, our problem was solved, but
it is still strange behavior when using an index query.


2013/10/8 Jason Tang ares.t...@gmail.com

 I have a 3 node cluster, and replication_factor is 3 as well. Consistency
 level is write QUORUM, read QUORUM.
 Traffic has three major steps:
 Create:
 Rowkey: 
 Column: status=new, requests=x
 Update:
  Rowkey: 
  Column: status=executing, requests=x
 Delete:
  Rowkey: 

 When one node is down, the traffic still works according to the consistency
 configuration, and the final status is that all requests are finished and deleted.

 So if I run a cassandra client to list the result (also at consistency
 quorum), it shows empty (only the rowkey left), which is correct.

 But if we restart the dead node, the hinted handoff mechanism will write the
 data back to this node. So there are lots of creates, updates and deletes.

 I don't know whether it is due to GC or compaction, but the deletes on the
 other two nodes seem not to take effect, and if I use a cassandra client to
 list the data (also at consistency quorum), the deleted rows show up again
 with column values.

 And if I check the data several times from the client, I can see the data
 changing: as hinted handoff replays the operations, the deleted data shows up
 and then disappears again.

 So the hinted handoff mechanism speeds up repair, but the temporary data can
 be seen externally (even though the data was deleted).

 Is there a way to keep this procedure invisible externally until the hinted
 handoff has finished?

 What I want is final-state synchronization; the temporary status is out of
 date and incorrect, and should never be seen externally.

 Is it due to row deletes instead of column deletes? Or compaction?



Re: Failed to solve Digest mismatch

2013-10-09 Thread Jason Tang
I did some tests on this issue, and it turns out the problem is caused by the
local timestamps.
In our traffic, the update and the delete happen very fast, within 1 second,
even within 100ms.
And at that time the ntp service seemed not to work well; the offset was
sometimes even larger than 1 second.

Then some delete timestamps ended up before the create timestamps, so
when the digest mismatch was
resolved, the result was not correct.
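One way a single client can guarantee that a later operation never carries an
earlier timestamp, whatever ntp does between two calls, is a monotonic clock;
a minimal sketch in plain Java (an illustration, not Hector's built-in clock):

    import java.util.concurrent.atomic.AtomicLong;

    /** Issues strictly increasing microsecond timestamps on one client. */
    public class MonotonicClock {
        private final AtomicLong last = new AtomicLong();

        public long nextMicros() {
            while (true) {
                long now = System.currentTimeMillis() * 1000; // microsecond granularity
                long prev = last.get();
                long next = Math.max(now, prev + 1);          // never go backwards
                if (last.compareAndSet(prev, next)) {
                    return next;
                }
            }
        }
    }

This only fixes ordering within one client; across clients the clocks still
have to agree, which is the crux of this thread.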


2012/7/4 aaron morton aa...@thelastpickle.com

 Jason,
 Are you able to document the steps to reproduce this on a clean install?

 If so, do you have time to create an issue at
 https://issues.apache.org/jira/browse/CASSANDRA

 Thanks


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 2/07/2012, at 1:49 AM, Jason Tang wrote:

 For the create/update/deleteColumn/deleteRow test case, with Quorum
 consistency level, 6 nodes and replication factor 3, one thread reproduces
 this in roughly 1 round out of 100.

 And if I run the test client with 20 threads, the ratio is higher.

 And each test group is executed by one thread, and the client timestamps are
 unique and sequenced, guaranteed by Hector.

 And the client only accesses the data from the local Cassandra node.

 And the query only uses the row key, which is unique. The column names are
 not unique; in my case, e.g., status.

 And the rows have around 7 columns, all small, e.g.
 status:true, userName:Jason ...

 BRs
 //Ares

 2012/7/1 Jonathan Ellis jbel...@gmail.com

 Is this Cassandra 1.1.1?

 How often do you observe this?  How many columns are in the row?  Can
 you reproduce when querying by column name, or only when slicing the
 row?

 On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang ares.t...@gmail.com wrote:
  Hi
 
  First I delete one column, then I delete one row. Then I try to read all
  columns from the same row; all operations are from the same client app.
 
  The consistency level is read/write quorum.
 
  Checking the Cassandra log, the local node doesn't perform the delete
  operation but sends the mutation to the other nodes (192.168.0.6,
  192.168.0.1).
 
  After the delete, I try to read all columns from the row, and I found the
  node detected a Digest mismatch due to the Quorum consistency configuration,
  but the result is not correct.
 
  From the log, I can see the delete mutation was already accepted
  by 192.168.0.6 and 192.168.0.1, but when 192.168.0.5 read the responses from
  0.6 and 0.1 and merged the data, 0.5 finally showed the result, which is the
  dirty data.
 
  The following logs show the change of column 737461747573; 192.168.0.5
  tries to read from 0.1 and 0.6, where it should be deleted, but finally it
  shows it has the data.
 
  log:
  192.168.0.5
  DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653)
  Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc',
  key=7878323239537570657254616e67307878,
  columnParent='QueryPath(columnFamilyName='queue',
 superColumnName='null',
  columnName='null')',
 
 columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
  DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79)
  Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
  DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674)
  reading data from /192.168.0.6
  DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694)
  reading digest from /192.168.0.1
  DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
  ResponseVerbHandler.java (line 44) Processing response on a callback
 from
  6556@/192.168.0.6
  DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
  AbstractRowResolver.java (line 66) Preprocessed data response
  DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
  ResponseVerbHandler.java (line 44) Processing response on a callback
 from
  6557@/192.168.0.1
  DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
  AbstractRowResolver.java (line 66) Preprocessed digest response
  DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line
 65)
  resolving 2 responses
  DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733)
  Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
  Mismatch for key DecoratedKey(100572974179274741747356988451225858264,
  7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs
  d41d8cd98f00b204e9800998ecf8427e)
  DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
  ResponseVerbHandler.java (line 44) Processing response on a callback
 from
  6558@/192.168.0.6
  DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
  ResponseVerbHandler.java (line 44) Processing response on a callback
 from
  6559@/192.168.0.1
  DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
  AbstractRowResolver.java (line 66

Side effects of hinted handoff lead to consistency problem

2013-10-08 Thread Jason Tang
I have a 3 node cluster, and replication_factor is 3 as well. Consistency
level is write QUORUM, read QUORUM.
Traffic has three major steps:
Create:
Rowkey: 
Column: status=new, requests=x
Update:
 Rowkey: 
 Column: status=executing, requests=x
Delete:
 Rowkey: 

When one node is down, the traffic still works according to the consistency
configuration, and the final status is that all requests are finished and deleted.

So if I run a cassandra client to list the result (also at consistency
quorum), it shows empty (only the rowkey left), which is correct.

But if we restart the dead node, the hinted handoff mechanism will write the
data back to this node. So there are lots of creates, updates and deletes.

I don't know whether it is due to GC or compaction, but the deletes on the
other two nodes seem not to take effect, and if I use a cassandra client to
list the data (also at consistency quorum), the deleted rows show up again
with column values.

And if I check the data several times from the client, I can see the data
changing: as hinted handoff replays the operations, the deleted data shows up
and then disappears again.

So the hinted handoff mechanism speeds up repair, but the temporary data can
be seen externally (even though the data was deleted).

Is there a way to keep this procedure invisible externally until the hinted
handoff has finished?

What I want is final-state synchronization; the temporary status is out of
date and incorrect, and should never be seen externally.

Is it due to row deletes instead of column deletes? Or compaction?


Why Cassandra so depend on client local timestamp?

2013-10-01 Thread Jason Tang
The following case may be logically correct for Cassandra, but it is difficult
for users.
Let's say:

Cassandra consistency level: write ALL, read ONE
replication_factor: 3

For one record, rowkey:001, column:status

Client 1 inserts a value for rowkey 001, status:True, with timestamp 11:00:05
Client 2 runs a slice query and gets the value True for rowkey 001, at 11:00:00
Client 2 updates the value for rowkey 001, status:False, with timestamp 11:00:02

So the client update sequence is True then False; although the update
requests come from different nodes, the sequence is logically ordered.

But the result is rowkey:001, column:status, value:True

So why does Cassandra depend so heavily on client local time? Why not use
server local time instead of client local time?

Because I am using consistency level write ALL with replication_factor 3,
all 3 nodes received the updates in the correct sequence (True -> False),
so they could give the correct final result.

If for some reason it must depend strongly on the operation's timestamp, then
the query operation also needs a timestamp; then Client 2 would not see the
value True, which happens in its future.

So either using server timestamps, or providing a consistent view by using a
timestamp for queries, would be more consistent.

Otherwise, the consistency of Cassandra is quite weak.


Gossiper in Cassandra using unicast/broadcast/multicast ?

2013-06-20 Thread Jason Tang
Hi

   We are considering using Cassandra in a virtualization environment. I
wonder, does Cassandra use unicast, broadcast or multicast for node discovery
and communication?

  From the code, I see the broadcast address is used for the heartbeat in
Gossiper.java, but I don't know how it actually works during node
communication and at node startup (not for a newly added node).

BRs


Re: Consistent problem when solve Digest mismatch

2013-03-06 Thread Jason Tang
Actually I didn't update the same record concurrently, because I first
create it, then search for it, then delete it. Resolving the version conflict
failed because the delete's local timestamp was earlier than the create's
local timestamp.


2013/3/6 aaron morton aa...@thelastpickle.com

 Otherwise, it means the version conflict resolution strongly depends on a
 global sequence id (timestamp) which the client needs to provide?

 Yes.
 If you have an area of your data model that has a high degree of
 concurrency, C* may not be the right match.

 In 1.1 we have atomic updates, so clients see either the entire write or
 none of it. And sometimes you can design a data model that does not mutate
 shared values, but writes ledger entries instead. See Matt Dennis's talk here
 http://www.datastax.com/events/cassandrasummit2012/presentations or this
 post http://thelastpickle.com/2012/08/18/Sorting-Lists-For-Humans/

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 4/03/2013, at 4:30 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 The timestamp provided by my client is a unix timestamp (with ntp), and as I
 said, due to ntp drift the local unix timestamps are not accurately
 synchronized (compared to what my case needs).

 So, in short, the client cannot provide a global sequence number to indicate
 the event order.

 But I wonder: I configured the Cassandra consistency level as write QUORUM,
 so for one record, I suppose Cassandra has the ability to decide the final
 update result.

 Otherwise, it means the version conflict resolution strongly depends on a
 global sequence id (timestamp) which the client needs to provide?


 //Tang


 2013/3/4 Sylvain Lebresne sylv...@datastax.com

 The problem is, what exactly is the sequence number you are talking about?

 Or let me put it another way: if you do have a sequence number that
 provides a total ordering of your operations, then that is exactly what you
 should use as your timestamp. What Cassandra calls the timestamp is
 exactly what you call a seqID; it's the number Cassandra uses to decide the
 order of operations.

 Except that in real life, provided you have more than one client talking
 to Cassandra, providing a total ordering of operations is hard, and in
 fact not doable efficiently. So in practice people use unix timestamps
 (with ntp), which provide a very good yet cheap approximation of the
 real-life order of operations.

 But again, if you do know how to assign a more precise timestamp,
 Cassandra lets you use that: you can provide your own timestamp (using the
 unix timestamp is just the default). The point being, the unix timestamp is
 the best approximation we have in practice.
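A hedged sketch of supplying that timestamp explicitly through Hector rather
than taking the default; it assumes HFactory.createColumn's overload that
accepts a clock value, with the application's own seqID used directly as the
column timestamp:

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class ExplicitTimestampWrite {
        // 'ksp' is an already-built Keyspace; 'seqId' is the application's
        // total-ordering number, written as the column timestamp.
        public static void write(Keyspace ksp, String rowKey, long seqId) {
            StringSerializer se = StringSerializer.get();
            HColumn<String, String> col =
                HFactory.createColumn("status", "false", seqId, se, se);
            Mutator<String> m = HFactory.createMutator(ksp, se);
            m.insert(rowKey, "queue", col);
        }
    }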

 --
 Sylvain


 On Mon, Mar 4, 2013 at 9:26 AM, Jason Tang ares.t...@gmail.com wrote:

 Hi

   Previously I met a consistency problem; you can refer to the link below for
 the whole story.

 http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3CCAFb+LUxna0jiY0V=AvXKzUdxSjApYm4zWk=ka9ljm-txc04...@mail.gmail.com%3E

   And after checking the code, it seems I found some clue to the problem.
 Maybe someone can check this.

   In short, I have a Cassandra cluster (1.0.3). The consistency level is
 read/write quorum, and replication_factor is 3.

   Here is the event sequence:

 seqID   NodeA   NodeB   NodeC
 1. New  New   New
 2. Update  Update   Update
 3. Delete   Delete

 When trying to read from NodeB and NodeC, a Digest mismatch exception was
 triggered, so Cassandra tried to resolve this version conflict.
 But the result was the value Update.

 Here is the suspected root cause: the version conflict is resolved based
 on timestamps.

 Node C's local time is a bit earlier than node A's.

 The Update was sent from node C with timestamp 00:00:00.050, and the
 Delete from node A with timestamp 00:00:00.020, which is not the same
 order as the event sequence.

 So the version conflict was resolved incorrectly.

 Is this true?

 If yes, then it means that the consistency level can ensure the conflict is
 detected, but resolving it correctly depends on the accuracy of the time
 synchronization, e.g. NTP?
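A toy illustration of the reconciliation rule described above; under
timestamp-based last-write-wins, the delete at t=20ms loses to the update at
t=50ms even though it happened later in real time (plain Java, not
Cassandra's actual code):

    public class LastWriteWins {
        static String resolve(String v1, long t1, String v2, long t2) {
            // The higher timestamp wins, regardless of the order the
            // operations really happened in.
            return t1 >= t2 ? v1 : v2;
        }

        public static void main(String[] args) {
            // Update from node C carries 50ms; the later Delete from node A, 20ms.
            System.out.println(resolve("Update", 50, "<deleted>", 20)); // Update
        }
    }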








Re: Consistent problem when solve Digest mismatch

2013-03-04 Thread Jason Tang
Hi

The timestamp provided by my client is a unix timestamp (with ntp), and as I
said, due to ntp drift the local unix timestamps are not accurately
synchronized (compared to what my case needs).

So, in short, the client cannot provide a global sequence number to indicate
the event order.

But I wonder: I configured the Cassandra consistency level as write QUORUM,
so for one record, I suppose Cassandra has the ability to decide the final
update result.

Otherwise, it means the version conflict resolution strongly depends on a
global sequence id (timestamp) which the client needs to provide?


//Tang


2013/3/4 Sylvain Lebresne sylv...@datastax.com

 The problem is, what exactly is the sequence number you are talking about?

 Or let me put it another way: if you do have a sequence number that
 provides a total ordering of your operations, then that is exactly what you
 should use as your timestamp. What Cassandra calls the timestamp is
 exactly what you call a seqID; it's the number Cassandra uses to decide the
 order of operations.

 Except that in real life, provided you have more than one client talking
 to Cassandra, providing a total ordering of operations is hard, and in
 fact not doable efficiently. So in practice people use unix timestamps
 (with ntp), which provide a very good yet cheap approximation of the
 real-life order of operations.

 But again, if you do know how to assign a more precise timestamp,
 Cassandra lets you use that: you can provide your own timestamp (using the
 unix timestamp is just the default). The point being, the unix timestamp is
 the best approximation we have in practice.

 --
 Sylvain


 On Mon, Mar 4, 2013 at 9:26 AM, Jason Tang ares.t...@gmail.com wrote:

 Hi

   Previously I met a consistency problem; you can refer to the link below for
 the whole story.

 http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3CCAFb+LUxna0jiY0V=AvXKzUdxSjApYm4zWk=ka9ljm-txc04...@mail.gmail.com%3E

   And after checking the code, it seems I found some clue to the problem.
 Maybe someone can check this.

   In short, I have a Cassandra cluster (1.0.3). The consistency level is
 read/write quorum, and replication_factor is 3.

   Here is the event sequence:

 seqID   NodeA   NodeB   NodeC
 1. New  New   New
 2. Update  Update   Update
 3. Delete   Delete

 When trying to read from NodeB and NodeC, a Digest mismatch exception was
 triggered, so Cassandra tried to resolve this version conflict.
 But the result was the value Update.

 Here is the suspected root cause: the version conflict is resolved based
 on timestamps.

 Node C's local time is a bit earlier than node A's.

 The Update was sent from node C with timestamp 00:00:00.050, and the Delete
 from node A with timestamp 00:00:00.020, which is not the same as the
 event sequence.

 So the version conflict was resolved incorrectly.

 Is this true?

 If yes, then it means that the consistency level can ensure the conflict is
 detected, but resolving it correctly depends on the accuracy of the time
 synchronization, e.g. NTP?






Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Jason Tang
A delayed read is acceptable, but the problem is still there:
Request A comes to node One at local time PM 10:00:01.000
Request B comes to node Two at local time PM 10:00:00.980

The correct order is A -> B.
I am not sure how node C will handle the data: although A came before B,
B's timestamp is earlier than A's?



2013/1/17 Russell Haering russellhaer...@gmail.com

 One solution is to only read up to (now - 1 second). If this is a public
 API where you want to guarantee full consistency (i.e., if you have added a
 message to the queue, it will definitely appear to be there), you can
 instead delay requests for 1 second before reading up to the moment the
 request was received.

 In either of these approaches you can tune the time offset based on how
 closely synchronized you believe you can keep your clocks. The tradeoff, of
 course, will be increased latency.
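A hedged sketch of that bounded read, assuming time-based long column names
and a Hector slice query (keyspace, CF and page size are illustrative):

    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.QueryResult;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class BoundedRead {
        public static ColumnSlice<Long, String> read(Keyspace ksp, String rowKey) {
            // Ignore everything newer than one second ago, so writers whose
            // clocks are slightly off still land in order before we read them.
            long upperBound = System.currentTimeMillis() - 1000L;

            SliceQuery<String, Long, String> q = HFactory.createSliceQuery(
                ksp, StringSerializer.get(), LongSerializer.get(), StringSerializer.get());
            q.setColumnFamily("queue");
            q.setKey(rowKey);
            q.setRange(0L, upperBound, false, 100); // oldest first, capped at the bound
            QueryResult<ColumnSlice<Long, String>> r = q.execute();
            return r.get();
        }
    }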


 On Wed, Jan 16, 2013 at 5:56 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 I am using Cassandra in a message bus solution; the major responsibility
 of cassandra is recording the incoming requests for later consumption.

 One strategy is first in, first out (FIFO), so I need to get the stored
 requests in reversed order.

 I use NTP to synchronize the system time of the nodes in the cluster (4
 nodes).

 But the local times of the nodes still have some inaccuracy, around 40
 ms.

 The consistency level is write ALL and read ONE, and the replication factor
 is 3.

 But here is the problem:
 Request A comes to node One at local time PM 10:00:01.000
 Request B comes to node Two at local time PM 10:00:00.980

 The correct order is A -> B
 But by timestamp it is B -> A

 So is there any way for Cassandra to keep the correct order for read
 operations? (e.g. a logical timestamp?)

 Or does Cassandra strongly depend on the time synchronization solution?

 BRs
 //Tang








Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Jason Tang
Yes, Sylvain, you are correct.
When I say A comes before B, it means the client will ensure the order;
actually, B is sent only after the response to request A has been received.

And yes, A and B do not update the same record, so it is not the typical
Cassandra consistency problem.

And yes, the column name is provided by the client, and currently I use the
local timestamp, and the local times of A and B are not synchronized well, so
I have the problem.

So what I want is for Cassandra to provide some information to the client to
indicate that A is stored before B, e.g. a globally unique timestamp, or row
order.




2013/1/17 Sylvain Lebresne sylv...@datastax.com

 I'm not sure I fully understand your problem. You seem to be talking about
 ordering the requests in the order they are generated. But in that case,
 you will rely on the ordering of columns within whatever row you store
 requests A and B in, and that order depends on the column names, which in
 turn are client provided and don't depend at all on the time
 synchronization of the cluster nodes. And since you are able to say that
 request A comes before B, I suppose this means said requests are generated
 from the same source. In which case you just need to make sure that the
 column names storing each request respect the correct ordering.

 The column timestamps Cassandra uses are there to decide which update *to
 the same column* is the more recent one. So they only come into play if
 requests A and B update the same column and you're interested in knowing
 which one of the updates will win when you read. But even if that's your
 case (which doesn't sound like it at all from your description), the column
 timestamp is only generated server side if you use CQL. And even in that
 latter case, it's a convenience, and you can force a timestamp client side
 if you really wish. In other words, Cassandra's dependency on time
 synchronization is not a strong one even in that case. But again, that
 doesn't seem at all to be the problem you are trying to solve.
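A minimal sketch of that last point, assuming the requests come from one
producer that can number them and the CF comparator is LongType; the sequence
number, not any node clock, then fixes the FIFO order:

    import java.util.concurrent.atomic.AtomicLong;
    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class OrderedEnqueue {
        private static final AtomicLong seq = new AtomicLong();

        // Column name = the producer's own sequence number, so a slice over
        // the row returns requests exactly in the order they were issued.
        public static void enqueue(Keyspace ksp, String queueRow, String request) {
            Mutator<String> m = HFactory.createMutator(ksp, StringSerializer.get());
            m.insert(queueRow, "queue",
                     HFactory.createColumn(seq.incrementAndGet(), request,
                                           LongSerializer.get(), StringSerializer.get()));
        }
    }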

 --
 Sylvain


 On Thu, Jan 17, 2013 at 2:56 AM, Jason Tang ares.t...@gmail.com wrote:

 Hi

  I am using Cassandra in a message bus solution; the major responsibility
  of cassandra is recording the incoming requests for later consumption.

  One strategy is first in, first out (FIFO), so I need to get the stored
  requests in reversed order.

  I use NTP to synchronize the system time of the nodes in the cluster (4
  nodes).

  But the local times of the nodes still have some inaccuracy, around 40
  ms.

  The consistency level is write ALL and read ONE, and the replication factor
  is 3.

  But here is the problem:
  Request A comes to node One at local time PM 10:00:01.000
  Request B comes to node Two at local time PM 10:00:00.980

  The correct order is A -> B
  But by timestamp it is B -> A

  So is there any way for Cassandra to keep the correct order for read
  operations? (e.g. a logical timestamp?)

  Or does Cassandra strongly depend on the time synchronization solution?

 BRs
 //Tang








Re: is it possible to disable compaction per CF ?

2012-07-27 Thread Jason Tang
setMaxCompactionThreshold(0)
setMinCompactionThreshold(0)
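A hedged sketch of where those two setters live, assuming Hector's column
family definition API (keyspace and CF names are illustrative); setting both
thresholds to 0 disables minor compaction for that CF:

    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
    import me.prettyprint.hector.api.factory.HFactory;

    public class DisableMinorCompaction {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("cluster", "127.0.0.1:9160");
            for (ColumnFamilyDefinition cfDef :
                    cluster.describeKeyspace("myks").getCfDefs()) {
                if (cfDef.getName().equals("append_only_cf")) {
                    cfDef.setMinCompactionThreshold(0); // no more minor compactions
                    cfDef.setMaxCompactionThreshold(0);
                    cluster.updateColumnFamily(cfDef);
                }
            }
        }
    }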

2012/7/27 Илья Шипицин chipits...@gmail.com

 Hello!

 If we are dealing with an append-only data model, what happens if I disable
 compaction on a certain CF?
 Any side effects?

 Can I do it with

 update column family  with compaction_strategy = null  ?

 Cheers,
 Ilya Shipitsin



Compaction not remove the deleted data from secondary index when use TTL

2012-07-19 Thread Jason Tang
Hi

     Due to a consistency problem, we cannot use delete directly to delete a
row, so instead we use a TTL on each column of the row.

     We are using Cassandra as the central storage of a stateful system.
All requests are stored in Cassandra and marked as status:NEW, then we
change them to status:EXECUTING, then delete them (by TTL).

     And we use a secondary index on the column 'status', and after processing
4 million requests, most of the requests have been deleted from Cassandra.

     After executing compact from nodetool, the size of the CF Requests
SSTable decreased to about 20M, but Requests.idxStatus
keeps growing and is about 1.6G.

     And from the system log, I found that the compact command from nodetool
does not trigger compaction of the secondary index, but under traffic, when a
compaction of the CF Requests is triggered, a compaction of the index starts
as well.

    But the size of the index SSTable does not decrease as expected; it seems
the data in the secondary index is not deleted. And since we only have 3
statuses, I can find logs such as:
 INFO [CompactionExecutor:31] 2012-07-20 10:30:50,532
CompactionController.java (line 129) Compacting large row demo/
Requests.idxStatus:EXECUTING (264045300 bytes) incrementally

    So why is the secondary index not compacted down to a small size as
expected? Is it related to TTL?

    And is it possible to rebuild the index?

BRs


Re: Replication factor - Consistency Questions

2012-07-18 Thread Jason Tang
Yes, ALL is not good for HA, and because we met problems when using QUORUM,
our current solution is to switch to Write:QUORUM / Read:QUORUM when we get an
UnavailableException.

2012/7/18 Jay Parashar jparas...@itscape.com

 Thanks.. but write ALL will fail if any node is down. I am thinking of
 QUORUM.

 From: Jason Tang [mailto:ares.t...@gmail.com]
 Sent: Tuesday, July 17, 2012 8:24 PM
 To: user@cassandra.apache.org
 Subject: Re: Replication factor - Consistency Questions

 Hi

 I started using Cassandra not long ago, and I also have problems with
 consistency.

 Here is some thinking.

 If you have Write:ANY / Read:ONE, you will have consistency problems, and if
 you want read repair, check your schema and the parameter read_repair_chance:

 http://wiki.apache.org/cassandra/StorageConfiguration

 And if you want consistent results, my suggestion is to use
 Write:ALL / Read:ONE, since for Cassandra, writes are much faster than reads.

 For the performance impact, you need to test with your traffic; if your
 memory cannot cache all your data, or your network is not fast enough, then
 yes, writing to one more node will have an impact.

 BRs

 2012/7/18 Jay Parashar jparas...@itscape.com

 Hello all,

 There is a lot of material on replication factor and consistency level, but
 I am a little confused by what is happening on my setup (Cassandra 1.1.2). I
 would appreciate any answers.

 My setup: a cluster of 2 nodes, evenly balanced. My RF = 2; consistency
 level: write = ANY and read = ONE.

 I know that my consistency is weak, but since my RF = 2 I thought the data
 would just be duplicated on both nodes; yet sometimes querying does not give
 me the correct (or gives partial) results. At other times it gives me the
 right results.
 Is read repair going on after the first query? But as RF = 2, the data is
 duplicated, so why the repair?
 Note: my query is done a while after the writes, so the data should have been
 on both nodes. Or is this not the case (flushing not happening, etc.)?

 I am thinking of making the write 1 and the read QUORUM, so R + W > RF (1 +
 2 > 2), to give strong consistency. Will that affect performance a lot
 (generally speaking)?

 Thanks in advance
 Regards

 Jay

 




Re: Replication factor - Consistency Questions

2012-07-17 Thread Jason Tang
Hi

I started using Cassandra not long ago, and I also have problems with
consistency.

Here is some thinking.
If you have Write:ANY / Read:ONE, you will have consistency problems, and if
you want read repair, check your schema and the parameter read_repair_chance:
http://wiki.apache.org/cassandra/StorageConfiguration

And if you want consistent results, my suggestion is to use
Write:ALL / Read:ONE (see the sketch below), since for Cassandra, writes are
much faster than reads.

For the performance impact, you need to test with your traffic; if your
memory cannot cache all your data, or your network is not fast enough, then
yes, writing to one more node will have an impact.
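A hedged sketch of pinning Write:ALL / Read:ONE in a Hector-era client;
ConfigurableConsistencyLevel and its setters are assumed from that API, and
the cluster and keyspace names are illustrative:

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class ConsistencySetup {
        public static Keyspace keyspace() {
            Cluster cluster = HFactory.getOrCreateCluster("cluster", "127.0.0.1:9160");
            ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
            policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.ALL); // W = RF
            policy.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);  // R = 1
            // W + R > RF holds for any RF, so every read overlaps the latest write.
            return HFactory.createKeyspace("myks", cluster, policy);
        }
    }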

BRs


2012/7/18 Jay Parashar jparas...@itscape.com

 Hello all,

 There is a lot of material on replication factor and consistency level, but
 I am a little confused by what is happening on my setup (Cassandra 1.1.2). I
 would appreciate any answers.

 My setup: a cluster of 2 nodes, evenly balanced. My RF = 2; consistency
 level: write = ANY and read = ONE.

 I know that my consistency is weak, but since my RF = 2 I thought the data
 would just be duplicated on both nodes; yet sometimes querying does not give
 me the correct (or gives partial) results. At other times it gives me the
 right results.
 Is read repair going on after the first query? But as RF = 2, the data is
 duplicated, so why the repair?
 Note: my query is done a while after the writes, so the data should have been
 on both nodes. Or is this not the case (flushing not happening, etc.)?

 I am thinking of making the write 1 and the read QUORUM, so R + W > RF (1 +
 2 > 2), to give strong consistency. Will that affect performance a lot
 (generally speaking)?

 Thanks in advance
 Regards

 Jay





Re: Cassandra take 100% CPU for 2~3 minutes every half an hour and mutation lost

2012-07-12 Thread Jason Tang
Hi

After changing the concurrent compactors parameter (concurrent_compactors: 1),
we can limit Cassandra to using 100% of one core at that moment.

And I got the stack of the busy thread; it stays on the same stack for 2~3
minutes.

Any clue about this issue?

Thread 18114: (state = IN_JAVA)

 - java.util.AbstractList$Itr.hasNext() @bci=8, line=339 (Compiled frame;
information may be imprecise)

 -
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(org.apache.cassandra.db.ColumnFamily,
int) @bci=6, line=841 (Compiled frame)

 -
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(org.apache.cassandra.db.ColumnFamily,
int) @bci=17, line=835 (Compiled frame)

 -
org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(org.apache.cassandra.db.ColumnFamily,
int) @bci=8, line=826 (Compiled frame)

 -
org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(org.apache.cassandra.db.DecoratedKey,
org.apache.cassandra.db.compaction.CompactionController,
org.apache.cassandra.db.ColumnFamily) @bci=38, line=77 (Compiled frame)

 -
org.apache.cassandra.db.compaction.PrecompactedRow.init(org.apache.cassandra.db.compaction.CompactionController,
java.util.List) @bci=33, line=102 (Compiled frame)

 -
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(java.util.List)
@bci=223, line=133 (Compiled frame)

 -
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced()
@bci=44, line=102 (Compiled frame)

 -
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced()
@bci=1, line=87 (Compiled frame)

 - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88,
line=116 (Compiled frame)

 - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5,
line=99 (Compiled frame)

 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)

 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)

 - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614
(Compiled frame)

 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)

 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)

 -
org.apache.cassandra.db.compaction.CompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
@bci=542, line=141 (Compiled frame)

 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=117,
line=134 (Interpreted frame)

 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=1,
line=114 (Interpreted frame)

 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=30, line=303
(Interpreted frame)

 - java.util.concurrent.FutureTask.run() @bci=4, line=138 (Interpreted
frame)

 -
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable)
@bci=59, line=886 (Compiled frame)

 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908
(Compiled frame)

 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)



BRs

//Jason



2012/7/11 Jason Tang ares.t...@gmail.com

 Hi

 I encountered a high CPU problem with Cassandra 1.0.3, happening with both
 sized and leveled compaction, 6G heap, 64-bit Oracle java. Under normal
 traffic, Cassandra uses 15% CPU.

 But every half an hour, Cassandra uses almost 100% of the total cpu (SUSE,
 12 Core).

 And here is the top information for that moment.

 #top -H -p 12451

 top - 12:30:14 up 15 days, 12:49,  6 users,  load average: 10.52, 8.92,
 8.14
 Tasks: 706 total,  21 running, 685 sleeping,   0 stopped,   0 zombie
 Cpu(s): 25.7%us, 14.0%sy, 48.9%ni,  6.5%id,  0.0%wa,  0.0%hi,  4.9%si,
  0.0%st
 Mem: 24150M total,12218M used,11932M free,  142M buffers
 Swap:0M total,0M used,0M free, 3714M cached

   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 20291 casadm24   4 8003m 5.4g 167m R   92 22.7   0:42.46 java
 20276 casadm24   4 8003m 5.4g 167m R   88 22.7   0:43.88 java
 20181 casadm24   4 8003m 5.4g 167m R   86 22.7   0:52.97 java
 20213 casadm24   4 8003m 5.4g 167m R   85 22.7   0:49.21 java
 20188 casadm24   4 8003m 5.4g 167m R   82 22.7   0:54.34 java
 20268 casadm24   4 8003m 5.4g 167m R   81 22.7   0:46.25 java
 20269 casadm24   4 8003m 5.4g 167m R   41 22.7   0:15.11 java
 20316 casadm24   4 8003m 5.4g 167m S   20 22.7   0:02.35 java
 20191 casadm24   4 8003m 5.4g 167m R   15 22.7   0:16.85 java
 12500 casadm20   0 8003m 5.4g 167m R6 22.7   1:07.86 java
 15245 casadm20   0 8003m 5.4g 167m D5 22.7   0:36.45 java

 Jstack cannot print the stack.
 Thread 20291: (state = IN_JAVA)
 Error occurred during stack walking:
 ...
 Thread 20276: (state = IN_JAVA)
 Error occurred during stack walking:

 After it comes back, the stack shows:
 Thread 20291: (state = BLOCKED)
  - sun.misc.Unsafe.park(boolean, long) @bci

Cassandra take 100% CPU for 2~3 minutes every half an hour and mutation lost

2012-07-10 Thread Jason Tang
Hi

I encountered a high CPU problem with Cassandra 1.0.3, happening with both
sized and leveled compaction, 6G heap, 64-bit Oracle java. Under normal
traffic, Cassandra uses 15% CPU.

But every half an hour, Cassandra uses almost 100% of the total cpu (SUSE,
12 Core).

And here is the top information for that moment.

#top -H -p 12451

top - 12:30:14 up 15 days, 12:49,  6 users,  load average: 10.52, 8.92, 8.14
Tasks: 706 total,  21 running, 685 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.7%us, 14.0%sy, 48.9%ni,  6.5%id,  0.0%wa,  0.0%hi,  4.9%si,
 0.0%st
Mem: 24150M total,12218M used,11932M free,  142M buffers
Swap:0M total,0M used,0M free, 3714M cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
20291 casadm24   4 8003m 5.4g 167m R   92 22.7   0:42.46 java
20276 casadm24   4 8003m 5.4g 167m R   88 22.7   0:43.88 java
20181 casadm24   4 8003m 5.4g 167m R   86 22.7   0:52.97 java
20213 casadm24   4 8003m 5.4g 167m R   85 22.7   0:49.21 java
20188 casadm24   4 8003m 5.4g 167m R   82 22.7   0:54.34 java
20268 casadm24   4 8003m 5.4g 167m R   81 22.7   0:46.25 java
20269 casadm24   4 8003m 5.4g 167m R   41 22.7   0:15.11 java
20316 casadm24   4 8003m 5.4g 167m S   20 22.7   0:02.35 java
20191 casadm24   4 8003m 5.4g 167m R   15 22.7   0:16.85 java
12500 casadm20   0 8003m 5.4g 167m R6 22.7   1:07.86 java
15245 casadm20   0 8003m 5.4g 167m D5 22.7   0:36.45 java

Jstack cannot print the stack.
Thread 20291: (state = IN_JAVA)
Error occurred during stack walking:
...
Thread 20276: (state = IN_JAVA)
Error occurred during stack walking:

After it comes back, the stack shows:
Thread 20291: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information
may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long)
@bci=20, line=196 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferStack$SNode,
boolean, long) @bci=174, line=424 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object,
boolean, long) @bci=102, line=323 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.poll(long,
java.util.concurrent.TimeUnit) @bci=11, line=874 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=62, line=945
(Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=18, line=907
(Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame

And after this happens, the data is not correct; some
large columns which were supposed to be deleted come back again.
Here is the suspect thread when CPU usage hits 100%:
Thread 20191: (state = IN_VM)
 - sun.misc.Unsafe.unpark(java.lang.Object) @bci=0 (Compiled frame;
information may be imprecise)
 - java.util.concurrent.locks.LockSupport.unpark(java.lang.Thread) @bci=8,
line=122 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack$SNode.tryMatch(java.util.concurrent.SynchronousQueue$TransferStack$SNode)
@bci=34, line=242 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object,
boolean, long) @bci=268, line=344 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.offer(java.lang.Object) @bci=19,
line=846 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.execute(java.lang.Runnable)
@bci=43, line=653 (Compiled frame)
 -
java.util.concurrent.AbstractExecutorService.submit(java.util.concurrent.Callable)
@bci=20, line=92 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getCompactedRow(java.util.List)
@bci=86, line=190 (Compiled frame) -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced()
@bci=31, line=164 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced()
@bci=1, line=144 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88,
line=116 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5,
line=99 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext()
@bci=4, line=103 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext()
@bci=1, line=90 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)
 - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614
(Compiled frame)
 - 

Re: Failed to solve Digest mismatch

2012-07-01 Thread Jason Tang
For the create/update/deleteColumn/deleteRow test case, with Quorum
consistency level, 6 nodes and replication factor 3, one thread reproduces
this in roughly 1 round out of 100.

And if I run the test client with 20 threads, the ratio is higher.

And each test group is executed by one thread, and the client timestamps are
unique and sequenced, guaranteed by Hector.

And the client only accesses the data from the local Cassandra node.

And the query only uses the row key, which is unique. The column names are
not unique; in my case, e.g., status.

And the rows have around 7 columns, all small, e.g. status:true,
userName:Jason ...

BRs
//Ares

2012/7/1 Jonathan Ellis jbel...@gmail.com

 Is this Cassandra 1.1.1?

 How often do you observe this?  How many columns are in the row?  Can
 you reproduce when querying by column name, or only when slicing the
 row?

 On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang ares.t...@gmail.com wrote:
  Hi
 
  First I delete one column, then I delete one row. Then I try to read all
  columns from the same row; all operations are from the same client app.
 
  The consistency level is read/write quorum.
 
  Checking the Cassandra log, the local node doesn't perform the delete
  operation but sends the mutation to the other nodes (192.168.0.6, 192.168.0.1).
 
  After the delete, I try to read all columns from the row, and I found the
  node detected a Digest mismatch due to the Quorum consistency configuration,
  but the result is not correct.
 
  From the log, I can see the delete mutation was already accepted
  by 192.168.0.6 and 192.168.0.1, but when 192.168.0.5 read the responses from
  0.6 and 0.1 and merged the data, 0.5 finally showed the result, which is the
  dirty data.
 
  The following logs show the change of column 737461747573; 192.168.0.5
  tries to read from 0.1 and 0.6, where it should be deleted, but finally it
  shows it has the data.
 
  log:
  192.168.0.5
  DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653)
  Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc',
  key=7878323239537570657254616e67307878,
  columnParent='QueryPath(columnFamilyName='queue', superColumnName='null',
  columnName='null')',
 
 columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
  DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79)
  Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
  DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674)
  reading data from /192.168.0.6
  DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694)
  reading digest from /192.168.0.1
  DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
  ResponseVerbHandler.java (line 44) Processing response on a callback from
  6556@/192.168.0.6
  DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
  AbstractRowResolver.java (line 66) Preprocessed data response
  DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
  ResponseVerbHandler.java (line 44) Processing response on a callback from
  6557@/192.168.0.1
  DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
  AbstractRowResolver.java (line 66) Preprocessed digest response
  DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line
 65)
  resolving 2 responses
  DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733)
  Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
  Mismatch for key DecoratedKey(100572974179274741747356988451225858264,
  7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs
  d41d8cd98f00b204e9800998ecf8427e)
  DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
  ResponseVerbHandler.java (line 44) Processing response on a callback from
  6558@/192.168.0.6
  DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
  ResponseVerbHandler.java (line 44) Processing response on a callback from
  6559@/192.168.0.1
  DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
  AbstractRowResolver.java (line 66) Preprocessed data response
  DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
  AbstractRowResolver.java (line 66) Preprocessed data response
  DEBUG [Thrift:17] 2012-06-28 15:59:42,201 RowRepairResolver.java (line
 63)
  resolving 2 responses
  DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line
 123)
  collecting 0 of 2147483647: 6669726554696d65:false:13@1340870382109004
  DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line
 123)
  collecting 1 of 2147483647: 67726f75705f6964:false:10@1340870382109014
  DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line
 123)
  collecting 2 of 2147483647:
 696e517565756554696d65:false:13@1340870382109005
  DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line
 123)
  collecting 3 of 2147483647

Failed to solve Digest mismatch

2012-06-28 Thread Jason Tang
Hi

   First I delete one column, then I delete one row. Then I try to read all
columns from the same row; all operations are from the same client app.

   The consistency level is read/write quorum.

   Checking the Cassandra log, the local node doesn't perform the delete
operation but sends the mutation to the other nodes (192.168.0.6, 192.168.0.1).

   After the delete, I try to read all columns from the row, and I found the
node detected a Digest mismatch due to the Quorum consistency configuration,
but the result is not correct.

   From the log, I can see the delete mutation was already accepted
by 192.168.0.6 and 192.168.0.1, but when 192.168.0.5 read the responses from
0.6 and 0.1 and merged the data, 0.5 finally showed the result, which is the
dirty data.

   The following logs show the change of column 737461747573; 192.168.0.5
tries to read from 0.1 and 0.6, where it should be deleted, but finally it
shows it has the data.

log:
192.168.0.5
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653)
Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc',
key=7878323239537570657254616e67307878,
columnParent='QueryPath(columnFamilyName='queue', superColumnName='null',
columnName='null')',
columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,
737461747573,757365724e616d65,])/QUORUM
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79)
Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674)
reading data from /192.168.0.6
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694)
reading digest from /192.168.0.1
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
ResponseVerbHandler.java (line 44) Processing response on a callback from
6556@/192.168.0.6
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
ResponseVerbHandler.java (line 44) Processing response on a callback from
6557@/192.168.0.1
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
AbstractRowResolver.java (line 66) Preprocessed digest response
DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line 65)
resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733)
Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
Mismatch for key DecoratedKey(100572974179274741747356988451225858264,
7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs
d41d8cd98f00b204e9800998ecf8427e)
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
ResponseVerbHandler.java (line 44) Processing response on a callback from
6558@/192.168.0.6
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
ResponseVerbHandler.java (line 44) Processing response on a callback from
6559@/192.168.0.1
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 RowRepairResolver.java (line 63)
resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 6669726554696d65:false:13@1340870382109004
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
collecting 1 of 2147483647: 67726f75705f6964:false:10@1340870382109014
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
collecting 2 of 2147483647: 696e517565756554696d65:false:13@1340870382109005
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
collecting 3 of 2147483647: 6c6f67526f6f744964:false:7@1340870382109015
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 4 of 2147483647: 6d6f54797065:false:6@1340870382109009
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 5 of 2147483647: 706172746974696f6e:false:2@1340870382109001
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 6 of 2147483647: 7265636569766554696d65:false:13@1340870382109003
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 7 of 2147483647: 72657175657374:false:300@1340870382109013
DEBUG [RequestResponseStage:5] 2012-06-28 15:59:42,202
ResponseVerbHandler.java (line 44) Processing response on a callback from
6552@/192.168.0.1
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 8 of 2147483647: 7265747279:false:1@1340870382109006
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 9 of 2147483647:
7365727669636550726f7669646572:false:4@1340870382109007
DEBUG 

Re: Consistency Problem with Quorum consistencyLevel configuration

2012-06-26 Thread Jason Tang
Hi
  After enabling the Cassandra debug log, I got the following log entries;
they show the delete mutation being sent to the other two nodes rather than
the local node.
  And then the read command came to the local node.
  And the local one found the mismatch.
  But I don't know why the local node returned its local dirty data. Isn't it
supposed to repair the data and return the correct value?

192.168.0.6:
DEBUG [MutationStage:61] 2012-06-26 23:09:00,036
RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc',
key='33323130537570657254616e6730', modifications=[ColumnFamily(queue
-deleted at 1340723340044000- [])]) applied.  Sending response to 3555@/
192.168.0.5

192.168.0.4:
DEBUG [MutationStage:40] 2012-06-26 23:09:00,041
RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc',
key='33323130537570657254616e6730', modifications=[ColumnFamily(queue
-deleted at 1340723340044000- [])]) applied.  Sending response to 3556@/
192.168.0.5

192.168.0.5 (local one):
DEBUG [pool-2-thread-20] 2012-06-26 23:09:00,105 StorageProxy.java (line
705) Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
Mismatch for key DecoratedKey(7649972972837658739074639933581556,
33323130537570657254616e6730) (b20ac6ec0d29393d70e200027c094d13 vs
d41d8cd98f00b204e9800998ecf8427e)



2012/6/25 Jason Tang ares.t...@gmail.com

 Hi

 I met a consistency problem when using Quorum for both read and
 write.

 I use MultigetSubSliceQuery to query rows from a super column with limit
 size 100, then read and delete them, and then start another round.

 But I found that rows which should have been deleted by the last query
 still show up in the next round's query.

 Also, for a normal Column Family, I updated the value of one column
 from status='FALSE' to status='TRUE', and the next time I queried it, the
 status was still 'FALSE'.

 More detail:

- It does not happen every time (about 1 in 10,000)
- The time between two rounds of queries is around 500 ms (but we found
queries that happened 2 seconds after the first one and still hit this
consistency problem)
- We use ntp as our cluster time synchronization solution.
- We have 6 nodes, and the replication factor is 3

 Some people say Cassandra is expected to have such problems, because a
 read may be processed before a write inside Cassandra. But for two
 seconds?! And if so, it is meaningless to have Quorum or any other
 consistency level configuration.

So first of all, is it the correct behavior of Cassandra, and if not,
 what data do we need to collect for further investigation?

 BRs
 Ares



Consistency Problem with Quorum consistencyLevel configuration

2012-06-24 Thread Jason Tang
Hi

I met a consistency problem when using Quorum for both read and write.

I use MultigetSubSliceQuery to query rows from a super column with limit
size 100, then read and delete them, and then start another round.

But I found that rows which should have been deleted by the last query
still show up in the next round's query.

Also, for a normal Column Family, I updated the value of one column from
status='FALSE' to status='TRUE', and the next time I queried it, the status
was still 'FALSE'.

More detail:

   - It does not happen every time (about 1 in 10,000)
   - The time between two rounds of queries is around 500 ms (but we found
   queries that happened 2 seconds after the first one and still hit this
   consistency problem)
   - We use ntp as our cluster time synchronization solution.
   - We have 6 nodes, and the replication factor is 3

Some people say Cassandra is expected to have such problems, because a read
may be processed before a write inside Cassandra. But for two seconds?! And
if so, it is meaningless to have Quorum or any other consistency level
configuration.

   So first of all, is it the correct behavior of Cassandra, and if not,
what data do we need to collect for further investigation?
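
For context, here is a minimal Hector sketch of how QUORUM can be applied
to both reads and writes (a sketch, not the exact code from this thread;
the cluster name and host are illustrative, and 'drc' is the keyspace name
from the logs in the reply above):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class QuorumKeyspace {
    public static Keyspace connect() {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster",
                new CassandraHostConfigurator("192.168.0.1:9160"));
        // Every read and write issued through this Keyspace uses QUORUM.
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        return HFactory.createKeyspace("drc", cluster, policy);
    }
}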

BRs
Ares


Re: GCInspector works every 10 seconds!

2012-06-18 Thread Jason Tang
Hi

After I enabled the key cache and row cache, the problem was gone. I guess
it is because we have lots of data in SSTables, and it takes more time,
memory, and CPU to search the data.
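
For reference, a sketch of how both caches can be enabled per column family
with cassandra-cli on 1.0.x (the sizes are made-up examples; as I
understand it, keys_cached and rows_cached are the pre-1.1 attribute
names):

[default@drc] update column family queue with keys_cached = 200000 and rows_cached = 10000;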

BRs
//Tang Weiqiang

2012/6/18 aaron morton aa...@thelastpickle.com

   It is also strange that although no data in Cassandra can fulfill the
 query conditions, it takes more time if we have more data in Cassandra.


 These log messages:

 DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f63408920e049c22:true:4@1339865451865018
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f63408a0eeab052a:true:4@1339865451866000

 Say that the slice query read columns from the disk that were deleted.

 Have you tried your test with a clean (no files on disk) database ?


 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 18/06/2012, at 12:36 AM, Jason Tang wrote:

 Hi

    After I changed the log level to DEBUG, I found some logs.

    Although we don't have traffic to Cassandra, we have a scheduled task
 that performs the sliceQuery.

    We use a timestamp as the index, and we perform the query every second
 to check if we have tasks to do.

    After 24 hours, we have 40G of data in Cassandra, and we configured
 Cassandra with max JVM heap 6G, memtable 1G, disk_access_mode:
 mmap_index_only.

    It is also strange that although no data in Cassandra can fulfill the
 query conditions, it takes more time if we have more data in Cassandra.

    We have 20 million records in total in Cassandra with the timestamp as
 the index. We query with MultigetSubSliceQuery and set the range to values
 that do not match any data in Cassandra, so it should return fast; but
 with 20 million records, it takes 2 seconds to get the query result.

    Is the GC caused by the scheduled query operation, and why does it take
 so much memory? Can we improve it?

 System.log:
  INFO [ScheduledTasks:1] 2012-06-17 20:17:13,574 GCInspector.java (line
 123) GC for ParNew: 559 ms for 1 collections, 3258240912 used; max
 is 6274678784
 DEBUG [ReadStage:99] 2012-06-17 20:17:25,563 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 0138ad1035880137f3372f3e0e28e3b6:false:36@1339815309124015
 DEBUG [ReadStage:99] 2012-06-17 20:17:25,565 ReadVerbHandler.java (line
 60) Read key 3331; sending response to 158060445@/192.168.0.3
 DEBUG [ReadStage:96] 2012-06-17 20:17:25,845 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 0138ad1035880137f33a80cf6cb5d383:false:36@1339815526613007
 DEBUG [ReadStage:96] 2012-06-17 20:17:25,847 ReadVerbHandler.java (line
 60) Read key 3233; sending response to 158060447@/192.168.0.3
 DEBUG [ReadStage:105] 2012-06-17 20:17:25,952 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 0138ad1035880137f330cd70c86690cd:false:36@1339814890872015
 DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
 75) digest is d41d8cd98f00b204e9800998ecf8427e
 DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
 60) Read key 3139; sending response to 158060448@/192.168.0.3
 DEBUG [ReadStage:89] 2012-06-17 20:17:25,959 CollationController.java
 (line 191) collectAllData
 DEBUG [ReadStage:108] 2012-06-17 20:17:25,959 CollationController.java
 (line 191) collectAllData
 DEBUG [ReadStage:107] 2012-06-17 20:17:25,959 CollationController.java
 (line 191) collectAllData
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f63408920e049c22:true:4@1339865451865018
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f63408a0eeab052a:true:4@1339865451866000
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f63408b1319577c9:true:4@1339865451867003
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f63408c081e0b8a3:true:4@1339865451867004
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f6340deefb8a0627:true:4@1339865451920001
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f6340df9c21e9979:true:4@1339865451923002
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f6340e095ead1498:true:4@1339865451928000
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line
 123) collecting 0 of 5000:
 7fff0137f6340e1af16cf151:true:4@1339865451935000
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line
 123) collecting

GCInspector works every 10 seconds!

2012-06-17 Thread Jason Tang
Hi

   After running load testing for 24 hours (insert, update, and delete),
there is now no new traffic to Cassandra, but Cassandra still shows high
load (CPU usage), and system.log shows it is constantly performing GC. I
don't know why it behaves like that; memory does not seem to be low.

Here are some configuration and log excerpts. Where can I find a clue as to
why Cassandra behaves like this?

cassandra.yaml
disk_access_mode: mmap_index_only

#  /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 tpstats
Pool Name              Active  Pending  Completed  Blocked  All time blocked
ReadStage                   0        0   45387558        0                 0
RequestResponseStage        0        0   96568347        0                 0
MutationStage               0        0   60215102        0                 0
ReadRepairStage             0        0          0        0                 0
ReplicateOnWriteStage       0        0          0        0                 0
GossipStage                 0        0     399012        0                 0
AntiEntropyStage            0        0          0        0                 0
MigrationStage              0        0         30        0                 0
MemtablePostFlusher         0        0        279        0                 0
StreamStage                 0        0          0        0                 0
FlushWriter                 0        0       1846        0              1052
MiscStage                   0        0          0        0                 0
InternalResponseStage       0        0          0        0                 0
HintedHandoff               0        0          5        0                 0

Message type        Dropped
RANGE_SLICE               0
READ_REPAIR               0
BINARY                    0
READ                      1
MUTATION               1390
REQUEST_RESPONSE          0


 # /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 info
Token: 56713727820156410577229101238628035242
Gossip active: true
Load : 37.57 GB
Generation No: 1339813956
Uptime (seconds) : 120556
Heap Memory (MB) : 3261.14 / 5984.00
Data Center  : datacenter1
Rack : rack1
Exceptions   : 0


 INFO [ScheduledTasks:1] 2012-06-17 19:47:36,633 GCInspector.java (line
123) GC for ParNew: 222 ms for 1 collections, 2046077640 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:48:41,714 GCInspector.java (line
123) GC for ParNew: 262 ms for 1 collections, 2228128408 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:48:49,717 GCInspector.java (line
123) GC for ParNew: 237 ms for 1 collections, 2390412728 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:48:57,719 GCInspector.java (line
123) GC for ParNew: 223 ms for 1 collections, 2508702896 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:50:01,988 GCInspector.java (line
123) GC for ParNew: 232 ms for 1 collections, 2864574832 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:50:10,075 GCInspector.java (line
123) GC for ParNew: 208 ms for 1 collections, 2964629856 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:50:21,078 GCInspector.java (line
123) GC for ParNew: 258 ms for 1 collections, 3149127368 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:51:26,095 GCInspector.java (line
123) GC for ParNew: 213 ms for 1 collections, 3421495400 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:51:34,097 GCInspector.java (line
123) GC for ParNew: 218 ms for 1 collections, 3543978312 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:37,229 GCInspector.java (line
123) GC for ParNew: 221 ms for 1 collections, 375229 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:37,230 GCInspector.java (line
123) GC for ConcurrentMarkSweep: 206 ms for 1 collections, 3752313400 used;
max is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:46,507 GCInspector.java (line
123) GC for ParNew: 243 ms for 1 collections, 3663162192 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:54,510 GCInspector.java (line
123) GC for ParNew: 283 ms for 1 collections, 1582282248 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:54:01,704 GCInspector.java (line
123) GC for ParNew: 235 ms for 1 collections, 1935534800 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:55:13,747 GCInspector.java (line
123) GC for ParNew: 233 ms for 1 collections, 2356975504 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:55:21,749 GCInspector.java (line
123) GC for ParNew: 264 ms for 1 collections, 2530976328 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:55:29,794 GCInspector.java (line
123) GC for ParNew: 224 ms for 1 collections, 2592311336 used; max
is 6274678784


BRs
//Ares


Re: GCInspector works every 10 seconds!

2012-06-17 Thread Jason Tang
Hi

   After I changed the log level to DEBUG, I found some logs.

  Although we don't have traffic to Cassandra, we have a scheduled task
that performs the sliceQuery.

  We use a timestamp as the index, and we perform the query every second to
check if we have tasks to do.

  After 24 hours, we have 40G of data in Cassandra, and we configured
Cassandra with max JVM heap 6G, memtable 1G, disk_access_mode:
mmap_index_only.

  It is also strange that although no data in Cassandra can fulfill the
query conditions, it takes more time if we have more data in Cassandra.

  We have 20 million records in total in Cassandra with the timestamp as
the index. We query with MultigetSubSliceQuery and set the range to values
that do not match any data in Cassandra, so it should return fast; but with
20 million records, it takes 2 seconds to get the query result.

  Is the GC caused by the scheduled query operation, and why does it take
so much memory? Can we improve it?

System.log:
 INFO [ScheduledTasks:1] 2012-06-17 20:17:13,574 GCInspector.java (line
123) GC for ParNew: 559 ms for 1 collections, 3258240912 used; max
is 6274678784
DEBUG [ReadStage:99] 2012-06-17 20:17:25,563 SliceQueryFilter.java (line
123) collecting 0 of 5000:
0138ad1035880137f3372f3e0e28e3b6:false:36@1339815309124015
DEBUG [ReadStage:99] 2012-06-17 20:17:25,565 ReadVerbHandler.java (line 60)
Read key 3331; sending response to 158060445@/192.168.0.3
DEBUG [ReadStage:96] 2012-06-17 20:17:25,845 SliceQueryFilter.java (line
123) collecting 0 of 5000:
0138ad1035880137f33a80cf6cb5d383:false:36@1339815526613007
DEBUG [ReadStage:96] 2012-06-17 20:17:25,847 ReadVerbHandler.java (line 60)
Read key 3233; sending response to 158060447@/192.168.0.3
DEBUG [ReadStage:105] 2012-06-17 20:17:25,952 SliceQueryFilter.java (line
123) collecting 0 of 5000:
0138ad1035880137f330cd70c86690cd:false:36@1339814890872015
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
75) digest is d41d8cd98f00b204e9800998ecf8427e
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
60) Read key 3139; sending response to 158060448@/192.168.0.3
DEBUG [ReadStage:89] 2012-06-17 20:17:25,959 CollationController.java (line
191) collectAllData
DEBUG [ReadStage:108] 2012-06-17 20:17:25,959 CollationController.java
(line 191) collectAllData
DEBUG [ReadStage:107] 2012-06-17 20:17:25,959 CollationController.java
(line 191) collectAllData
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408920e049c22:true:4@1339865451865018
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408a0eeab052a:true:4@1339865451866000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408b1319577c9:true:4@1339865451867003
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408c081e0b8a3:true:4@1339865451867004
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340deefb8a0627:true:4@1339865451920001
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340df9c21e9979:true:4@1339865451923002
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340e095ead1498:true:4@1339865451928000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340e1af16cf151:true:4@1339865451935000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340e396cfdc9fa:true:4@133986545195


BRs
//Ares

2012/6/17 Jason Tang ares.t...@gmail.com

 Hi

 After running load testing for 24 hours (insert, update, and delete),
 there is now no new traffic to Cassandra, but Cassandra still shows high
 load (CPU usage), and system.log shows it is constantly performing GC. I
 don't know why it behaves like that; memory does not seem to be low.

 Here are some configuration and log excerpts. Where can I find a clue as
 to why Cassandra behaves like this?

 cassandra.yaml
 disk_access_mode: mmap_index_only

  #  /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 tpstats
 Pool Name              Active  Pending  Completed  Blocked  All time blocked
 ReadStage                   0        0   45387558        0                 0
 RequestResponseStage        0        0   96568347        0                 0
 MutationStage               0        0   60215102        0                 0
 ReadRepairStage             0        0          0        0                 0
 ReplicateOnWriteStage       0        0          0

Re: Much more native memory used by Cassandra then the configured JVM heap size

2012-06-13 Thread Jason Tang
We assumed the cached memory would be released by the OS, but from
/proc/meminfo the cached memory is in Active status, so I am not sure it
will be released by the OS.

As for low memory: we found Unable to reduce heap usage since there are no
dirty column families in system.log, and then Cassandra on this node was
marked as down.

And because we configured the JVM heap to 6G and the memtable to 1G, I
don't know why we get OOM errors.
So we wonder whether the Cassandra outage was caused by:

   1. Low OS memory
   2. Impact of our configuration: memtable_flush_writers=32,
   memtable_flush_queue_size=12
   3. The delete operations (the data in our traffic is dynamic, which
   means each record may be deleted within one hour while new ones are
   inserted)
   https://issues.apache.org/jira/browse/CASSANDRA-3741

So we want to find out why Cassandra went down after the 24-hour load test
(RCA of the OOM).
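
A quick way to see how much of the resident size is mmapped SSTable data
rather than JVM memory is to sum the resident pages of the mapped Data.db
segments (a sketch; the PID is the one from the top output quoted below,
and the RSS column is the third field of pmap -x output):

# pmap -x 9567 | grep 'Data.db' | awk '{sum += $3} END {print sum " KB resident in mmapped SSTables"}'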

2012/6/12 aaron morton aa...@thelastpickle.com

 see http://wiki.apache.org/cassandra/FAQ#mmap

  which causes the OS to run low on memory.

 If the memory is used for mmapped access, the OS can get it back later.

 Is the low free memory causing a problem ?

 Cheers


   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 12/06/2012, at 5:52 PM, Jason Tang wrote:

 Hi

 I found some information on this issue, and it seems we can use another
 data access strategy to reduce mmap usage in order to use less memory.

 But I didn't find documentation describing this parameter for Cassandra
 1.x. Is it a good way to reduce shared memory usage, and what is the
 impact? (BTW, our data model is dynamic: although the throughput is high,
 the life cycle of the data is short, one hour or less.)

 
 # Choices are auto, standard, mmap, and mmap_index_only.
 disk_access_mode: auto
 

 http://comments.gmane.org/gmane.comp.db.cassandra.user/7390

 2012/6/12 Jason Tang ares.t...@gmail.com

 See my post: I limit the JVM heap to 6G, but Cassandra actually uses more
 memory, which is not counted in the JVM heap.

 I use top to monitor the total memory used by Cassandra.

 =
 -Xms6G -Xmx6G -Xmn1600M

 2012/6/12 Jeffrey Kesselman jef...@gmail.com

 Btw.  I suggest you spin up JConsole as it will give you much more detail
 on what your VM is actually doing.



 On Mon, Jun 11, 2012 at 9:14 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

  We have a problem with Cassandra memory usage: we configured the JVM
 heap to 6G, but after running Cassandra for several hours (insert, update,
 delete), the total memory used by Cassandra goes up to 15G, which leaves
 the OS low on memory.
  So I wonder, is it normal for Cassandra to use so much memory?

 And how can we limit the native memory used by Cassandra?


 ===
 Cassandra 1.0.3, 64 bit jdk.

 Memory occupied by Cassandra: 15G
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
  9567 casadm20   0 28.3g  15g 9.1g S  269 65.1 385:57.65 java

 =
 -Xms6G -Xmx6G -Xmn1600M

  # ps -ef | grep  9567
 casadm9567 1 55 Jun11 ?05:59:44
 /opt/jdk1.6.0_29/bin/java -ea
 -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G
 -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
 -Dcom.sun.management.jmxremote.port=6080
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=false
 -Daccess.properties=/opt/dve/cassandra/conf/access.properties
 -Dpasswd.properties=/opt/dve/cassandra/conf/passwd.properties
 -Dpasswd.mode=MD5 -Dlog4j.configuration=log4j-server.properties
 -Dlog4j.defaultInitOverride=true -cp
 /opt/dve/cassandra/bin/../conf:/opt/dve/cassandra/bin/../build/classes/main:/opt/dve/cassandra/bin/../build/classes/thrift:/opt/dve/cassandra/bin/../lib/Cassandra-Extensions-1.0.0.jar:/opt/dve/cassandra/bin/../lib/antlr-3.2.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-clientutil-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-thrift-1.0.3.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/dve/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/dve/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/dve/cassandra/bin/../lib/commons-lang-2.4.jar:/opt/dve/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/opt/dve/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/dve/cassandra/bin/../lib/guava-r08.jar:/opt/dve/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/opt/dve/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib

Re: Much more native memory used by Cassandra then the configured JVM heap size

2012-06-11 Thread Jason Tang
See my post: I limit the JVM heap to 6G, but Cassandra actually uses more
memory, which is not counted in the JVM heap.

I use top to monitor the total memory used by Cassandra.

=
-Xms6G -Xmx6G -Xmn1600M

2012/6/12 Jeffrey Kesselman jef...@gmail.com

 Btw.  I suggest you spin up JConsole as it will give you much more detail
 on what your VM is actually doing.



 On Mon, Jun 11, 2012 at 9:14 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 We have a problem with Cassandra memory usage: we configured the JVM
 heap to 6G, but after running Cassandra for several hours (insert, update,
 delete), the total memory used by Cassandra goes up to 15G, which leaves
 the OS low on memory.
  So I wonder, is it normal for Cassandra to use so much memory?

 And how can we limit the native memory used by Cassandra?


 ===
 Cassandra 1.0.3, 64 bit jdk.

 Memory occupied by Cassandra: 15G
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
  9567 casadm20   0 28.3g  15g 9.1g S  269 65.1 385:57.65 java

 =
 -Xms6G -Xmx6G -Xmn1600M

  # ps -ef | grep  9567
 casadm9567 1 55 Jun11 ?05:59:44 /opt/jdk1.6.0_29/bin/java
 -ea -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G
 -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
 -Dcom.sun.management.jmxremote.port=6080
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=false
 -Daccess.properties=/opt/dve/cassandra/conf/access.properties
 -Dpasswd.properties=/opt/dve/cassandra/conf/passwd.properties
 -Dpasswd.mode=MD5 -Dlog4j.configuration=log4j-server.properties
 -Dlog4j.defaultInitOverride=true -cp
 /opt/dve/cassandra/bin/../conf:/opt/dve/cassandra/bin/../build/classes/main:/opt/dve/cassandra/bin/../build/classes/thrift:/opt/dve/cassandra/bin/../lib/Cassandra-Extensions-1.0.0.jar:/opt/dve/cassandra/bin/../lib/antlr-3.2.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-clientutil-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-thrift-1.0.3.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/dve/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/dve/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/dve/cassandra/bin/../lib/commons-lang-2.4.jar:/opt/dve/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/opt/dve/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/dve/cassandra/bin/../lib/guava-r08.jar:/opt/dve/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/opt/dve/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar:/opt/dve/cassandra/bin/../lib/jline-0.9.94.jar:/opt/dve/cassandra/bin/../lib/json-simple-1.1.jar:/opt/dve/cassandra/bin/../lib/libthrift-0.6.jar:/opt/dve/cassandra/bin/../lib/log4j-1.2.16.jar:/opt/dve/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/opt/dve/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/opt/dve/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/dve/cassandra/bin/../lib/snakeyaml-1.6.jar:/opt/dve/cassandra/bin/../lib/snappy-java-1.0.4.1.jar
 org.apache.cassandra.thrift.CassandraDaemon

 ==
 # nodetool -h 127.0.0.1 -p 6080 info
 Token: 85070591730234615865843651857942052864
 Gossip active: true
 Load : 20.59 GB
 Generation No: 1339423322
 Uptime (seconds) : 39626
 Heap Memory (MB) : 3418.42 / 5984.00
 Data Center  : datacenter1
 Rack : rack1
 Exceptions   : 0

 =
 All row cache and key cache are disabled by default

 Key cache: disabled
 Row cache: disabled


 ==

 # pmap 9567
 9567: java
 START   SIZE RSS PSS   DIRTYSWAP PERM MAPPING
 4000 36K 36K 36K  0K  0K r-xp
 /opt/jdk1.6.0_29/bin/java
 40108000  8K  8K  8K  8K  0K rwxp
 /opt/jdk1.6.0_29/bin/java
 4010a000  18040K  17988K  17988K  17988K  0K rwxp [heap]
 00067ae0 6326700K 6258664K 6258664K 6258664K  0K rwxp [anon]
 0007fd06b000  48724K  0K  0K  0K  0K rwxp [anon]
 7fbed153 1331104K  0K  0K  0K  0K r-xs
 /var/cassandra/data/drc/queue-hb-219-Data.db
 7fbf22918000 2097152K  0K  0K  0K  0K r-xs
 /var/cassandra/data/drc/queue-hb-219-Data.db
 7fbfa2918000 2097148K 1124464K 1124462K  0K  0K r-xs
 /var/cassandra/data/drc/queue-hb-219-Data.db
 7fc022917000 2097156K 2096496K 2096492K  0K  0K r

Re: Much more native memory used by Cassandra then the configured JVM heap size

2012-06-11 Thread Jason Tang
Hi

I found some information on this issue, and it seems we can use another
data access strategy to reduce mmap usage in order to use less memory.

But I didn't find documentation describing this parameter for Cassandra
1.x. Is it a good way to reduce shared memory usage, and what is the
impact? (BTW, our data model is dynamic: although the throughput is high,
the life cycle of the data is short, one hour or less.)


# Choices are auto, standard, mmap, and mmap_index_only.
disk_access_mode: auto


http://comments.gmane.org/gmane.comp.db.cassandra.user/7390

2012/6/12 Jason Tang ares.t...@gmail.com

 See my post: I limit the JVM heap to 6G, but Cassandra actually uses more
 memory, which is not counted in the JVM heap.

 I use top to monitor the total memory used by Cassandra.

 =
 -Xms6G -Xmx6G -Xmn1600M

 2012/6/12 Jeffrey Kesselman jef...@gmail.com

 Btw.  I suggest you spin up JConsole as it will give you much more detail
 on what your VM is actually doing.



 On Mon, Jun 11, 2012 at 9:14 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

  We have a problem with Cassandra memory usage: we configured the JVM
 heap to 6G, but after running Cassandra for several hours (insert, update,
 delete), the total memory used by Cassandra goes up to 15G, which leaves
 the OS low on memory.
  So I wonder, is it normal for Cassandra to use so much memory?

 And how can we limit the native memory used by Cassandra?


 ===
 Cassandra 1.0.3, 64 bit jdk.

 Memory occupied by Cassandra: 15G
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
  9567 casadm20   0 28.3g  15g 9.1g S  269 65.1 385:57.65 java

 =
 -Xms6G -Xmx6G -Xmn1600M

  # ps -ef | grep  9567
 casadm9567 1 55 Jun11 ?05:59:44
 /opt/jdk1.6.0_29/bin/java -ea
 -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G
 -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
 -Dcom.sun.management.jmxremote.port=6080
 -Dcom.sun.management.jmxremote.ssl=false
 -Dcom.sun.management.jmxremote.authenticate=false
 -Daccess.properties=/opt/dve/cassandra/conf/access.properties
 -Dpasswd.properties=/opt/dve/cassandra/conf/passwd.properties
 -Dpasswd.mode=MD5 -Dlog4j.configuration=log4j-server.properties
 -Dlog4j.defaultInitOverride=true -cp
 /opt/dve/cassandra/bin/../conf:/opt/dve/cassandra/bin/../build/classes/main:/opt/dve/cassandra/bin/../build/classes/thrift:/opt/dve/cassandra/bin/../lib/Cassandra-Extensions-1.0.0.jar:/opt/dve/cassandra/bin/../lib/antlr-3.2.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-clientutil-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-thrift-1.0.3.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/dve/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/dve/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/dve/cassandra/bin/../lib/commons-lang-2.4.jar:/opt/dve/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/opt/dve/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/dve/cassandra/bin/../lib/guava-r08.jar:/opt/dve/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/opt/dve/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar:/opt/dve/cassandra/bin/../lib/jline-0.9.94.jar:/opt/dve/cassandra/bin/../lib/json-simple-1.1.jar:/opt/dve/cassandra/bin/../lib/libthrift-0.6.jar:/opt/dve/cassandra/bin/../lib/log4j-1.2.16.jar:/opt/dve/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/opt/dve/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/opt/dve/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/dve/cassandra/bin/../lib/snakeyaml-1.6.jar:/opt/dve/cassandra/bin/../lib/snappy-java-1.0.4.1.jar
 org.apache.cassandra.thrift.CassandraDaemon

 ==
 # nodetool -h 127.0.0.1 -p 6080 info
 Token: 85070591730234615865843651857942052864
 Gossip active: true
 Load : 20.59 GB
 Generation No: 1339423322
 Uptime (seconds) : 39626
 Heap Memory (MB) : 3418.42 / 5984.00
 Data Center  : datacenter1
 Rack : rack1
 Exceptions   : 0

 =
 All row cache and key cache are disabled by default

 Key cache: disabled
 Row cache: disabled


 ==

 # pmap 9567
 9567: java
 START   SIZE RSS PSS   DIRTYSWAP PERM MAPPING
 4000 36K 36K 36K  0K  0K r-xp
 /opt/jdk1.6.0_29/bin/java
 40108000  8K  8K  8K

TimedOutException caused by Stop the world activity

2012-05-27 Thread Jason Tang
Hi

My system is a 4-node 64-bit Cassandra cluster with a 6G heap per node and
default configuration (which means 1/3 of the heap for memtables),
replication factor 3, write ALL, read ONE.
When I run stress load testing, I get this TimedOutException, some
operations fail, and all traffic hangs for a while.

When I ran a 1G-memory 32-bit Cassandra in standalone mode, I didn't see
such frequent stop-the-world behavior.

So I wonder what kinds of operations can hang the Cassandra system, and how
to collect information for tuning.

From the system log and documentation, I guess there are three types of
operations (a sketch of GC logging options follows the list):
1) Flushing a memtable when it reaches its max size
2) Compacting SSTables (why?)
3) Java GC
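
To collect timing data for these, a minimal sketch of HotSpot GC logging
options that can be appended in conf/cassandra-env.sh (the log path is an
example):

JVM_OPTS="$JVM_OPTS -verbose:gc"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"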

system.log:
 INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688)
Enqueuing flush of Memtable-LocationInfo@1229893321(53/66 serialized/live
bytes, 2 ops)
 INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239)
Writing Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
 INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275)
Completed flushing
/var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-Data.db (163 bytes)
...

 INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java
(line 112) Compacting
[SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'),
SSTableReader(path='/var/proclog/raw/cassandra/data/
myks /queue-hb-32-Data.db'),
SSTableReader(path='/var/proclog/raw/cassandra/data/
myks /queue-hb-37-Data.db'),
SSTableReader(path='/var/proclog/raw/cassandra/data/
myks /queue-hb-53-Data.db')]
...

 WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line
146) Heap is 0.7993011015621736 full.  You may need to reduce memtable
and/or cache sizes.  Cassandra will now flush up to the two largest
memtables to free up memory.  Adjust flush_largest_memtables_at threshold
in cassandra.yaml if you don't want Cassandra to do this automatically
 INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line
123) GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used;
max is 6274678784
 INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line
123) GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is
6274678784
 INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line
123) GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is
6274678784
 INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line
123) GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280
used; max is 6274678784


Timeout Exception:
Caused by: org.apache.cassandra.thrift.TimedOutException: null
at
org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495)
~[na:na]
at
org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
~[na:na]
at
org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
~[na:na]
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
~[na:na]
... 64 common frames omitted

BRs
//Tang Weiqiang


Re: Cassandra search performance

2012-05-12 Thread Jason Tang
I tried to search on one column that stores the time as type Long, with
1,000,000 records distributed equally across 24 hours. I only want to
search a certain time range, e.g. from 01:30 to 01:50 or 08:00 to 12:00,
but something strange happened.

Search 00:00 to 23:59, limit 100:
it took less than 1 second and scanned 100 records.

Search 00:00 to 00:20, limit 100:
it took more than one minute and scanned around 2,400 records.

So the result suggests Cassandra scans records one by one to match the
condition, and the data is not ordered in sequence.

One more thing: to have an equality condition, I added a redundant column
whose value is the same for all records.
The search condition is like: get records where equal='equal' and
time > 00:00 and time < 00:20.

Is this the expected behavior of the secondary index, or did I not use it
correctly?

In an earlier test, I had one string column that is mostly 'true', and I
added 100 'false' among 1,000,000 'true'; it showed that only 100 records
were scanned.

So how can I examine what happened inside Cassandra, and where can I find
the details of how secondary indexes work?
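
For reference, a minimal Hector sketch of the query pattern described above
(the 'equal' and 'time' column names follow this post; the CF name 'queue'
is borrowed from the cfstats in the thread below, and the keyspace wiring
is omitted):

import java.nio.ByteBuffer;
import me.prettyprint.cassandra.model.IndexedSlicesQuery;
import me.prettyprint.cassandra.serializers.ByteBufferSerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;

public class TimeRangeSearch {
    // The constant 'equal' column supplies the mandatory indexed equality
    // clause; the Long 'time' column is then range-filtered by the scan.
    public static OrderedRows<String, String, ByteBuffer> search(
            Keyspace ks, long start, long end) {
        IndexedSlicesQuery<String, String, ByteBuffer> q =
                HFactory.createIndexedSlicesQuery(ks, StringSerializer.get(),
                        StringSerializer.get(), ByteBufferSerializer.get());
        q.setColumnFamily("queue");
        q.addEqualsExpression("equal", StringSerializer.get().toByteBuffer("equal"));
        q.addGteExpression("time", LongSerializer.get().toByteBuffer(start));
        q.addLteExpression("time", LongSerializer.get().toByteBuffer(end));
        q.setRange("", "", false, 100);  // up to 100 columns per row
        q.setStartKey("");
        q.setRowCount(100);              // the limit from the post
        QueryResult<OrderedRows<String, String, ByteBuffer>> r = q.execute();
        return r.get();
    }
}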

On Tuesday, May 8, 2012, Maxim Potekhin wrote:

  Thanks for the comments, much appreciated.

 Maxim


 On 5/7/2012 3:22 AM, David Jeske wrote:

 On Sun, Apr 29, 2012 at 4:32 PM, Maxim Potekhin potek...@bnl.gov wrote:

  Looking at your example, as I think you understand, you forgo indexes by
 combining two conditions in one query, thinking along the lines of what is
 often done in RDBMS. A scan is expected in this case, and there is no
 magic to avoid it.


   This sounds like a misunderstanding of how RDBMSs work. If you combine
 two conditions in a single SQL query, the SQL execution optimizer looks at
 the cardinality of any indices. If it can successfully predict that one of
 the conditions significantly reduces the set of rows that would be
 considered (such as a status match having 200 hits vs 1M rows in the
 table), then it selects this index for the first iteration, and each index
 hit causes a record lookup which is then tested for the other conditions.
  (This is one of several query-execution types RDBMS systems use.)

  I'm no Cassandra expert, so I don't know what it does WRT
 index selection, but from the page written on secondary indices, it seems
 like if you just query on status and do the other filtering yourself, it'll
 probably do what you want...

  http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes


  However, if this query is important, you can easily index on two
 conditions,
  using a composite type (look it up), or string concatenation for a quick
  and easy solution.


  This is not necessarily a good idea. Creating a composite index explodes
 the index size unnecessarily. If a condition can reduce a query to 200
 records, there is no need to have a composite index including another
 condition.
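
As an aside, the string concatenation idea mentioned above just encodes
both conditions into one column value; a hypothetical sketch:

public class CompositeValue {
    // Hypothetical composite value: status plus a zero-padded timestamp,
    // so a single indexed column can stand in for two separate conditions.
    public static String of(String status, long timeMillis) {
        return status + ":" + String.format("%019d", timeMillis);
    }
}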





Cassandra search performance

2012-04-25 Thread Jason Tang
Hi

   We have the following CF and use a secondary index to search on a simple
data status; among 1,000,000 rows, we have 200 records with the status we
want.

  But when we start to search, the performance is very poor. Checking with
the command ./bin/nodetool -h localhost -p 8199 cfstats, Cassandra read
1,000,000 records, and the read latency is 0.2 ms, so in total it took 200
seconds.

  It uses lots of CPU, and checking the stack, all threads in Cassandra are
reading from sockets.

  So I wonder how to really use the index to find the 200 records instead
of scanning all rows. (Super Column?)

ColumnFamily: queue
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds / keys to save : 0.0/0/all
  Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  Key cache size / save period in seconds: 0.0/0
  GC grace seconds: 0
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.0
  Replicate on write: false
  Bloom Filter FP chance: default
  Built indexes: [queue.idxStatus]
  Column Metadata:
    Column Name: status (737461747573)
      Validation Class: org.apache.cassandra.db.marshal.AsciiType
      Index Name: idxStatus
      Index Type: KEYS
BRs
//Jason


Re: Cassandra search performance

2012-04-25 Thread Jason Tang
And I found that if I only have the search condition on status, it only
scans 200 records.

But if I combine it with another condition on partition, it scans all
records, because the partition condition matches all records.

But combined with another condition such as userName, even though userName
is the same in all 1,000,000 records, it only scans 200 records.

So it is affected by the scan execution plan. If we have several search
conditions, how does it work? Is there a similar execution plan in
Cassandra?


On April 25, 2012, at 9:18 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

We have the following CF and use a secondary index to search on a simple
 data status; among 1,000,000 rows, we have 200 records with the status we
 want.

   But when we start to search, the performance is very poor. Checking
 with the command ./bin/nodetool -h localhost -p 8199 cfstats, Cassandra
 read 1,000,000 records, and the read latency is 0.2 ms, so in total it
 took 200 seconds.

   It uses lots of CPU, and checking the stack, all threads in Cassandra
 are reading from sockets.

   So I wonder how to really use the index to find the 200 records instead
 of scanning all rows. (Super Column?)

 ColumnFamily: queue
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: org.apache.cassandra.db.marshal.BytesType
   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
   Row cache size / save period in seconds / keys to save : 0.0/0/all
   Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
   Key cache size / save period in seconds: 0.0/0
   GC grace seconds: 0
   Compaction min/max thresholds: 4/32
   Read repair chance: 0.0
   Replicate on write: false
   Bloom Filter FP chance: default
   Built indexes: [queue.idxStatus]
   Column Metadata:
     Column Name: status (737461747573)
       Validation Class: org.apache.cassandra.db.marshal.AsciiType
       Index Name: idxStatus
       Index Type: KEYS
 BRs
 //Jason



Re: Cassandra search performance

2012-04-25 Thread Jason Tang
1.0.8

On April 25, 2012, at 10:38 PM, Philip Shon philip.s...@gmail.com wrote:

 What version of Cassandra are you using? I found a big performance hit
 when querying on the secondary index.

 I came across this bug in versions prior to 1.1

 https://issues.apache.org/jira/browse/CASSANDRA-3545

 Hope that helps.

 2012/4/25 Jason Tang ares.t...@gmail.com

  And I found that if I only have the search condition on status, it only
  scans 200 records.

  But if I combine it with another condition on partition, it scans all
  records, because the partition condition matches all records.

  But combined with another condition such as userName, even though
  userName is the same in all 1,000,000 records, it only scans 200 records.

  So it is affected by the scan execution plan. If we have several search
  conditions, how does it work? Is there a similar execution plan in
  Cassandra?


 On April 25, 2012, at 9:18 PM, Jason Tang ares.t...@gmail.com wrote:

 Hi

 We have the following CF and use a secondary index to search on a
  simple data status; among 1,000,000 rows, we have 200 records with the
  status we want.

    But when we start to search, the performance is very poor. Checking
  with the command ./bin/nodetool -h localhost -p 8199 cfstats, Cassandra
  read 1,000,000 records, and the read latency is 0.2 ms, so in total it
  took 200 seconds.

    It uses lots of CPU, and checking the stack, all threads in Cassandra
  are reading from sockets.

    So I wonder how to really use the index to find the 200 records
  instead of scanning all rows. (Super Column?)

  ColumnFamily: queue
    Key Validation Class: org.apache.cassandra.db.marshal.BytesType
    Default column value validator: org.apache.cassandra.db.marshal.BytesType
    Columns sorted by: org.apache.cassandra.db.marshal.BytesType
    Row cache size / save period in seconds / keys to save : 0.0/0/all
    Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
    Key cache size / save period in seconds: 0.0/0
    GC grace seconds: 0
    Compaction min/max thresholds: 4/32
    Read repair chance: 0.0
    Replicate on write: false
    Bloom Filter FP chance: default
    Built indexes: [queue.idxStatus]
    Column Metadata:
      Column Name: status (737461747573)
        Validation Class: org.apache.cassandra.db.marshal.AsciiType
        Index Name: idxStatus
        Index Type: KEYS
 BRs
  //Jason






Consistence for node shutdown and startup

2011-12-11 Thread Jason Tang
Hi

   Here is the case: we have only two nodes which share the data (write
ONE, read ONE).

   time    node One         node Two
     |     stopped          continues working and updates the data
     |     stopped          stopped
     |     starts working   stopped
     |     updates data     stopped
     |     started          starts working
     v

 What about the conflicting data written while the two nodes were online
separately? How is it synchronized between the two nodes when both are
finally online?
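
For what it's worth, my understanding is that replicas reconcile each
column by its timestamp (last write wins), via read repair or an explicit
anti-entropy repair once both nodes are up. A sketch, with the host and JMX
port borrowed from the threads above and the keyspace name from the
compaction log earlier in this archive:

# run once both nodes are online
nodetool -h 192.168.0.5 -p 6080 repair myks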

BRs
//Tang Weiqiang