Any better solution to avoid TombstoneOverwhelmingException?
Our application will use Cassandra to persist asynchronous tasks, so within one time period a large number of records (more than 10M) will be created in Cassandra. Later they will be executed. Due to disk space limitations, the executed records will be deleted, and after gc_grace_seconds they are expected to be removed from disk automatically, so that for the next round of execution the deleted records are not queried out again.

This traffic pattern generates a lot of tombstones. One way to avoid TombstoneOverwhelmingException is to raise tombstone_failure_threshold, but does that have any performance impact under my traffic model, or is there a better solution for this kind of traffic?

BRs
//Tang
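One common way to sidestep tombstone scans in a queue-like workload (a sketch of the general idea, not something proposed in this message) is to bucket tasks into time-window partitions and stop reading old buckets instead of deleting individual rows; old buckets then expire via TTL without any query ever scanning their tombstones. A minimal illustration of the bucketing arithmetic in Python — the bucket width and key format here are assumptions:

```python
from datetime import datetime, timezone

BUCKET_MINUTES = 10  # assumed window width; tune to your execution round


def bucket_key(ts: datetime) -> str:
    """Map a task timestamp to a coarse time-bucket partition key.

    Each execution round reads only its own bucket, so tombstones in
    old buckets are never scanned; old buckets simply age out via TTL.
    """
    epoch_minutes = int(ts.timestamp()) // 60
    bucket = epoch_minutes - (epoch_minutes % BUCKET_MINUTES)
    return f"tasks_{bucket}"


t1 = datetime(2014, 6, 30, 8, 43, tzinfo=timezone.utc)
t2 = datetime(2014, 6, 30, 8, 49, tzinfo=timezone.utc)
t3 = datetime(2014, 6, 30, 8, 53, tzinfo=timezone.utc)
# t1 and t2 fall into the same 10-minute bucket, t3 into the next one
```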
Re: Any better solution to avoid TombstoneOverwhelmingException?
The traffic is continuous: while new records are being inserted, old records are being executed (deleted) at the same time. The execution is based on a time condition, so some stored records are executed (deleted) in the current round and some in the next round. A given TTL is the same as a delete in this respect: it also generates tombstones.

2014-06-30 15:58 GMT+08:00 DuyHai Doan doanduy...@gmail.com:
Why don't you store all current data in one partition and, for the next round of execution, switch to a new partition? That way you don't even need to remove data (if you insert with a given TTL).

On Mon, Jun 30, 2014 at 8:43 AM, Jason Tang ares.t...@gmail.com wrote:
Our application will use Cassandra to persist asynchronous tasks, so within one time period a large number of records (more than 10M) will be created in Cassandra. Later they will be executed. Due to disk space limitations, the executed records will be deleted, and after gc_grace_seconds they are expected to be removed from disk automatically, so that for the next round of execution the deleted records are not queried out again. This traffic pattern generates a lot of tombstones. One way to avoid TombstoneOverwhelmingException is to raise tombstone_failure_threshold, but does that have any performance impact under my traffic model, or is there a better solution for this kind of traffic?
BRs
//Tang
Re: heap issues - looking for advice on GC tuning
What is the configuration of the following parameters?
memtable_flush_queue_size:
concurrent_compactors:

2013/10/30 Piavlo lolitus...@gmail.com
Hi,

Below I try to give a full picture of the problem I'm facing. This is a 12-node cluster running on EC2 with m2.xlarge instances (17G RAM, 2 CPUs). The Cassandra version is 1.0.8. The cluster normally does between 3000 and 1500 reads per second (depending on the time of day) and 1700 to 800 writes per second, according to OpsCenter. RF=3; no row caches are used.

Memory-relevant configs from cassandra.yaml:
flush_largest_memtables_at: 0.85
reduce_cache_sizes_at: 0.90
reduce_cache_capacity_to: 0.75
commitlog_total_space_in_mb: 4096

Relevant JVM options:
-Xms8000M -Xmx8000M -Xmn400M
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly

What happens with these settings is that after a Cassandra process restart, GC works fine at the beginning and the used heap looks like a saw with perfect teeth. Eventually the teeth start to shrink until they become unnoticeable, and Cassandra starts to spend a lot of CPU time doing GC. Such a cycle takes about two weeks, after which I need to restart the Cassandra process to restore performance. During all this time there are no memory-related messages in the Cassandra system.log, except an occasional "GC for ParNew" a little above 200ms.

Things I've already done to try to reduce this eventual heap pressure:
1) Reducing bloom_filter_fp_chance, resulting in a reduction from ~700MB to ~280MB total per node, based on all Filter.db files on the node.
2) Reducing key cache sizes, and dropping key caches for CFs which do not have many reads.
3) Increasing the heap size from 7000M to 8000M.

None of these really helped; only the increase from 7000M to 8000M extended the cycle until excessive GC from ~9 days to ~14 days.
I've tried to graph over time the data that is supposed to be in the heap vs. the actual heap size, by summing all CFs' bloom filter sizes + all CFs' key cache capacities multiplied by the average key size + all CFs' reported memtable data sizes (I've deliberately overestimated the data size a bit, to be on the safe side). Here is a graph showing the last 2 days of metrics for a node which could not effectively do GC, after which the Cassandra process was restarted: http://awesomescreenshot.com/0401w5y534

You can clearly see that before and after the restart, the size of the data that is supposed to be in the heap is pretty much the same, which makes me think that what I really need is GC tuning. I also don't think this is due to the total number of keys each node has, which is between 300 and 200 million keys summed across all CF key estimates on a node. The nodes have data sizes between 75G and 45G, corresponding to their millions of keys, and all nodes start having heavy GC load after about 14 days. The excessive GC and heap usage are also not affected by load, which varies with the time of day (see the read/write rates at the beginning of the mail). So based on this, I assume it is not a large number of keys or too much load on the cluster, but a pure GC misconfiguration issue.

Things I remember trying for GC tuning:
1) Changing -XX:MaxTenuringThreshold=1 to values like 8 - did not help.
2) Adding -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:ParallelGCThreads=2 -XX:ParallelCMSThreads=1 - this actually made things worse.
3) Adding -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=8 - did not help.

Also, since it takes about 2 weeks to verify that a GC setting change did not help, trying all the possibilities is painfully slow :)

I'd highly appreciate any help and hints on the GC tuning.
tnx
Alex
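The heap estimate described in this mail (bloom filters + key cache entries × average key size + memtable data) can be written as a small helper. The per-CF figures below are made-up placeholders for illustration, not numbers from the thread:

```python
def estimated_heap_bytes(column_families):
    """Rough lower bound on steady-state heap occupancy, per the
    approach in the mail: bloom filters + key cache capacity times
    average key size + reported memtable data, summed over all CFs."""
    total = 0
    for cf in column_families:
        total += cf["bloom_filter_bytes"]
        total += cf["key_cache_capacity"] * cf["avg_key_bytes"]
        total += cf["memtable_data_bytes"]
    return total


# Hypothetical per-CF figures for illustration only
cfs = [
    {"bloom_filter_bytes": 140_000_000, "key_cache_capacity": 1_000_000,
     "avg_key_bytes": 64, "memtable_data_bytes": 256_000_000},
    {"bloom_filter_bytes": 140_000_000, "key_cache_capacity": 500_000,
     "avg_key_bytes": 64, "memtable_data_bytes": 128_000_000},
]
```

The gap between this sum and the actual used heap after a full CMS cycle is what the graph above is meant to expose: if the estimate is flat while the post-GC floor keeps rising, the pressure is coming from somewhere this model doesn't count.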
Re: Side effects of hinted handoff lead to consistency problem
After checking the logs and configuration, I found it was caused by two things.

1. GC grace seconds
I am using the Hector client to connect to Cassandra, and the default value of GC grace seconds it sets for each column family is **zero**! So by the time hinted handoff replayed the temporary value, the tombstones on the other two nodes had already been removed by compaction, and the client then got the temporary value back.

2. Secondary index
Even after fixing the first problem, I could still get the temporary result from the Cassandra client. I used a command like "get my_cf where column_one='value'" to query the data, and the temporary value showed up again. But when I queried the same record by its raw key, it had disappeared; and since our client always uses the row key to get the data, that path never returned the temporary value. So it seems the secondary index is not restricted by the consistency configuration.

After I changed GC grace seconds to 10 days, our problem was solved, but the index-query behavior still seems strange.

2013/10/8 Jason Tang ares.t...@gmail.com
I have a 3-node cluster; replication_factor is also 3. Consistency level is write QUORUM, read QUORUM. The traffic has three major steps:
Create: Rowkey: Column: status=new, requests=x
Update: Rowkey: Column: status=executing, requests=x
Delete: Rowkey:
When one node is down, the system keeps working according to the consistency configuration, and the final state is that all requests are finished and deleted. So running a Cassandra client to list the result (also with consistency QUORUM) shows empty rows (only the rowkey left), which is correct.
But when we start the dead node, the hinted handoff mechanism writes the data back to this node, so there are lots of creates, updates, and deletes. I don't know whether it is due to GC or compaction, but the deletes on the other two nodes seem not to take effect, and if I use a Cassandra client to list the data (also consistency QUORUM), the deleted rows show up again with column values.
If you check the data several times from the client, you can see the data changing: as hinted handoff replays the operations, the deleted data shows up and then disappears. So the hinted handoff mechanism speeds up repair, but the temporary data is visible externally (when data has been deleted). Is there a way to keep this procedure invisible from the outside until the hinted handoff has finished? What I want is final-state synchronization; the temporary state is out of date and also incorrect, and should never be seen externally. Is it due to the row delete instead of a column delete? Or compaction?
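The failure mode in point 1 above can be pictured with a toy reconciliation model (a simplification of Cassandra's actual last-write-wins resolution, with made-up record shapes): once a tombstone is purged because gc_grace_seconds has elapsed, a stale value replayed by a hint has nothing newer to lose against, so it resurfaces.

```python
def reconcile(versions):
    """Last-write-wins over (timestamp, value) pairs; a value of None
    is a tombstone. Returns the winning value."""
    ts, value = max(versions, key=lambda v: v[0])
    return value


def purge_tombstones(versions, now, gc_grace_seconds):
    """Drop tombstones older than gc_grace_seconds, as compaction does."""
    return [(ts, v) for ts, v in versions
            if not (v is None and now - ts > gc_grace_seconds)]


# A value written at t=100, deleted at t=200. A hint later replays the
# old write on a recovering node.
replica = [(100, "executing"), (200, None)]

# While the tombstone exists, the delete wins: reconcile(replica) is None.
# With gc_grace_seconds=0 the tombstone is purged almost immediately...
purged = purge_tombstones(replica, now=201, gc_grace_seconds=0)
# ...and the stale value has nothing to lose against, so it resurfaces.
```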
Re: Failed to solve Digest mismatch
I did some tests on this issue, and it turns out the problem is caused by local timestamps. In our traffic, the update and the delete happen very fast, within 1 second, even within 100ms. At the time, the NTP service did not seem to work well; the offset was sometimes even larger than 1 second. So some delete timestamps ended up before the create timestamps, and when the digest mismatch was resolved, the result was not correct.

2012/7/4 aaron morton aa...@thelastpickle.com
Jason,
Are you able to document the steps to reproduce this on a clean install? If so, do you have time to create an issue on https://issues.apache.org/jira/browse/CASSANDRA ?
Thanks
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/07/2012, at 1:49 AM, Jason Tang wrote:
For the create/update/deleteColumn/deleteRow test case, with QUORUM consistency level, 6 nodes, replication factor 3, and one thread, I can reproduce this in around 1 in 100 rounds. With 20 client threads running the test client, the ratio is bigger. Each test group is executed by one thread, and the client timestamps are unique and sequenced, guaranteed by Hector. The client only accesses data from its local Cassandra node, and the query only uses the row key, which is unique. The column names are not unique, e.g. "status". Each row has around 7 columns, all small, e.g. status:true, userName:Jason ...
BRs
//Ares

2012/7/1 Jonathan Ellis jbel...@gmail.com
Is this Cassandra 1.1.1? How often do you observe this? How many columns are in the row? Can you reproduce when querying by column name, or only when slicing the row?

On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang ares.t...@gmail.com wrote:
Hi
First I delete one column, then I delete one row. Then I try to read all columns from the same row; all operations come from the same client app. The consistency level is read/write QUORUM.
Checking the Cassandra log, the local node does not perform the delete operation itself but sends the mutation to the other nodes (192.168.0.6, 192.168.0.1). After the delete, I try to read all columns from the row, and the node finds a digest mismatch because of the QUORUM consistency configuration, but the result is not correct. From the log I can see the delete mutation was already accepted by 192.168.0.6 and 192.168.0.1, but when 192.168.0.5 reads the responses from .6 and .1 and merges the data, .5 finally returns a result which is the dirty data. The following logs show the change of column 737461747573: 192.168.0.5 reads from .1 and .6; the column should be deleted, but it finally shows up with data.

log: 192.168.0.5
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653) Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc', key=7878323239537570657254616e67307878, columnParent='QueryPath(columnFamilyName='queue', superColumnName='null', columnName='null')', columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674) reading data from /192.168.0.6
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694) reading digest from /192.168.0.1
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199 ResponseVerbHandler.java (line 44) Processing response on a callback from 6556@/192.168.0.6
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199 ResponseVerbHandler.java (line 44) Processing response on a callback from 6557@/192.168.0.1
DEBUG
[RequestResponseStage:6] 2012-06-28 15:59:42,199 AbstractRowResolver.java (line 66) Preprocessed digest response
DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line 65) resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733) Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(100572974179274741747356988451225858264, 7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs d41d8cd98f00b204e9800998ecf8427e)
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201 ResponseVerbHandler.java (line 44) Processing response on a callback from 6558@/192.168.0.6
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201 ResponseVerbHandler.java (line 44) Processing response on a callback from 6559@/192.168.0.1
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201 AbstractRowResolver.java (line 66
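A side note on reading that mismatch line (my observation, not from the thread): the second digest, d41d8cd98f00b204e9800998ecf8427e, is the MD5 of empty input, which is consistent with one replica having already applied the delete and hashing an empty response:

```python
import hashlib

# MD5 over zero bytes -- the well-known "empty" digest that shows up
# in the DigestMismatchException above
empty_digest = hashlib.md5(b"").hexdigest()
print(empty_digest)  # d41d8cd98f00b204e9800998ecf8427e
```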
Side effects of hinted handoff lead to consistency problem
I have a 3-node cluster; replication_factor is also 3. Consistency level is write QUORUM, read QUORUM. The traffic has three major steps:

Create: Rowkey: Column: status=new, requests=x
Update: Rowkey: Column: status=executing, requests=x
Delete: Rowkey:

When one node is down, the system keeps working according to the consistency configuration, and the final state is that all requests are finished and deleted. So running a Cassandra client to list the result (also with consistency QUORUM) shows empty rows (only the rowkey left), which is correct.

But when we start the dead node, the hinted handoff mechanism writes the data back to this node, so there are lots of creates, updates, and deletes. I don't know whether it is due to GC or compaction, but the deletes on the other two nodes seem not to take effect: if I use a Cassandra client to list the data (also consistency QUORUM), the deleted rows show up again with column values. Checking the data several times from the client, you can see the data changing; as hinted handoff replays the operations, the deleted data shows up and then disappears.

So the hinted handoff mechanism speeds up repair, but the temporary data is visible externally (when data has been deleted). Is there a way to keep this procedure invisible from the outside until the hinted handoff has finished? What I want is final-state synchronization; the temporary state is out of date and also incorrect, and should never be seen externally. Is it due to the row delete instead of a column delete? Or compaction?
Why is Cassandra so dependent on the client's local timestamp?
The following case may be logically correct for Cassandra, but it is difficult for users.

Let's say:
Cassandra consistency level: write ALL, read ONE
replication_factor: 3

For one record, rowkey: 001, column: status
Client 1 inserts a value for rowkey 001, status:True, timestamp 11:00:05
Client 2 does a slice query and gets the value True for rowkey 001 @ 11:00:00
Client 2 updates the value for rowkey 001, status:False, timestamp 11:00:02

So the client update sequence is True then False. Although the update requests go through different nodes, the sequence is logically ordered. But the result is rowkey:001, column:status, value:True.

So why does Cassandra depend so heavily on the client's local time? Why not use the server's local time instead? Because I am using consistency level write ALL with replication_factor 3, all 3 nodes see the updates in the correct sequence (True -> False), so they could give the correct final result. If for some reason Cassandra needs to depend strongly on the operation's timestamp, then the query operation also needs a timestamp, and then Client 2 would not see the value True, which from its clock's point of view happens in the future. So either using server timestamps, or providing a consistent view by using timestamps for queries, would be more consistent. Otherwise the consistency of Cassandra is quite weak.
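The surprise described above is plain last-write-wins resolution on client-supplied timestamps; a toy model (not Cassandra code) makes the outcome mechanical:

```python
def last_write_wins(writes):
    """Resolve a cell from (client_timestamp, value) writes the way
    Cassandra does: the highest timestamp wins, regardless of the
    order in which the writes actually arrived."""
    return max(writes, key=lambda w: w[0])[1]


# Client 1 writes True at its clock 11:00:05; Client 2 writes False
# afterwards, but its (skewed) clock says 11:00:02.
writes = [("11:00:05", "True"), ("11:00:02", "False")]
# Arrival order was True then False, yet True wins on timestamp.
```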
Does the Gossiper in Cassandra use unicast/broadcast/multicast?
Hi
We are considering using Cassandra in a virtualization environment. Does Cassandra use unicast, broadcast, or multicast for node discovery and communication? From the code, I can see the broadcast address is used for heartbeats in Gossiper.java, but I don't know how it actually works for node-to-node communication and at node startup (not for newly added nodes).
BRs
Re: Consistency problem when resolving digest mismatch
Actually I don't update the same record concurrently: I first create it, then search for it, then delete it. The version conflict is resolved incorrectly because the delete's local timestamp is earlier than the create's local timestamp.

2013/3/6 aaron morton aa...@thelastpickle.com
> Otherwise, it means the version conflict resolution strongly depends on a global sequence id (timestamp) which has to be provided by the client?
Yes. If you have an area of your data model with a high degree of concurrency, C* may not be the right match. In 1.1 we have atomic updates, so clients see either the entire write or none of it. And sometimes you can design a data model that does not mutate shared values but writes ledger entries instead. See Matt Denis's talk here http://www.datastax.com/events/cassandrasummit2012/presentations or this post http://thelastpickle.com/2012/08/18/Sorting-Lists-For-Humans/
Cheers
- Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 4/03/2013, at 4:30 PM, Jason Tang ares.t...@gmail.com wrote:
Hi
The timestamp provided by my client is a Unix timestamp (with NTP), and as I said, due to NTP drift the local Unix timestamps are not synchronized accurately enough for my case. So, in short, the client cannot provide a global sequence number to indicate the event order. But I wonder: I configured the Cassandra consistency level as write QUORUM, so for one record I supposed Cassandra had the ability to decide the final update result. Otherwise, it means the version conflict resolution strongly depends on a global sequence id (timestamp) which has to be provided by the client?
//Tang

2013/3/4 Sylvain Lebresne sylv...@datastax.com
The problem is: what is the sequence number you are talking about, exactly? Or let me put it another way: if you do have a sequence number that provides a total ordering of your operations, then that is exactly what you should use as your timestamp.
What Cassandra calls the timestamp is exactly what you call a seqID: it's the number Cassandra uses to decide the order of operations. Except that in real life, provided you have more than one client talking to Cassandra, producing a total ordering of operations is hard, and in fact not doable efficiently. So in practice people use Unix timestamps (with NTP), which provide a very good and cheap approximation of the real-life order of operations. But again, if you do know how to assign a more precise timestamp, Cassandra lets you use it: you can provide your own timestamp (the Unix timestamp is just the default). The point being, the Unix timestamp is the best approximation we have in practice.
-- Sylvain

On Mon, Mar 4, 2013 at 9:26 AM, Jason Tang ares.t...@gmail.com wrote:
Hi
Previously I met a consistency problem; you can refer to the link below for the whole story: http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3CCAFb+LUxna0jiY0V=AvXKzUdxSjApYm4zWk=ka9ljm-txc04...@mail.gmail.com%3E
After checking the code, it seems I found some clues to the problem. Maybe someone can check this. In short, I have a Cassandra cluster (1.0.3); the consistency level is read/write QUORUM, and replication_factor is 3. Here is the event sequence:

seqID NodeA NodeB NodeC
1. New New New
2. Update Update Update
3. Delete Delete

When reading from NodeB and NodeC, a digest mismatch exception is triggered, so Cassandra tries to resolve the version conflict. But the result is the value "Update". Here is the suspected root cause: the version conflict is resolved based on timestamps, and Node C's local time is a bit earlier than Node A's. The update was sent from node C with timestamp 00:00:00.050 and the delete from node A with timestamp 00:00:00.020, which does not match the event sequence, so the conflict was resolved incorrectly. Is that true? If yes, it means the consistency level ensures the conflict is detected, but resolving it correctly depends on the accuracy of time synchronization, e.g. NTP?
Re: Consistency problem when resolving digest mismatch
Hi
The timestamp provided by my client is a Unix timestamp (with NTP), and as I said, due to NTP drift the local Unix timestamps are not synchronized accurately enough for my case. So, in short, the client cannot provide a global sequence number to indicate the event order. But I wonder: I configured the Cassandra consistency level as write QUORUM, so for one record I supposed Cassandra had the ability to decide the final update result. Otherwise, it means the version conflict resolution strongly depends on a global sequence id (timestamp) which has to be provided by the client?
//Tang

2013/3/4 Sylvain Lebresne sylv...@datastax.com
The problem is: what is the sequence number you are talking about, exactly? Or let me put it another way: if you do have a sequence number that provides a total ordering of your operations, then that is exactly what you should use as your timestamp. What Cassandra calls the timestamp is exactly what you call a seqID: it's the number Cassandra uses to decide the order of operations. Except that in real life, provided you have more than one client talking to Cassandra, producing a total ordering of operations is hard, and in fact not doable efficiently. So in practice people use Unix timestamps (with NTP), which provide a very good and cheap approximation of the real-life order of operations. But again, if you do know how to assign a more precise timestamp, Cassandra lets you use it: you can provide your own timestamp (the Unix timestamp is just the default). The point being, the Unix timestamp is the best approximation we have in practice.
-- Sylvain

On Mon, Mar 4, 2013 at 9:26 AM, Jason Tang ares.t...@gmail.com wrote:
Hi
Previously I met a consistency problem; you can refer to the link below for the whole story: http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3CCAFb+LUxna0jiY0V=AvXKzUdxSjApYm4zWk=ka9ljm-txc04...@mail.gmail.com%3E
After checking the code, it seems I found some clues to the problem. Maybe someone can check this.
In short, I have a Cassandra cluster (1.0.3); the consistency level is read/write QUORUM, and replication_factor is 3. Here is the event sequence:

seqID NodeA NodeB NodeC
1. New New New
2. Update Update Update
3. Delete Delete

When reading from NodeB and NodeC, a digest mismatch exception is triggered, so Cassandra tries to resolve the version conflict. But the result is the value "Update". Here is the suspected root cause: the version conflict is resolved based on timestamps, and Node C's local time is a bit earlier than Node A's. The update was sent from node C with timestamp 00:00:00.050 and the delete from node A with timestamp 00:00:00.020, which does not match the event sequence, so the conflict was resolved incorrectly. Is that true? If yes, it means the consistency level ensures the conflict is detected, but resolving it correctly depends on the accuracy of time synchronization, e.g. NTP?
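One client-side mitigation for the create-then-delete inversion described in this thread (my sketch, not something proposed in the mail) is a monotonic timestamp generator: take the wall clock, but never emit a value less than or equal to the previous one, so operations issued in sequence by a single client can never be reordered by an NTP step adjustment on that client:

```python
import time


class MonotonicTimestamps:
    """Issue microsecond timestamps that never go backwards, even if
    the wall clock does (e.g. an NTP step adjustment)."""

    def __init__(self):
        self._last = 0

    def next(self) -> int:
        now = int(time.time() * 1_000_000)
        # Never re-issue or regress: at worst, advance by one microsecond.
        self._last = max(now, self._last + 1)
        return self._last


clock = MonotonicTimestamps()
stamps = [clock.next() for _ in range(1000)]
# strictly increasing, regardless of wall-clock behaviour in between
```

This only fixes ordering within one client process; ordering across clients on different machines still depends on clock synchronization, which is exactly Sylvain's point.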
Re: Cassandra Consistency problem with NTP
A delayed read is acceptable, but the problem is still there:

Request A arrives at node One at local time PM 10:00:01.000
Request B arrives at node Two at local time PM 10:00:00.980
The correct order is A -> B.

I am not sure how a node will handle the data: although A came before B, B's timestamp is earlier than A's.

2013/1/17 Russell Haering russellhaer...@gmail.com
One solution is to only read up to (now - 1 second). If this is a public API where you want to guarantee full consistency (i.e., if you have added a message to the queue, it will definitely appear to be there), you can instead delay requests for 1 second before reading up to the moment the request was received. In either approach you can tune the time offset based on how closely synchronized you believe you can keep your clocks. The tradeoff, of course, is increased latency.

On Wed, Jan 16, 2013 at 5:56 PM, Jason Tang ares.t...@gmail.com wrote:
Hi
I am using Cassandra in a message bus solution; the major responsibility of Cassandra is recording incoming requests for later consumption. One strategy is first-in-first-out (FIFO), so I need to read the stored requests in order. I use NTP to synchronize the system time of the nodes in the cluster (4 nodes), but the local times of the nodes still have some inaccuracy, around 40 ms. The consistency level is write ALL and read ONE, and the replication factor is 3. But here is the problem:
Request A comes to node One at local time PM 10:00:01.000
Request B comes to node Two at local time PM 10:00:00.980
The correct order is A -> B, but the timestamps say B -> A.
So is there any way for Cassandra to keep the correct order for read operations (e.g. a logical timestamp)? Or does Cassandra strongly depend on the time synchronization solution?
BRs
//Tang
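Russell's read-up-to-(now - delta) suggestion can be sketched as a filter over timestamped entries. This is a simplification with hypothetical data; the point is that delta must exceed the worst expected clock skew between nodes:

```python
def readable(entries, now_ms, skew_ms=1000):
    """Return only entries older than the skew window, in timestamp
    order. Entries inside the window are withheld, because a not-yet-
    seen write from a clock-skewed node could still sort before them."""
    cutoff = now_ms - skew_ms
    return sorted((ts, v) for ts, v in entries if ts <= cutoff)


# B was written at 10:00:00.980 on a node ~20ms behind; A at 10:00:01.000
# (times reduced to small millisecond offsets for the example)
entries = [(1_000, "A"), (980, "B")]

# Reading at +1.500s withholds both (still inside the 1s window)...
early = readable(entries, now_ms=1_500)
# ...reading one second later returns them in a now-stable order.
late = readable(entries, now_ms=2_500)
```

Note the order returned is still timestamp order (B before A), but it is at least stable: no later-arriving write can change it, which is what the delayed read buys you.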
Re: Cassandra Consistency problem with NTP
Yes, Sylvain, you are correct. When I say A comes before B, it means the client secures the order: actually, B is sent only after the response to request A is received. And yes, A and B do not update the same record, so it is not the typical Cassandra consistency problem. And yes, the column name is provided by the client; right now I use the local timestamp, and the local clocks behind A and B are not synchronized well, so I have the problem. What I want is for Cassandra to provide some information to the client that indicates A is stored before B, e.g. a globally unique timestamp, or the row order.

2013/1/17 Sylvain Lebresne sylv...@datastax.com
I'm not sure I fully understand your problem. You seem to be talking about ordering the requests in the order they are generated. But in that case you rely on the ordering of columns within whatever row you store requests A and B in, and that order depends on the column names, which in turn are client-provided and don't depend at all on the time synchronization of the cluster nodes.
And since you are able to say that request A comes before B, I suppose the requests are generated from the same source. In that case you just need to make sure that the column names storing each request respect the correct ordering.
The column timestamps Cassandra uses are there to decide which update *to the same column* is the more recent one. So they only come into play if requests A and B update the same column and you're interested in knowing which update wins when you read. But even if that is your case (which doesn't sound like it at all from your description), the column timestamp is only generated server-side if you use CQL. And even in that latter case it's a convenience, and you can force a timestamp client-side if you really wish. In other words, Cassandra's dependency on time synchronization is not a strong one even in that case. But again, that doesn't seem at all to be the problem you are trying to solve.
-- Sylvain

On Thu, Jan 17, 2013 at 2:56 AM, Jason Tang ares.t...@gmail.com wrote:
Hi
I am using Cassandra in a message bus solution; the major responsibility of Cassandra is recording incoming requests for later consumption. One strategy is first-in-first-out (FIFO), so I need to read the stored requests in order. I use NTP to synchronize the system time of the nodes in the cluster (4 nodes), but the local times of the nodes still have some inaccuracy, around 40 ms. The consistency level is write ALL and read ONE, and the replication factor is 3. But here is the problem:
Request A comes to node One at local time PM 10:00:01.000
Request B comes to node Two at local time PM 10:00:00.980
The correct order is A -> B, but the timestamps say B -> A.
So is there any way for Cassandra to keep the correct order for read operations (e.g. a logical timestamp)? Or does Cassandra strongly depend on the time synchronization solution?
BRs
//Tang
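Sylvain's point that ordering should live in client-chosen column names (rather than in write timestamps) can be sketched with a composite name built from a client-side sequence number. The encoding below is an assumption for illustration; any scheme whose byte order matches generation order works:

```python
def column_name(seq: int, request_id: str) -> str:
    """Build a sortable column name from a client-side sequence number.
    Zero-padding makes lexicographic order match numeric order, so the
    row's column ordering reflects generation order, independent of
    node clocks."""
    return f"{seq:012d}:{request_id}"


# Columns inserted in any order...
names = [column_name(2, "B"), column_name(1, "A"), column_name(3, "C")]
# ...always sort back into generation order: A, B, C
```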
Re: is it possible to disable compaction per CF ?
setMaxCompactionThreshold(0)
setMinCompactionThreshold(0)

2012/7/27 Илья Шипицин chipits...@gmail.com
Hello!
If we are dealing with an append-only data model, what happens if I disable compaction on a certain CF? Are there any side effects? Can I do it with "update column family ... with compaction_strategy = null"?
Cheers,
Ilya Shipitsin
Compaction does not remove deleted data from the secondary index when using TTL
Hi
Because of a consistency problem, we cannot use a direct delete to remove a row, so we use a TTL on each column of the row instead. We use Cassandra as the central storage of a stateful system: every request is stored in Cassandra and marked status:NEW, then changed to status:EXECUTING, then deleted (by TTL). We use a secondary index on the column 'status', and after processing 4 million requests, most of the requests have been deleted from Cassandra.

After running compact from nodetool, the size of the CF Requests SSTable decreased to about 20M, but Requests.idxStatus keeps growing and is about 1.6G. From the system log I found that the compact command from nodetool does not trigger compaction of the secondary index. During traffic, when a compaction of the CF Requests is triggered, a compaction of the index is started as well, but the size of its SSTable does not decrease as expected; it seems the data in the secondary index is not deleted. Since we only have 3 status values, I can find logs such as:

INFO [CompactionExecutor:31] 2012-07-20 10:30:50,532 CompactionController.java (line 129) Compacting large row demo/Requests.idxStatus:EXECUTING (264045300 bytes) incrementally

So why is the secondary index not compacted down to a small size as expected? Is it related to TTL? And is it possible to rebuild the index?
BRs
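A way to picture the problem above (a toy model, not Cassandra internals): a secondary index on a column with only a handful of values is effectively that handful of wide rows, each holding one entry per indexed request. Millions of requests pile up under the same few index keys, so expired entries linger until every fragment of those huge rows happens to be compacted together:

```python
from collections import defaultdict


def build_index(requests):
    """Toy secondary index: one wide 'index row' per distinct status
    value, listing the row keys that carry that status."""
    index = defaultdict(list)
    for row_key, status in requests:
        index[status].append(row_key)
    return index


# Millions of requests funnel into just a few index rows (scaled down)
requests = [(f"req-{i}", "EXECUTING" if i % 2 else "NEW")
            for i in range(10_000)]
index = build_index(requests)
# every request lands in one of only len(index) wide rows
```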
Re: Replication factor - Consistency Questions
Yes, ALL is not good for HA. We also hit problems when using QUORUM, and our current solution is to switch to write:QUORUM / read:QUORUM when we get an UnavailableException.

2012/7/18 Jay Parashar jparas...@itscape.com
Thanks... but write ALL will fail for any downed nodes. I am thinking of QUORUM.

From: Jason Tang [mailto:ares.t...@gmail.com]
Sent: Tuesday, July 17, 2012 8:24 PM
To: user@cassandra.apache.org
Subject: Re: Replication factor - Consistency Questions

Hi
I have not been using Cassandra for long, and I also have consistency problems. Here is some thinking: if you have write:ANY / read:ONE, you will have consistency problems, and if you want repair, check your schema and the parameter "Read repair chance": http://wiki.apache.org/cassandra/StorageConfiguration
If you want consistent results, my suggestion is write:ALL / read:ONE, since in Cassandra writes are much faster than reads. For the performance impact, you need to test your traffic; if your memory cannot cache all your data, or your network is not fast enough, then yes, writing to one more node will have an impact.
BRs

2012/7/18 Jay Parashar jparas...@itscape.com
Hello all,
There is a lot of material on replication factor and consistency level, but I am a little confused by what is happening on my setup (Cassandra 1.1.2). I would appreciate any answers.
My setup: a cluster of 2 nodes, evenly balanced. RF = 2; consistency level: write = ANY, read = ONE.
I know my consistency is weak, but since RF = 2 I thought the data would simply be duplicated on both nodes. Yet sometimes querying does not give me the correct (or gives partial) results, while at other times it gives the right results. Is read repair going on after the first query? But since RF = 2 and the data is duplicated, why the repair? Note: my query is done a while after the writes, so the data should be on both nodes.
Or is this not the case (flushing not happening etc)? I am thinking of making the Write as 1 and Read as QUORAM so R + W RF (1 + 2 2) to give strong consistency. Will that affect performance a lot (generally speaking)? Thanks in advance Regards Jay ** **
Re: Replication factor - Consistency Questions
Hi,

I have not been using Cassandra for long, and I have also run into consistency problems. Some thoughts: with Write:ANY / Read:ONE you will have consistency problems. If you want repair to help, check your schema and the "Read repair chance" parameter: http://wiki.apache.org/cassandra/StorageConfiguration

If you want consistent results, my suggestion is Write:ALL / Read:ONE, since in Cassandra writes are much faster than reads. For the performance impact, you need to test with your own traffic: if memory cannot cache all your data, or your network is not fast enough, then yes, writing to one more node will have a cost.

BRs

2012/7/18 Jay Parashar jparas...@itscape.com
Hello all, there is a lot of material on replication factor and consistency level, but I am a little confused by what is happening on my setup (Cassandra 1.1.2). I would appreciate any answers.

My setup: a cluster of 2 nodes, evenly balanced, RF = 2, consistency level Write = ANY and Read = ONE. I know my consistency is weak, but since RF = 2, I thought the data would simply be duplicated on both nodes. Yet querying sometimes gives me incorrect (or partial) results, and at other times the right results. Is read repair running after the first query? But if RF = 2 and the data is duplicated, why would repair be needed? Note: my query runs a while after the writes, so the data should already be on both nodes. Or is that not the case (flushing not happening, etc.)? I am thinking of making the write ONE and the read QUORUM, so R + W > RF (1 + 2 > 2), to give strong consistency. Will that affect performance a lot, generally speaking? Thanks in advance. Regards, Jay
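The R + W > RF rule Jay mentions can be checked numerically. Below is a minimal sketch under stated assumptions: function names are illustrative (not a driver API), and ANY is counted as 0 guaranteed replicas because a hinted write can succeed without any replica durably holding the data.

```python
# Sketch of the R + W > RF strong-consistency rule (illustrative, not a driver API).
# ANY counts as 0 guaranteed replicas: a hint alone can satisfy it, so it never
# contributes to read-your-writes guarantees.

def replicas_waited(level: str, rf: int) -> int:
    """Replicas guaranteed to hold the data when the operation succeeds."""
    return {"ANY": 0, "ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

def is_strong(write: str, read: str, rf: int) -> bool:
    """True when every read is guaranteed to overlap the latest successful write."""
    return replicas_waited(write, rf) + replicas_waited(read, rf) > rf

# The original setup: RF=2, Write=ANY, Read=ONE -> 0 + 1 <= 2, weak.
# Jay's proposal: Write=ONE, Read=QUORUM -> 1 + 2 > 2, strong; but note that
# with RF=2 a quorum is both replicas, so QUORUM reads fail if either node is down.
```

This also makes Jay's HA concern concrete: for RF=2, `replicas_waited("QUORUM", 2)` equals `replicas_waited("ALL", 2)`, so QUORUM buys nothing over ALL until RF is at least 3.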
Re: Cassandra takes 100% CPU for 2~3 minutes every half an hour and mutations are lost
Hi,

After changing the concurrent compactor parameter (concurrent_compactors: 1), we can limit Cassandra to 100% of one core at that moment. I also captured the stack of the busy thread; it stays on the same stack for 2~3 minutes. Any clue about this issue?

Thread 18114: (state = IN_JAVA)
 - java.util.AbstractList$Itr.hasNext() @bci=8, line=339 (Compiled frame; information may be imprecise)
 - org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(org.apache.cassandra.db.ColumnFamily, int) @bci=6, line=841 (Compiled frame)
 - org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(org.apache.cassandra.db.ColumnFamily, int) @bci=17, line=835 (Compiled frame)
 - org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(org.apache.cassandra.db.ColumnFamily, int) @bci=8, line=826 (Compiled frame)
 - org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(org.apache.cassandra.db.DecoratedKey, org.apache.cassandra.db.compaction.CompactionController, org.apache.cassandra.db.ColumnFamily) @bci=38, line=77 (Compiled frame)
 - org.apache.cassandra.db.compaction.PrecompactedRow.<init>(org.apache.cassandra.db.compaction.CompactionController, java.util.List) @bci=33, line=102 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(java.util.List) @bci=223, line=133 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced() @bci=44, line=102 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced() @bci=1, line=87 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88, line=116 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5, line=99 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
 - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector) @bci=542, line=141 (Compiled frame)
 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=117, line=134 (Interpreted frame)
 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=1, line=114 (Interpreted frame)
 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=30, line=303 (Interpreted frame)
 - java.util.concurrent.FutureTask.run() @bci=4, line=138 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable) @bci=59, line=886 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

BRs
//Jason

2012/7/11 Jason Tang ares.t...@gmail.com
Hi,

I ran into a high-CPU problem with Cassandra 1.0.3; it happens with both size-tiered and leveled compaction (6G heap, 64-bit Oracle Java). Under normal traffic Cassandra uses about 15% CPU, but every half hour it uses almost 100% of the total CPU (SUSE, 12 cores). Here is the top output at that moment.
#top -H -p 12451
top - 12:30:14 up 15 days, 12:49, 6 users, load average: 10.52, 8.92, 8.14
Tasks: 706 total, 21 running, 685 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.7%us, 14.0%sy, 48.9%ni, 6.5%id, 0.0%wa, 0.0%hi, 4.9%si, 0.0%st
Mem: 24150M total, 12218M used, 11932M free, 142M buffers
Swap: 0M total, 0M used, 0M free, 3714M cached

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
20291 casadm 24  4 8003m 5.4g 167m R   92 22.7 0:42.46 java
20276 casadm 24  4 8003m 5.4g 167m R   88 22.7 0:43.88 java
20181 casadm 24  4 8003m 5.4g 167m R   86 22.7 0:52.97 java
20213 casadm 24  4 8003m 5.4g 167m R   85 22.7 0:49.21 java
20188 casadm 24  4 8003m 5.4g 167m R   82 22.7 0:54.34 java
20268 casadm 24  4 8003m 5.4g 167m R   81 22.7 0:46.25 java
20269 casadm 24  4 8003m 5.4g 167m R   41 22.7 0:15.11 java
20316 casadm 24  4 8003m 5.4g 167m S   20 22.7 0:02.35 java
20191 casadm 24  4 8003m 5.4g 167m R   15 22.7 0:16.85 java
12500 casadm 20  0 8003m 5.4g 167m R    6 22.7 1:07.86 java
15245 casadm 20  0 8003m 5.4g 167m D    5 22.7 0:36.45 java

jstack cannot print the stack:
Thread 20291: (state = IN_JAVA)
Error occurred during stack walking: ...
Thread 20276: (state = IN_JAVA)
Error occurred during stack walking:

After it comes back, the stack shows:
Thread 20291: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci
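The hot frames earlier in this thread sit in tombstone purging during compaction (removeDeleted and friends). As a rough conceptual model only, not Cassandra's actual implementation, the per-row work looks like the sketch below: every column of every merged row is visited, live columns shadowed by a row-level delete are dropped, and tombstones older than the gc_grace bound are purged. Timestamps here are abstract units, and the rules are simplified.

```python
# Rough conceptual model of tombstone purging during compaction (simplified;
# not Cassandra's real code). Each column is (timestamp, value), where
# value None marks a column tombstone; timestamps are abstract units.

def purge_row(columns, row_deleted_at, gc_before):
    """Keep live columns and still-young tombstones; drop the rest."""
    kept = {}
    for name, (ts, value) in columns.items():
        if row_deleted_at is not None and ts <= row_deleted_at:
            continue  # shadowed by a row-level delete: drop
        if value is None and ts < gc_before:
            continue  # tombstone older than the gc_grace bound: safe to purge
        kept[name] = (ts, value)
    return kept

# A workload that writes and deletes millions of columns makes this loop the
# dominant cost of compaction, which is consistent with the removeDeleted*
# frames above and with concurrent_compactors: 1 capping the burn to one core.
```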
Cassandra takes 100% CPU for 2~3 minutes every half an hour and mutations are lost
Hi,

I ran into a high-CPU problem with Cassandra 1.0.3; it happens with both size-tiered and leveled compaction (6G heap, 64-bit Oracle Java). Under normal traffic Cassandra uses about 15% CPU, but every half hour it uses almost 100% of the total CPU (SUSE, 12 cores). Here is the top output at that moment.

#top -H -p 12451
top - 12:30:14 up 15 days, 12:49, 6 users, load average: 10.52, 8.92, 8.14
Tasks: 706 total, 21 running, 685 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.7%us, 14.0%sy, 48.9%ni, 6.5%id, 0.0%wa, 0.0%hi, 4.9%si, 0.0%st
Mem: 24150M total, 12218M used, 11932M free, 142M buffers
Swap: 0M total, 0M used, 0M free, 3714M cached

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
20291 casadm 24  4 8003m 5.4g 167m R   92 22.7 0:42.46 java
20276 casadm 24  4 8003m 5.4g 167m R   88 22.7 0:43.88 java
20181 casadm 24  4 8003m 5.4g 167m R   86 22.7 0:52.97 java
20213 casadm 24  4 8003m 5.4g 167m R   85 22.7 0:49.21 java
20188 casadm 24  4 8003m 5.4g 167m R   82 22.7 0:54.34 java
20268 casadm 24  4 8003m 5.4g 167m R   81 22.7 0:46.25 java
20269 casadm 24  4 8003m 5.4g 167m R   41 22.7 0:15.11 java
20316 casadm 24  4 8003m 5.4g 167m S   20 22.7 0:02.35 java
20191 casadm 24  4 8003m 5.4g 167m R   15 22.7 0:16.85 java
12500 casadm 20  0 8003m 5.4g 167m R    6 22.7 1:07.86 java
15245 casadm 20  0 8003m 5.4g 167m D    5 22.7 0:36.45 java

jstack cannot print the stack:
Thread 20291: (state = IN_JAVA)
Error occurred during stack walking: ...
Thread 20276: (state = IN_JAVA)
Error occurred during stack walking:

After it comes back, the stack shows:
Thread 20291: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=196 (Compiled frame)
 - java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferStack$SNode, boolean, long) @bci=174, line=424 (Compiled frame)
 - java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object, boolean, long) @bci=102, line=323 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.poll(long, java.util.concurrent.TimeUnit) @bci=11, line=874 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=62, line=945 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=18, line=907 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

And after this happens, the data is not correct: some large columns which were supposed to be deleted come back again.
Here is the suspect thread when CPU usage hits 100%:

Thread 20191: (state = IN_VM)
 - sun.misc.Unsafe.unpark(java.lang.Object) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.unpark(java.lang.Thread) @bci=8, line=122 (Compiled frame)
 - java.util.concurrent.SynchronousQueue$TransferStack$SNode.tryMatch(java.util.concurrent.SynchronousQueue$TransferStack$SNode) @bci=34, line=242 (Compiled frame)
 - java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object, boolean, long) @bci=268, line=344 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.offer(java.lang.Object) @bci=19, line=846 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.execute(java.lang.Runnable) @bci=43, line=653 (Compiled frame)
 - java.util.concurrent.AbstractExecutorService.submit(java.util.concurrent.Callable) @bci=20, line=92 (Compiled frame)
 - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getCompactedRow(java.util.List) @bci=86, line=190 (Compiled frame)
 - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced() @bci=31, line=164 (Compiled frame)
 - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced() @bci=1, line=144 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88, line=116 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5, line=99 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
 - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext() @bci=4, line=103 (Compiled frame)
 - org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext() @bci=1, line=90 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9, line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135 (Compiled frame)
 - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614 (Compiled frame) -
Re: Failed to solve Digest mismatch
For the create/update/deleteColumn/deleteRow test case, with QUORUM consistency level, 6 nodes and replication factor 3, a single client thread reproduces this in roughly 1 out of 100 rounds. With 20 client threads running the test, the ratio is higher.

Each test group is executed by a single thread, and the client timestamps are unique and sequential, guaranteed by Hector. The client only accesses its local Cassandra node, and the query uses only the row key, which is unique. Column names are not unique across rows (e.g. "status"). Each row has around 7 columns, all small, e.g. status:true, userName:Jason ...

BRs
//Ares

2012/7/1 Jonathan Ellis jbel...@gmail.com
Is this Cassandra 1.1.1? How often do you observe this? How many columns are in the row? Can you reproduce when querying by column name, or only when slicing the row?

On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang ares.t...@gmail.com wrote:
Hi,

First I delete one column, then I delete one row, then I try to read all columns from the same row; all operations come from the same client app. The consistency level is read/write QUORUM.

Checking the Cassandra log, the local node does not perform the delete operation itself but sends the mutation to the other nodes (192.168.0.6, 192.168.0.1). After the delete, I try to read all columns from the row, and the node reports a digest mismatch due to the QUORUM consistency configuration, but the result is not correct.

From the log I can see the delete mutation was already accepted by 192.168.0.6 and 192.168.0.1; 192.168.0.5 then reads the responses from 0.6 and 0.1 and merges the data, but in the end 0.5 returns the dirty data. The following logs show the change of column 737461747573: 192.168.0.5 tries to read from 0.1 and 0.6, the column should be deleted, but it still shows up with data.
log: 192.168.0.5

DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653) Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc', key=7878323239537570657254616e67307878, columnParent='QueryPath(columnFamilyName='queue', superColumnName='null', columnName='null')', columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674) reading data from /192.168.0.6
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694) reading digest from /192.168.0.1
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199 ResponseVerbHandler.java (line 44) Processing response on a callback from 6556@/192.168.0.6
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199 ResponseVerbHandler.java (line 44) Processing response on a callback from 6557@/192.168.0.1
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199 AbstractRowResolver.java (line 66) Preprocessed digest response
DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line 65) resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733) Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(100572974179274741747356988451225858264, 7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs d41d8cd98f00b204e9800998ecf8427e)
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201 ResponseVerbHandler.java (line 44) Processing response on a callback from 6558@/192.168.0.6
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201 ResponseVerbHandler.java (line 44) Processing response on a callback from 6559@/192.168.0.1
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 RowRepairResolver.java (line 63) resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 6669726554696d65:false:13@1340870382109004
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 1 of 2147483647: 67726f75705f6964:false:10@1340870382109014
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 2 of 2147483647: 696e517565756554696d65:false:13@1340870382109005
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 3 of 2147483647
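One detail worth noting in the mismatch line above: the second digest, d41d8cd98f00b204e9800998ecf8427e, is the well-known MD5 of empty input. Cassandra's digest is computed over serialized row content, so this is suggestive rather than conclusive, but it is consistent with one replica having computed its digest over an empty (fully deleted) result while the other still returned the pre-delete columns. A quick check:

```python
import hashlib

# The second digest in the DigestMismatchException is the MD5 of zero bytes,
# i.e. what hashing an empty (fully deleted) result produces.
empty_md5 = hashlib.md5(b"").hexdigest()
print(empty_md5)  # d41d8cd98f00b204e9800998ecf8427e
```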
Failed to solve Digest mismatch
Hi,

First I delete one column, then I delete one row, then I try to read all columns from the same row; all operations come from the same client app. The consistency level is read/write QUORUM.

Checking the Cassandra log, the local node does not perform the delete operation itself but sends the mutation to the other nodes (192.168.0.6, 192.168.0.1). After the delete, I try to read all columns from the row, and the node reports a digest mismatch due to the QUORUM consistency configuration, but the result is not correct.

From the log I can see the delete mutation was already accepted by 192.168.0.6 and 192.168.0.1; 192.168.0.5 then reads the responses from 0.6 and 0.1 and merges the data, but in the end 0.5 returns the dirty data. The following logs show the change of column 737461747573: 192.168.0.5 tries to read from 0.1 and 0.6, the column should be deleted, but it still shows up with data.

log: 192.168.0.5

DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653) Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc', key=7878323239537570657254616e67307878, columnParent='QueryPath(columnFamilyName='queue', superColumnName='null', columnName='null')', columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79) Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674) reading data from /192.168.0.6
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694) reading digest from /192.168.0.1
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199 ResponseVerbHandler.java (line 44) Processing response on a callback from 6556@/192.168.0.6
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199 ResponseVerbHandler.java (line 44) Processing response on a callback from 6557@/192.168.0.1
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199 AbstractRowResolver.java (line 66) Preprocessed digest response
DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line 65) resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733) Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(100572974179274741747356988451225858264, 7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs d41d8cd98f00b204e9800998ecf8427e)
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201 ResponseVerbHandler.java (line 44) Processing response on a callback from 6558@/192.168.0.6
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201 ResponseVerbHandler.java (line 44) Processing response on a callback from 6559@/192.168.0.1
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201 AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 RowRepairResolver.java (line 63) resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 6669726554696d65:false:13@1340870382109004
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 1 of 2147483647: 67726f75705f6964:false:10@1340870382109014
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 2 of 2147483647: 696e517565756554696d65:false:13@1340870382109005
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123) collecting 3 of 2147483647: 6c6f67526f6f744964:false:7@1340870382109015
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 4 of 2147483647: 6d6f54797065:false:6@1340870382109009
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 5 of 2147483647: 706172746974696f6e:false:2@1340870382109001
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 6 of 2147483647: 7265636569766554696d65:false:13@1340870382109003
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 7 of 2147483647: 72657175657374:false:300@1340870382109013
DEBUG [RequestResponseStage:5] 2012-06-28 15:59:42,202 ResponseVerbHandler.java (line 44) Processing response on a callback from 6552@/192.168.0.1
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 8 of 2147483647: 7265747279:false:1@1340870382109006
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123) collecting 9 of 2147483647: 7365727669636550726f7669646572:false:4@1340870382109007
DEBUG
Re: Consistency Problem with Quorum consistencyLevel configuration
Hi,

After enabling the Cassandra debug log I got the following, which shows the delete mutation was sent to the other two nodes rather than applied on the local node. Then the read command arrived at the local node, and the local node found the mismatch. But I don't know why the local node returned its local dirty data; it is supposed to repair the data and return the correct result.

192.168.0.6:
DEBUG [MutationStage:61] 2012-06-26 23:09:00,036 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc', key='33323130537570657254616e6730', modifications=[ColumnFamily(queue -deleted at 1340723340044000- [])]) applied. Sending response to 3555@/192.168.0.5

192.168.0.4:
DEBUG [MutationStage:40] 2012-06-26 23:09:00,041 RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc', key='33323130537570657254616e6730', modifications=[ColumnFamily(queue -deleted at 1340723340044000- [])]) applied. Sending response to 3556@/192.168.0.5

192.168.0.5 (local node):
DEBUG [pool-2-thread-20] 2012-06-26 23:09:00,105 StorageProxy.java (line 705) Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(7649972972837658739074639933581556, 33323130537570657254616e6730) (b20ac6ec0d29393d70e200027c094d13 vs d41d8cd98f00b204e9800998ecf8427e)

2012/6/25 Jason Tang ares.t...@gmail.com
Hi,

I hit a consistency problem with QUORUM for both reads and writes. I use MultigetSubSliceQuery to query rows from a super column, limit 100, then read them, then delete them, and then start another round. But rows which should have been deleted by the last round still show up in the next round's query. Also, in a normal column family, I updated the value of one column from status='FALSE' to status='TRUE', and on the next query the status was still 'FALSE'.

More detail:
- It does not happen every time (about 1 in 10,000).
- The time between two rounds of queries is around 500 ms (but we also found a query issued 2 seconds after the first one that still hit this consistency problem).
- We use NTP as our cluster time synchronization solution.
- We have 6 nodes, and the replication factor is 3.

Some people say Cassandra is expected to have such problems, because inside Cassandra the read may happen before the write lands. But for two seconds?! If so, it would be meaningless to have QUORUM or any other consistency level configuration. So first of all: is this the correct behavior of Cassandra, and if not, what data do we need to collect for further investigation?

BRs
Ares
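A QUORUM read reconciles the replica responses column by column, with the newest timestamp winning. A minimal last-write-wins sketch (illustrative names, not Cassandra's actual resolver) shows why the client timestamps matter here:

```python
def reconcile(*replica_rows):
    """Merge replica responses for one row: per column, the highest timestamp wins.
    Each response: dict name -> (timestamp, value); value None is a tombstone."""
    merged = {}
    for row in replica_rows:
        for name, (ts, value) in row.items():
            if name not in merged or ts > merged[name][0]:
                merged[name] = (ts, value)
    # The client sees only live columns, never tombstones.
    return {n: v for n, (ts, v) in merged.items() if v is not None}

# If the delete's timestamp is not strictly newer than the insert's (client
# clock skew or duplicate timestamps), the stale value legitimately wins the
# merge even at QUORUM. If timestamps are strictly ordered, as Hector-generated
# sequential timestamps should be, a stale result points at a dropped mutation
# or a resolver bug instead.
```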
Consistency Problem with Quorum consistencyLevel configuration
Hi,

I hit a consistency problem with QUORUM for both reads and writes. I use MultigetSubSliceQuery to query rows from a super column, limit 100, then read them, then delete them, and then start another round. But rows which should have been deleted by the last round still show up in the next round's query. Also, in a normal column family, I updated the value of one column from status='FALSE' to status='TRUE', and on the next query the status was still 'FALSE'.

More detail:
- It does not happen every time (about 1 in 10,000).
- The time between two rounds of queries is around 500 ms (but we also found a query issued 2 seconds after the first one that still hit this consistency problem).
- We use NTP as our cluster time synchronization solution.
- We have 6 nodes, and the replication factor is 3.

Some people say Cassandra is expected to have such problems, because inside Cassandra the read may happen before the write lands. But for two seconds?! If so, it would be meaningless to have QUORUM or any other consistency level configuration. So first of all: is this the correct behavior of Cassandra, and if not, what data do we need to collect for further investigation?

BRs
Ares
Re: GCInspector works every 10 seconds!
Hi,

After I enabled the key cache and row cache, the problem went away. I guess it is because we have lots of data in the SSTables, and it takes more time, memory and CPU to search the data.

BRs
//Tang Weiqiang

2012/6/18 aaron morton aa...@thelastpickle.com
"It is also strange that although no data in Cassandra can fulfill the query conditions, it takes more time if we have more data in Cassandra."

These log messages:

DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408920e049c22:true:4@1339865451865018
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408a0eeab052a:true:4@1339865451866000

say that the slice query read columns from disk that were deleted. Have you tried your test with a clean (no files on disk) database?

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/06/2012, at 12:36 AM, Jason Tang wrote:
Hi,

After I changed the log level to DEBUG, I found some logs. Although we send no traffic to Cassandra, we have a scheduled task performing the slice query. We use a timestamp as the index, and we run the query every second to check whether there are tasks to do. After 24 hours we have 40G of data in Cassandra, configured with a max JVM heap of 6G, memtable 1G, and disk_access_mode: mmap_index_only.

It is also strange that although no data in Cassandra can fulfill the query conditions, it takes more time if we have more data in Cassandra. We have 20 million records in total, indexed by timestamp, and we query via MultigetSubSliceQuery with a range that matches no data in Cassandra, so it should return fast; but with 20 million records it takes 2 seconds to get the query result.

Is the GC caused by the scheduled query operation, and why does it take so much memory? Can we improve it?
System.log:

INFO [ScheduledTasks:1] 2012-06-17 20:17:13,574 GCInspector.java (line 123) GC for ParNew: 559 ms for 1 collections, 3258240912 used; max is 6274678784
DEBUG [ReadStage:99] 2012-06-17 20:17:25,563 SliceQueryFilter.java (line 123) collecting 0 of 5000: 0138ad1035880137f3372f3e0e28e3b6:false:36@1339815309124015
DEBUG [ReadStage:99] 2012-06-17 20:17:25,565 ReadVerbHandler.java (line 60) Read key 3331; sending response to 158060445@/192.168.0.3
DEBUG [ReadStage:96] 2012-06-17 20:17:25,845 SliceQueryFilter.java (line 123) collecting 0 of 5000: 0138ad1035880137f33a80cf6cb5d383:false:36@1339815526613007
DEBUG [ReadStage:96] 2012-06-17 20:17:25,847 ReadVerbHandler.java (line 60) Read key 3233; sending response to 158060447@/192.168.0.3
DEBUG [ReadStage:105] 2012-06-17 20:17:25,952 SliceQueryFilter.java (line 123) collecting 0 of 5000: 0138ad1035880137f330cd70c86690cd:false:36@1339814890872015
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line 75) digest is d41d8cd98f00b204e9800998ecf8427e
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line 60) Read key 3139; sending response to 158060448@/192.168.0.3
DEBUG [ReadStage:89] 2012-06-17 20:17:25,959 CollationController.java (line 191) collectAllData
DEBUG [ReadStage:108] 2012-06-17 20:17:25,959 CollationController.java (line 191) collectAllData
DEBUG [ReadStage:107] 2012-06-17 20:17:25,959 CollationController.java (line 191) collectAllData
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408920e049c22:true:4@1339865451865018
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408a0eeab052a:true:4@1339865451866000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408b1319577c9:true:4@1339865451867003
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408c081e0b8a3:true:4@1339865451867004
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340deefb8a0627:true:4@1339865451920001
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340df9c21e9979:true:4@1339865451923002
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340e095ead1498:true:4@1339865451928000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340e1af16cf151:true:4@1339865451935000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line 123) collecting
GCInspector works every 10 seconds!
Hi,

After running load testing for 24 hours (insert, update and delete), there is now no new traffic to Cassandra, but Cassandra still shows high load (CPU usage), and from system.log it is always performing GC. I don't know why it behaves like this; memory does not seem to be low. Here are some configuration details and logs; where can I find a clue about why Cassandra works this way?

cassandra.yaml:
disk_access_mode: mmap_index_only

# /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 tpstats
Pool Name               Active  Pending  Completed  Blocked  All time blocked
ReadStage                    0        0   45387558        0                 0
RequestResponseStage         0        0   96568347        0                 0
MutationStage                0        0   60215102        0                 0
ReadRepairStage              0        0          0        0                 0
ReplicateOnWriteStage        0        0          0        0                 0
GossipStage                  0        0     399012        0                 0
AntiEntropyStage             0        0          0        0                 0
MigrationStage               0        0         30        0                 0
MemtablePostFlusher          0        0        279        0                 0
StreamStage                  0        0          0        0                 0
FlushWriter                  0        0       1846        0              1052
MiscStage                    0        0          0        0                 0
InternalResponseStage        0        0          0        0                 0
HintedHandoff                0        0          5        0                 0

Message type      Dropped
RANGE_SLICE             0
READ_REPAIR             0
BINARY                  0
READ                    1
MUTATION             1390
REQUEST_RESPONSE        0

# /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 info
Token            : 56713727820156410577229101238628035242
Gossip active    : true
Load             : 37.57 GB
Generation No    : 1339813956
Uptime (seconds) : 120556
Heap Memory (MB) : 3261.14 / 5984.00
Data Center      : datacenter1
Rack             : rack1
Exceptions       : 0

INFO [ScheduledTasks:1] 2012-06-17 19:47:36,633 GCInspector.java (line 123) GC for ParNew: 222 ms for 1 collections, 2046077640 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:48:41,714 GCInspector.java (line 123) GC for ParNew: 262 ms for 1 collections, 2228128408 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:48:49,717 GCInspector.java (line 123) GC for ParNew: 237 ms for 1 collections, 2390412728 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:48:57,719 GCInspector.java (line 123) GC for ParNew: 223 ms for 1 collections, 2508702896 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:50:01,988 GCInspector.java (line 123) GC for ParNew: 232 ms for 1 collections, 2864574832 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:50:10,075 GCInspector.java (line 123) GC for ParNew: 208 ms for 1 collections, 2964629856 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:50:21,078 GCInspector.java (line 123) GC for ParNew: 258 ms for 1 collections, 3149127368 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:51:26,095 GCInspector.java (line 123) GC for ParNew: 213 ms for 1 collections, 3421495400 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:51:34,097 GCInspector.java (line 123) GC for ParNew: 218 ms for 1 collections, 3543978312 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:52:37,229 GCInspector.java (line 123) GC for ParNew: 221 ms for 1 collections, 375229 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:52:37,230 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 206 ms for 1 collections, 3752313400 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:52:46,507 GCInspector.java (line 123) GC for ParNew: 243 ms for 1 collections, 3663162192 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:52:54,510 GCInspector.java (line 123) GC for ParNew: 283 ms for 1 collections, 1582282248 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:54:01,704 GCInspector.java (line 123) GC for ParNew: 235 ms for 1 collections, 1935534800 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:55:13,747 GCInspector.java (line 123) GC for ParNew: 233 ms for 1 collections, 2356975504 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:55:21,749 GCInspector.java (line 123) GC for ParNew: 264 ms for 1 collections, 2530976328 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-06-17 19:55:29,794 GCInspector.java (line 123) GC for ParNew: 224 ms for 1 collections, 2592311336 used; max is 6274678784

BRs
//Ares
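The GCInspector lines above can be reduced to numbers to see what the collector is actually doing. A small parsing sketch (the regex targets the exact log format shown in this thread; `gc_stats` is an illustrative helper, not a Cassandra tool):

```python
import re

LINE = ("INFO [ScheduledTasks:1] 2012-06-17 19:47:36,633 GCInspector.java "
        "(line 123) GC for ParNew: 222 ms for 1 collections, "
        "2046077640 used; max is 6274678784")

def gc_stats(line):
    """Extract collector name, pause (ms) and heap-used fraction from one
    GCInspector log line of the format shown in this thread."""
    m = re.search(r"GC for (\w+): (\d+) ms for \d+ collections, "
                  r"(\d+) used; max is (\d+)", line)
    collector, ms, used, mx = m.groups()
    return collector, int(ms), int(used) / int(mx)

collector, pause_ms, heap_frac = gc_stats(LINE)
# 200~280 ms ParNew pauses every few seconds at roughly 30-60% heap usage
# suggest heavy young-generation allocation churn (e.g. the per-second slice
# queries reading tombstoned columns), not an exhausted heap.
```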
Re: GCInspector works every 10 seconds!
Hi

After changing the log level to DEBUG, I found some more log output. Although we send no traffic to Cassandra, we have a scheduled task that performs a slice query: we use a timestamp as the index, and we query every second to check whether there are tasks to execute. After 24 hours we have 40 GB of data in Cassandra, and Cassandra is configured with a max JVM heap of 6 GB, a 1 GB memtable, and disk_access_mode: mmap_index_only.

It is also strange that, although no data in Cassandra matches the query conditions, the query takes longer the more data we have. We have 20 million records in total, indexed by timestamp, and we query with MultigetSubSliceQuery using a range that matches no data at all, so the query should return quickly; but with 20 million records it takes 2 seconds to get the (empty) result.

Is the GC caused by the scheduled query operation, and why does it take so much memory? Could we improve it?

system.log:
INFO [ScheduledTasks:1] 2012-06-17 20:17:13,574 GCInspector.java (line 123) GC for ParNew: 559 ms for 1 collections, 3258240912 used; max is 6274678784
DEBUG [ReadStage:99] 2012-06-17 20:17:25,563 SliceQueryFilter.java (line 123) collecting 0 of 5000: 0138ad1035880137f3372f3e0e28e3b6:false:36@1339815309124015
DEBUG [ReadStage:99] 2012-06-17 20:17:25,565 ReadVerbHandler.java (line 60) Read key 3331; sending response to 158060445@/192.168.0.3
DEBUG [ReadStage:96] 2012-06-17 20:17:25,845 SliceQueryFilter.java (line 123) collecting 0 of 5000: 0138ad1035880137f33a80cf6cb5d383:false:36@1339815526613007
DEBUG [ReadStage:96] 2012-06-17 20:17:25,847 ReadVerbHandler.java (line 60) Read key 3233; sending response to 158060447@/192.168.0.3
DEBUG [ReadStage:105] 2012-06-17 20:17:25,952 SliceQueryFilter.java (line 123) collecting 0 of 5000: 0138ad1035880137f330cd70c86690cd:false:36@1339814890872015
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line 75) digest is d41d8cd98f00b204e9800998ecf8427e
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line 60) Read key 3139; sending response to 158060448@/192.168.0.3
DEBUG [ReadStage:89] 2012-06-17 20:17:25,959 CollationController.java (line 191) collectAllData
DEBUG [ReadStage:108] 2012-06-17 20:17:25,959 CollationController.java (line 191) collectAllData
DEBUG [ReadStage:107] 2012-06-17 20:17:25,959 CollationController.java (line 191) collectAllData
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408920e049c22:true:4@1339865451865018
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408a0eeab052a:true:4@1339865451866000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408b1319577c9:true:4@1339865451867003
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f63408c081e0b8a3:true:4@1339865451867004
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340deefb8a0627:true:4@1339865451920001
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340df9c21e9979:true:4@1339865451923002
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340e095ead1498:true:4@1339865451928000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340e1af16cf151:true:4@1339865451935000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line 123) collecting 0 of 5000: 7fff0137f6340e396cfdc9fa:true:4@133986545195

BRs
//Ares
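A common way to avoid repeatedly scanning old (possibly deleted) rows with this kind of per-second polling is to bucket tasks by time window, so each round queries only the current bucket's partition and a finished bucket simply stops being read. A minimal sketch of such a key scheme (the `tasks-` prefix and one-hour bucket size are hypothetical illustrations, not from the thread):

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 3600  # hypothetical: one partition per hour of scheduled tasks

def bucket_key(ts: datetime) -> str:
    """Partition key for the time bucket containing ts (epoch-aligned, UTC)."""
    epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
    return "tasks-%d" % (epoch // BUCKET_SECONDS)

# Two timestamps in the same hour share a partition; the next hour gets a
# new one, so executed buckets age out instead of being deleted row by row.
a = bucket_key(datetime(2012, 6, 17, 20, 17, 13))
b = bucket_key(datetime(2012, 6, 17, 20, 59, 59))
c = bucket_key(datetime(2012, 6, 17, 21, 0, 0))
print(a == b, a == c)
```

Combined with a TTL on the bucket's rows, this keeps each poll confined to a small, current slice of the data.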
Re: Much more native memory used by Cassandra then the configured JVM heap size
We assumed the cached memory would be released by the OS, but /proc/meminfo shows the cached memory in Active status, so I am not sure the OS will actually release it.

As for low memory: we found "Unable to reduce heap usage since there are no dirty column families" in system.log, and afterwards the Cassandra node was marked as down. Since we configure a 6 GB JVM heap and a 1 GB memtable, I don't understand why we get OOM errors. So we wonder whether Cassandra went down because of:
1. Low OS memory
2. Our configuration: memtable_flush_writers=32, memtable_flush_queue_size=12
3. Delete operations (the data in our traffic is dynamic: each record may be deleted within an hour, with new ones inserted) — https://issues.apache.org/jira/browse/CASSANDRA-3741

We want to find out why Cassandra went down after the 24-hour load test (the root cause of the OOM).

2012/6/12 aaron morton <aa...@thelastpickle.com>:

see http://wiki.apache.org/cassandra/FAQ#mmap

"which cause the OS low memory."
If the memory is used for mmapped access the os can get it back later. Is the low free memory causing a problem?

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
Re: Much more native memory used by Cassandra then the configured JVM heap size
See my post: I limit the JVM heap to 6 GB, but Cassandra actually uses more memory, which is not counted in the JVM heap. I use top to monitor the total memory used by Cassandra.

=
-Xms6G -Xmx6G -Xmn1600M

2012/6/12 Jeffrey Kesselman <jef...@gmail.com>:

Btw. I suggest you spin up JConsole, as it will give you much more detail on what your VM is actually doing.

On Mon, Jun 11, 2012 at 9:14 PM, Jason Tang <ares.t...@gmail.com> wrote:

Hi

We have a problem with Cassandra memory usage. We configure the JVM heap as 6 GB, but after running Cassandra for several hours (insert, update, delete), the total memory used by Cassandra goes up to 15 GB, which leaves the OS low on memory. So I wonder: is it normal for Cassandra to use this much memory, and how can we limit the native memory Cassandra uses?

===
Cassandra 1.0.3, 64-bit JDK. Memory occupied by Cassandra: 15 GB

PID  USER    PR  NI  VIRT   RES  SHR   S  %CPU  %MEM   TIME+     COMMAND
9567 casadm  20   0  28.3g  15g  9.1g  S   269  65.1   385:57.65 java

=
-Xms6G -Xmx6G -Xmn1600M

# ps -ef | grep 9567
casadm 9567 1 55 Jun11 ? 05:59:44 /opt/jdk1.6.0_29/bin/java -ea -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=6080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Daccess.properties=/opt/dve/cassandra/conf/access.properties -Dpasswd.properties=/opt/dve/cassandra/conf/passwd.properties -Dpasswd.mode=MD5 -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -cp /opt/dve/cassandra/bin/../conf:/opt/dve/cassandra/bin/../build/classes/main:/opt/dve/cassandra/bin/../build/classes/thrift:/opt/dve/cassandra/bin/../lib/Cassandra-Extensions-1.0.0.jar:/opt/dve/cassandra/bin/../lib/antlr-3.2.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-clientutil-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-thrift-1.0.3.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/dve/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/dve/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/dve/cassandra/bin/../lib/commons-lang-2.4.jar:/opt/dve/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/opt/dve/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/dve/cassandra/bin/../lib/guava-r08.jar:/opt/dve/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/opt/dve/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar:/opt/dve/cassandra/bin/../lib/jline-0.9.94.jar:/opt/dve/cassandra/bin/../lib/json-simple-1.1.jar:/opt/dve/cassandra/bin/../lib/libthrift-0.6.jar:/opt/dve/cassandra/bin/../lib/log4j-1.2.16.jar:/opt/dve/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/opt/dve/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/opt/dve/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/dve/cassandra/bin/../lib/snakeyaml-1.6.jar:/opt/dve/cassandra/bin/../lib/snappy-java-1.0.4.1.jar org.apache.cassandra.thrift.CassandraDaemon

==
# nodetool -h 127.0.0.1 -p 6080 info
Token            : 85070591730234615865843651857942052864
Gossip active    : true
Load             : 20.59 GB
Generation No    : 1339423322
Uptime (seconds) : 39626
Heap Memory (MB) : 3418.42 / 5984.00
Data Center      : datacenter1
Rack             : rack1
Exceptions       : 0

=
All row cache and key cache are disabled by default:
Key cache: disabled
Row cache: disabled

==
# pmap 9567
9567: java
START         SIZE      RSS       PSS       DIRTY     SWAP  PERM  MAPPING
4000          36K       36K       36K       0K        0K    r-xp  /opt/jdk1.6.0_29/bin/java
40108000      8K        8K        8K        8K        0K    rwxp  /opt/jdk1.6.0_29/bin/java
4010a000      18040K    17988K    17988K    17988K    0K    rwxp  [heap]
00067ae0      6326700K  6258664K  6258664K  6258664K  0K    rwxp  [anon]
0007fd06b000  48724K    0K        0K        0K        0K    rwxp  [anon]
7fbed153      1331104K  0K        0K        0K        0K    r-xs  /var/cassandra/data/drc/queue-hb-219-Data.db
7fbf22918000  2097152K  0K        0K        0K        0K    r-xs  /var/cassandra/data/drc/queue-hb-219-Data.db
7fbfa2918000  2097148K  1124464K  1124462K  0K        0K    r-xs  /var/cassandra/data/drc/queue-hb-219-Data.db
7fc022917000  2097156K  2096496K  2096492K  0K        0K    r
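In pmap output like the above, the JVM heap shows up as large [heap]/[anon] mappings, while the mmapped SSTables appear as shared, file-backed mappings (r-xs) whose resident pages the OS can reclaim under pressure. A rough split of resident memory along those lines, using simplified rows with figures taken from this message (a sketch, not a real pmap parser):

```python
def split_rss(rows):
    """rows: (rss_kb, perm, mapping) tuples in simplified pmap form.
    Returns (anonymous_rss_kb, file_backed_rss_kb)."""
    anon = file_backed = 0
    for rss, perm, mapping in rows:
        if mapping in ("[heap]", "[anon]"):
            anon += rss           # JVM heap and other anonymous memory
        elif perm.endswith("s"):  # shared file-backed mapping (e.g. r-xs):
            file_backed += rss    # mmapped SSTable pages the OS can drop
    return anon, file_backed

# Resident figures (KB) from the pmap rows quoted in the thread:
rows = [
    (17988, "rwxp", "[heap]"),
    (6258664, "rwxp", "[anon]"),   # the ~6 GB JVM heap
    (0, "r-xs", "/var/cassandra/data/drc/queue-hb-219-Data.db"),
    (0, "r-xs", "/var/cassandra/data/drc/queue-hb-219-Data.db"),
    (1124464, "r-xs", "/var/cassandra/data/drc/queue-hb-219-Data.db"),
]
anon, mapped = split_rss(rows)
print(anon, mapped)  # anonymous vs file-backed resident KB
```

The point of the split: only the anonymous portion is "real" process memory in the sense that matters for OOM; the file-backed portion inflates RES in top but is reclaimable, which is why top shows far more than the configured heap.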
Re: Much more native memory used by Cassandra then the configured JVM heap size
Hi

I found some information about this issue. It seems we can choose a different disk access strategy to reduce mmap usage and thereby use less memory. But I didn't find documentation describing these parameters for Cassandra 1.x — is it a good idea to use this parameter to reduce shared memory usage, and what is the impact? (BTW, our data model is dynamic: although the throughput is high, the life cycle of the data is short, one hour or less.)

# Choices are auto, standard, mmap, and mmap_index_only.
disk_access_mode: auto

http://comments.gmane.org/gmane.comp.db.cassandra.user/7390
TimedOutException caused by Stop the world activity
Hi

My system is a 4-node, 64-bit Cassandra cluster with a 6 GB heap per node and the default configuration (which means 1/3 of the heap for memtables), replication factor 3, write consistency ALL, read consistency ONE. When I run stress load testing, I get TimedOutExceptions, some operations fail, and all traffic hangs for a while. When I ran a standalone 32-bit Cassandra with 1 GB of memory, I did not see such frequent stop-the-world behavior.

So I wonder what kind of operation can hang the Cassandra system, and how to collect information for tuning. From the system log and the documentation, I can see three candidate operations:
1) Flushing a memtable when it reaches its maximum size
2) Compacting SSTables (why?)
3) Java GC

system.log:
INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688) Enqueuing flush of Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239) Writing Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275) Completed flushing /var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-Data.db (163 bytes)
...
INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java (line 112) Compacting [SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-32-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-37-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-53-Data.db')]
...
WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line 146) Heap is 0.7993011015621736 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line 123) GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line 123) GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is 6274678784
INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max is 6274678784

Timeout exception:
Caused by: org.apache.cassandra.thrift.TimedOutException: null
        at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495) ~[na:na]
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035) ~[na:na]
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009) ~[na:na]
        at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95) ~[na:na]
        ... 64 common frames omitted

BRs
//Tang Weiqiang
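The GCInspector warning above fires because heap occupancy crossed the flush_largest_memtables_at threshold (0.75 by default in this era of cassandra.yaml). The check is simple arithmetic on the used/max figures the log already prints; a sketch:

```python
FLUSH_LARGEST_MEMTABLES_AT = 0.75  # default emergency-flush threshold

def flush_pressure(samples, threshold=FLUSH_LARGEST_MEMTABLES_AT):
    """samples: (used_bytes, max_bytes) pairs from GCInspector lines.
    Returns the heap fractions that would trigger an emergency memtable flush."""
    return [u / m for u, m in samples if u / m > threshold]

# (used, max) pairs from the GC log lines in this message, sampled after
# collections completed:
samples = [
    (3594946600, 6274678784),  # after the 728 ms CMS: ~0.57 full
    (4171503448, 6274678784),  # around the 1668 ms ParNew: ~0.66 full
]
print(flush_pressure(samples))
```

These post-collection samples sit below 0.75, consistent with the warning firing earlier (at 0.799 full) rather than at these points; the long ParNew/CMS pauses themselves are what stall requests long enough to produce the TimedOutExceptions under write-ALL.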
Re: Cassandra search performance
I tried searching on one column that stores a time as a Long. 1,000,000 records are distributed evenly over 24 hours, and I only want to search a certain time range, e.g. from 01:30 to 01:50, or from 08:00 to 12:00, but something strange happened:

Search 00:00 to 23:59 limit 100 — it took less than 1 second and scanned 100 records.
Search 00:00 to 00:20 limit 100 — it took more than one minute and scanned around 2,400 records.

So it seems Cassandra scans records one by one to match the condition, and the data is not ordered in sequence. One more thing: to have an equality condition, I added a redundant column whose value is the same for all records. The search condition looks like: get record where equal='equal' and time > 00:00 and time < 00:20.

Is this the expected behavior of the secondary index, or am I using it incorrectly? In an earlier test I had a string column where most values were 'true', and I added 100 'false' rows among 1,000,000 'true' rows; there the query scanned only 100 records. So how can I examine what happens inside Cassandra, and where can I find the details of how secondary indexes work?

On Tuesday, May 8, 2012, Maxim Potekhin wrote:

Thanks for the comments, much appreciated.

Maxim

On 5/7/2012 3:22 AM, David Jeske wrote:

On Sun, Apr 29, 2012 at 4:32 PM, Maxim Potekhin <potek...@bnl.gov> wrote:

"Looking at your example, as I think you understand, you forgo indexes by combining two conditions in one query, thinking along the lines of what is often done in an RDBMS. A scan is expected in this case, and there is no magic to avoid it."

This sounds like a misunderstanding of how RDBMSs work. If you combine two conditions in a single SQL query, the SQL execution optimizer looks at the cardinality of any indices. If it can successfully predict that one of the conditions significantly reduces the set of rows that would be considered (such as a status match having 200 hits vs. 1M rows in the table), then it selects that index for the first iteration, and each index hit causes a record lookup which is then tested against the other conditions. (This is one of several query-execution strategies RDBMS systems use.)

I'm no Cassandra expert, so I don't know what it does with regard to index selection, but from the page written on secondary indices, it seems that if you just query on status and do the other filtering yourself, it will probably do what you want: http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

"However, if this query is important, you can easily index on two conditions, using a composite type (look it up), or string concatenation for a quick and easy solution."

This is not necessarily a good idea. Creating a composite index explodes the index size unnecessarily. If one condition can reduce a query to 200 records, there is no need for a composite index including another condition.
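David's index-then-filter description can be modeled in a few lines: drive the query from one condition's matches, then test each candidate against the remaining conditions. The counts below mirror the thread's 200-of-1,000,000 example, scaled down tenfold (purely illustrative, not Cassandra's actual code path):

```python
# Scaled-down model: 100,000 rows, 20 with status='false', and a
# 'partition' column that every row shares.
N = 100_000
rows = [{"status": "false" if i < 20 else "true", "partition": "p1"}
        for i in range(N)]

def query(rows, driving_column, driving_value, other_filters):
    """Index-then-filter model: only rows matched by the driving index are
    examined; each candidate is then tested against the other conditions.
    Returns (matching_rows, rows_examined)."""
    candidates = [r for r in rows if r[driving_column] == driving_value]
    results = [r for r in candidates
               if all(r[c] == v for c, v in other_filters.items())]
    return results, len(candidates)

# Driving from the selective 'status' condition examines 20 rows;
# driving from 'partition' (which every row matches) examines all 100,000.
_, scanned_selective = query(rows, "status", "false", {"partition": "p1"})
_, scanned_unselective = query(rows, "partition", "p1", {"status": "false"})
print(scanned_selective, scanned_unselective)
```

Both orders return the same 20 rows; only the number of rows examined differs, which is exactly the cost difference the thread observes.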
Cassandra search performance
Hi

We have the following CF and use a secondary index to search on a simple status column. Among 1,000,000 rows, 200 records have the status we want. But the search performance is very poor: checking with ./bin/nodetool -h localhost -p 8199 cfstats, Cassandra read 1,000,000 records, and with a read latency of 0.2 ms that totals about 200 seconds. It uses a lot of CPU, and a thread dump shows all Cassandra threads reading from sockets.

So I wonder how to make the query really use the index to find the 200 records instead of scanning all rows. (Super column?)

ColumnFamily: queue
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds / keys to save : 0.0/0/all
  Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  Key cache size / save period in seconds: 0.0/0
  GC grace seconds: 0
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.0
  Replicate on write: false
  Bloom Filter FP chance: default
  Built indexes: [queue.idxStatus]
  Column Metadata:
    Column Name: status (737461747573)
    Validation Class: org.apache.cassandra.db.marshal.AsciiType
    Index Name: idxStatus
    Index Type: KEYS

BRs
//Jason
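The 200-second figure follows directly from the cfstats numbers: reading all 1,000,000 rows at 0.2 ms apiece. A back-of-the-envelope sketch of that arithmetic, and of what a query touching only the 200 matches would cost:

```python
def full_scan_seconds(rows_read, read_latency_us):
    """Back-of-the-envelope: total time if every candidate row is read
    sequentially at the given per-read latency (microseconds)."""
    return rows_read * read_latency_us / 1_000_000

# Numbers from cfstats in this message: 1M rows at 0.2 ms (200 us) each.
print(full_scan_seconds(1_000_000, 200))  # the observed ~200 s
print(full_scan_seconds(200, 200))        # if only the 200 matches were read
```

The three-orders-of-magnitude gap is the whole argument for making the index (rather than a scan) select the candidate rows.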
Re: Cassandra search performance
I also found that if I use only the status condition, it scans just 200 records. But if I combine it with another condition, partition, it scans all records, because the partition condition matches every record. Yet when I combine status with another condition such as userName — even though all 1,000,000 records have the same userName — it only scans 200 records.

So the behavior is affected by the scan execution plan. If we have several search conditions, how does this work? Is there something like an execution plan in Cassandra?
Re: Cassandra search performance
1.0.8

On April 25, 2012 at 10:38 PM, Philip Shon <philip.s...@gmail.com> wrote:

What version of Cassandra are you using? I found a big performance hit when querying on the secondary index. I came across this bug in versions prior to 1.1: https://issues.apache.org/jira/browse/CASSANDRA-3545

Hope that helps.
Consistence for node shutdown and startup
Hi

Here is the case: we have only two nodes which share the data (write consistency ONE, read consistency ONE).

    Node One          Node Two
    --------          --------
    stopped           continues working and updates the data
    stopped           stopped
    starts working    stopped
    updates data      stopped
    started           starts working
    (time flows downward)

What happens to the conflicting data written while the two nodes were online separately? How is it synchronized between the two nodes when they are both online again?

BRs
//Tang Weiqiang
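For context (added here, not from the thread): Cassandra reconciles such divergent columns with last-write-wins on the client-supplied timestamp, applied when the replicas exchange data again via read repair, hinted handoff, or anti-entropy repair. A minimal model of that per-column reconciliation (a sketch; real Cassandra also has special tie-break rules, e.g. for deletions):

```python
def reconcile(a, b):
    """Last-write-wins per column: each side is a (value, timestamp) pair;
    the higher timestamp wins. Ties are broken by comparing values so that
    both replicas converge to the same answer deterministically."""
    if a[1] != b[1]:
        return a if a[1] > b[1] else b
    return max(a, b)  # deterministic tie-break

# Node Two updated the column at t=200 while Node One was down;
# Node One wrote an older-timestamped value at t=150 while Node Two was down.
node_one = ("v-from-node-one", 150)
node_two = ("v-from-node-two", 200)
print(reconcile(node_one, node_two))  # Node Two's newer write wins on both nodes
```

Because the rule depends only on the two versions (not on which node evaluates it), both nodes converge once they have seen each other's writes; with read ONE, a read that arrives before that exchange may still see either version.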