Re: Any better solution to avoid TombstoneOverwhelmingException?

2014-06-30 Thread Jason Tang
The traffic is continuous, which means that while new records are being
inserted, old records are being executed (and deleted) at the same time.

And the execution is based on a time condition, so some stored records will
be executed (and deleted) now, and some will be executed in the next round.

As for a given TTL, it is the same as a delete: it will also generate tombstones.
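
(For reference, the partition-per-round idea quoted below would look roughly
like the Hector sketch here. The column family name, bucket size and TTL are
made up for illustration, and it assumes a Hector version whose
HFactory.createColumn overload accepts a TTL.)

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class BucketedTtlWrite {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "127.0.0.1:9160");
        Keyspace ks = HFactory.createKeyspace("drc", cluster);
        StringSerializer ss = StringSerializer.get();

        // One row per 10-minute round: the next round reads a brand-new row, so its
        // queries never have to scan the previous round's expired or deleted columns.
        long bucket = System.currentTimeMillis() / (10 * 60 * 1000L);
        String rowKey = "tasks_" + bucket;

        Mutator<String> m = HFactory.createMutator(ks, ss);
        HColumn<String, String> col =
            HFactory.createColumn("task-0001", "payload", 1800 /* TTL in seconds */, ss, ss);
        m.addInsertion(rowKey, "queue", col);
        m.execute();
    }
}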


2014-06-30 15:58 GMT+08:00 DuyHai Doan :

> Why don't you store all current data into one partition and for the next
> round of execution, switch to a new partition ? This way you don't even
> need to remove data (if you insert with a given TTL)
>
>
> On Mon, Jun 30, 2014 at 8:43 AM, Jason Tang  wrote:
>
>> Our application will use Cassandra to persistent for asynchronous tasks,
>> so in one time period, lots of records will be created in Cassandra (more
>> then 10M). Later it will be executed.
>>
>> Due to disk space limitation, the executed records will be deleted.
>> After gc_grace_seconds, it is expected to be auto removed from the disk.
>>
>> So for the next round of execution, the deleted records, should not be
>> queried out.
>>
>> In this traffic, it will be generated lots of tombstones.
>>
>> To avoid TombstoneOverwhelmingException, One way is to larger
>> tombstone_failure_threshold, but is there any impact for the system's
>> performance on my traffic model, or is there any better solution for this
>> traffic?
>>
>>
>> BRs
>> //Tang
>>
>
>


Any better solution to avoid TombstoneOverwhelmingException?

2014-06-29 Thread Jason Tang
Our application will use Cassandra to persist asynchronous tasks, so in one
time period lots of records (more than 10M) will be created in Cassandra.
They will be executed later.

Due to disk space limitations, the executed records will be deleted.
After gc_grace_seconds, they are expected to be removed from disk automatically.

So for the next round of execution, the deleted records should not be
queried out.

This traffic will generate lots of tombstones.

To avoid TombstoneOverwhelmingException, one way is to raise
tombstone_failure_threshold, but does that have any impact on the system's
performance for my traffic model, or is there a better solution for this
traffic?
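
(For reference, both thresholds live in cassandra.yaml; the values below are
the usual defaults in recent releases and are shown only as an illustration.
Raising tombstone_failure_threshold hides the exception, but every scanned
tombstone still costs read latency and heap.)

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000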


BRs
//Tang


Re: heap issues - looking for advices on gc tuning

2013-10-30 Thread Jason Tang
What is the configuration of the following parameters?
memtable_flush_queue_size:
concurrent_compactors:
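
(For reference, both are cassandra.yaml settings; typical values look roughly
like the excerpt below, though the defaults depend on the Cassandra version
and on the machine's disks/cores.)

memtable_flush_queue_size: 4
# concurrent_compactors: 1   (when left commented out, a default based on the hardware is used)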


2013/10/30 Piavlo 

> Hi,
>
> Below I try to give a full picture to the problem I'm facing.
>
> This is a 12 node cluster, running on ec2 with m2.xlarge instances (17G
> ram , 2 cpus).
> Cassandra version is 1.0.8
> Cluster normally having between 3000 - 1500 reads per second (depends on
> time of the day) and 1700 - 800 writes per second- according to Opscetner.
> RF=3, now row caches are used.
>
> Memory relevant  configs from cassandra.yaml:
> flush_largest_memtables_at: 0.85
> reduce_cache_sizes_at: 0.90
> reduce_cache_capacity_to: 0.75
> commitlog_total_space_in_mb: 4096
>
> relevant JVM options used are:
> -Xms8000M -Xmx8000M -Xmn400M
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
>
> Now what happens is that with these settings after cassandra process
> restart, the GC it working fine at the beginning, and heap used looks like a
> saw with perfect teeth, eventually the teeth size start to diminish until
> the teeth become not noticable, and then cassandra starts to spend lot's of
> CPU time
> doing gc. It takes about 2 weeks until for such cycle , and then I need to
> restart cassandra process to improve performance.
> During all this time there are no memory  related messages in cassandra
> system.log, except a "GC for ParNew: little above 200ms" once in a while.
>
> Things i've already done trying to reduce this eventual heap pressure.
> 1) reducing bloom_filter_fp_chance  resulting in reduction from ~700MB to
> ~280MB total per node based on all Filter.db files on the node.
> 2) reducing key cache sizes, and dropping key_caches for CFs which do no
> not have many reads
> 3) the heap size was increased from 7000M to 8000M
> All these have not really helped , just the increase from 7000M to 8000M,
> helped in increase the cycle till excessive gc from ~9 days to ~14 days.
>
> I've tried to graph overtime the data that is supposed to be in heap vs
> actual heap size, by summing up all CFs bloom filter sizes + all CFs key
> cache capacities multipled by average key size + all CFs memtables data
> size reported (i've overestimated the data size a bit on purpose to be on
> the safe size).
> Here is a link to graph showing last 2 day metrics for a node which could
> not effectively do GC, and then cassandra process was restarted.
> http://awesomescreenshot.com/0401w5y534
> You can clearly see that before and after restart, the size of data that
> is in supposed to be in heap, is the same pretty much the same,
> which makes me think that I really need is GC tunning.
>
> Also I suppose that this is not due to number of total keys each node has
> , which is between 300 - 200 milions keys for all CF key estimates summed
> on a code.
> The nodes have datasize between 75G to 45G  accordingly to milions of
> keys. And all nodes are starting to have having GC heavy load after about
> 14 days.
> Also the excessive GC and heap usage are not affected by load which varies
> depending on time of the day (see read/write rates at the beginning of the
> mail).
> So again based on this , I assume this is not due to large number of keys
> or too much load on the cluster,  but due to a pure GC misconfiguration
> issue.
>
> Things I remember that I've tried for GC tunning:
> 1) Changing -XX:MaxTenuringThreshold=1 to values like 8 - did not help.
> 2) Adding  -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0
>   -XX:CMSIncrementalDutyCycle=10 -XX:ParallelGCThreads=2
> JVM_OPTS -XX:ParallelCMSThreads=1
> this actually made things worse.
> 3) Adding -XX:-XX-UseAdaptiveSizePolicy -XX:SurvivorRatio=8 - did not help.
>
> Also since it takes like 2 weeks to verify that changing GC setting did
> not help, the process is painfully slow to try all the possibilities :)
> I'd highly appreciate any help and hints on the GC tunning.
>
> tnx
> Alex
>
>
>
>
>
>
>


Re: Side effects of hinted handoff lead to consistency problem

2013-10-14 Thread Jason Tang
After checking the logs and configuration, I found it was caused by two reasons.

 1. GC grace seconds
I use the Hector client to connect to Cassandra, and the default value of GC
grace seconds for each column family is zero! So by the time hinted handoff
replays the temporary value, the tombstone on the other two nodes has already
been removed by compaction, and the client then gets the temporary value back.

 2. Secondary index
Even after fixing the first problem, I could still get the temporary result
from the Cassandra client. When I use a command like "get my_cf where
column_one='value'" to query the data, the temporary value shows up again;
but when I query the same record by its raw row key, it is gone.
From the client we always use the row key to get the data, and queried that
way I never see the temporary value.

So it seems the secondary index is not restricted by the consistency
configuration.

When I changed GC grace seconds to 10 days, our problem was solved, but the
behavior of index queries is still strange.
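
(A rough sketch of setting gc_grace_seconds explicitly through Hector's schema
API, assuming the ColumnFamilyDefinition setters below exist in the Hector
version in use; the cluster, keyspace and column family names are illustrative.)

import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.factory.HFactory;

public class FixGcGrace {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "127.0.0.1:9160");
        for (ColumnFamilyDefinition cfDef : cluster.describeKeyspace("drc").getCfDefs()) {
            if ("queue".equals(cfDef.getName())) {
                // ten days, instead of whatever default the client used at creation time
                cfDef.setGcGraceSeconds(10 * 24 * 3600);
                cluster.updateColumnFamily(cfDef);
            }
        }
    }
}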


2013/10/8 Jason Tang 

> I have a 3 nodes cluster, replicate_factor is 3 also. Consistency level is
> Write quorum, Read quorum.
> Traffic has three major steps
> Create:
> Rowkey: 
> Column: status=new, requests="x"
> Update:
>  Rowkey: 
>  Column: status=executing, requests="x"
> Delete:
>  Rowkey: 
>
> When one node down, it can work according to consistency configuration,
> and the final status is all requests are finished and delete.
>
> So if running cassandra client to list the result (also set consistency
> quorum). It shows empty (only rowkey left), which is correct.
>
> But if we start the dead node, the hinted handoff model will write back
> the data to this node. So there are lots of create, update, delete.
>
> I don't know due to GC or compaction, the delete records on other two
> nodes seems not work, and if using cassandra client to list the data (also
> consistency quorum), the deleted row show again with column value.
>
> And if using client to check the data several times, you can find the data
> is changed, seems hinted handoff replay operation, the deleted data show up
> and then disappear.
>
> So the hinted handoff mechanism will faster the repair, but the temporary
> data will be seen from external (if data is deleted).
>
> Is there a way to have this procedure invisible from external, until the
> hinted handoff finished?
>
> What I want is final status synchronization, the temporary status is out
> of date and also incorrect, should never been seen from external.
>
> Is it due to row delete instead of column delete? Or compaction?
>


Re: Failed to solve Digest mismatch

2013-10-09 Thread Jason Tang
I did some tests on this issue, and it turns out the problem is caused by the
local timestamps.
In our traffic the update and delete happen very fast, within 1 second, even
within 100 ms.
And at that time the NTP service did not seem to work well; the offset was
sometimes even larger than 1 second.

So some delete timestamps end up earlier than the create timestamps, and when
the digest mismatch is resolved, the result is not correct.
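
(A rough sketch of supplying the write timestamp explicitly from the client,
assuming Hector's createColumn overload that takes a clock value; the shared
sequence below is illustrative and only helps if every writer uses the same
source.)

import java.util.concurrent.atomic.AtomicLong;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class ExplicitClockWrite {
    // one monotonically increasing sequence instead of each node's wall clock
    private static final AtomicLong SEQ = new AtomicLong(System.currentTimeMillis() * 1000);

    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "127.0.0.1:9160");
        Keyspace ks = HFactory.createKeyspace("drc", cluster);
        StringSerializer ss = StringSerializer.get();

        Mutator<String> m = HFactory.createMutator(ks, ss);
        // the later operation always gets the larger clock, so a delete issued after a
        // create can never lose to it during conflict resolution
        HColumn<String, String> col =
            HFactory.createColumn("status", "NEW", SEQ.incrementAndGet(), ss, ss);
        m.addInsertion("task-0001", "queue", col);
        m.execute();
    }
}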


2012/7/4 aaron morton 

> Jason,
> Are you able document the steps to reproduce this on a clean install ?
>
> Is so do you have time to create an issue on
> https://issues.apache.org/jira/browse/CASSANDRA
>
> Thanks
>
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/07/2012, at 1:49 AM, Jason Tang wrote:
>
> For the create/update/deleteColumn/deleteRow test case, for Quorum
> consistency level, 6 nodes, replicate factor 3, for one thread around 1/100
> round, I can have this reproduced.
>
> And if I have 20 client threads to run the test client, the ratio is
> bigger.
>
> And the test group will be executed by one thread, and the client time
> stamp is unique and sequenced, guaranteed by Hector.
>
> And client only access the data from local Cassandra.
>
> And the query only use the row key which is unique. The column name is not
> unique, in my case, eg, "status".
>
> And the row have around 7 columns, which are all not big, eg
> "status:true", "userName:Jason" ...
>
> BRs
> //Ares
>
> 2012/7/1 Jonathan Ellis 
>
>> Is this Cassandra 1.1.1?
>>
>> How often do you observe this?  How many columns are in the row?  Can
>> you reproduce when querying by column name, or only when "slicing" the
>> row?
>>
>> On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang  wrote:
>> > Hi
>> >
>> >First I delete one column, then I delete one row. Then try to read
>> all
>> > columns from the same row, all operations from same client app.
>> >
>> >The consistency level is read/write quorum.
>> >
>> >Check the Cassandra log, the local node don't perform the delete
>> > operation but send the mutation to other nodes (192.168.0.6,
>> 192.168.0.1)
>> >
>> >After delete, I try to read all columns from the row, I found the
>> node
>> > found "Digest mismatch" due to Quorum consistency configuration, but the
>> > result is not correct.
>> >
>> >From the log, I can see the delete mutation already accepted
>> > by 192.168.0.6, 192.168.0.1,  but when 192.168.0.5 read response from
>> 0.6
>> > and 0.1, and then it merge the data, but finally 0.5 shows the result
>> which
>> > is the dirty data.
>> >
>> >Following logs shows the change of column "737461747573" ,
>> 192.168.0.5
>> > try to read from 0.1 and 0.6, it should be deleted, but finally it
>> shows it
>> > has the data.
>> >
>> > log:
>> > 192.168.0.5
>> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653)
>> > Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc',
>> > key=7878323239537570657254616e67307878,
>> > columnParent='QueryPath(columnFamilyName='queue',
>> superColumnName='null',
>> > columnName='null')',
>> >
>> columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
>> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79)
>> > Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
>> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674)
>> > reading data from /192.168.0.6
>> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694)
>> > reading digest from /192.168.0.1
>> > DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
>> > ResponseVerbHandler.java (line 44) Processing response on a callback
>> from
>> > 6556@/192.168.0.6
>> > DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
>> > AbstractRowResolver.java (line 66) Preprocessed data response
>> > DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
>> > ResponseVerbHandler.java (line 44) Processing response on a callback
>> from
>> > 6557@/192.168.0.1
>> > DEBUG [RequestResponseStage:6]

Side effects of hinted handoff lead to consistency problem

2013-10-07 Thread Jason Tang
I have a 3-node cluster, and replication_factor is 3 as well. The consistency
level is write QUORUM, read QUORUM.
The traffic has three major steps:
Create:
Rowkey: 
Column: status=new, requests="x"
Update:
 Rowkey: 
 Column: status=executing, requests="x"
Delete:
 Rowkey: 

When one node is down, the system still works according to the consistency
configuration, and the final status is that all requests are finished and deleted.

So if I run a Cassandra client to list the result (also with consistency
QUORUM), it shows empty (only the row key left), which is correct.

But when we start the dead node again, hinted handoff writes the data back to
this node, so there are lots of creates, updates and deletes.

I don't know whether it is due to GC or compaction, but the delete records on
the other two nodes seem not to take effect, and when I use a Cassandra client
to list the data (also consistency QUORUM), the deleted rows show up again
with their column values.

If I use the client to check the data several times, I can see the data
changing; it looks like hinted handoff is replaying operations, so the deleted
data shows up and then disappears again.

So the hinted handoff mechanism speeds up repair, but the temporary data
becomes visible externally (for data that has been deleted).

Is there a way to keep this procedure invisible from the outside until the
hinted handoff has finished?

What I want is final status synchronization; the temporary status is out of
date and also incorrect, and should never be seen externally.

Is it due to row delete instead of column delete? Or compaction?


Why Cassandra so depend on client local timestamp?

2013-09-30 Thread Jason Tang
The following case may be logically correct for Cassandra, but it is difficult
for the user.
Let's say:

Cassandra consistency level: write ALL, read ONE
replication_factor: 3

For one record: rowkey 001, column status

Client 1 inserts a value for rowkey 001: status=True, timestamp 11:00:05
Client 2 runs a slice query and gets the value True for rowkey 001, at 11:00:00
Client 2 updates the value for rowkey 001: status=False, timestamp 11:00:02

So the client update sequence is True then False; although the update requests
come from different nodes, the sequence is logically ordered.

But the result is rowkey 001, column status, value True.

So why does Cassandra depend so much on the client's local time? Why not use
the server's local time instead of the client's?

Because I am using consistency level write ALL with replication_factor 3, all
3 nodes receive the updates in the correct sequence (True -> False), so they
could give the correct final result.

If for some reason it really must depend on the operation's timestamp, then
the query operation also needs a timestamp; then Client 2 would not see the
value True, which happens in its "future".

So either using a server-side timestamp, or providing a consistent view by
attaching a timestamp to queries, would be more consistent.

Otherwise, the consistency of Cassandra is quite weak.


Gossiper in Cassandra using unicast/broadcast/multicast ?

2013-06-20 Thread Jason Tang
Hi

   We are considering using Cassandra in a virtualization environment. I
wonder whether Cassandra uses unicast, broadcast or multicast for node
discovery and communication?

  From the code, I see that the broadcast address is used for the heartbeat in
Gossiper.java, but I don't know how it actually works during node
communication and at node startup (not the case of a new node being added).

BRs


Re: Consistent problem when solve Digest mismatch

2013-03-06 Thread Jason Tang
Actually I do not update the same record concurrently: I first create it, then
search for it, then delete it. The version conflict resolution fails because
the delete's local timestamp is earlier than the create's local timestamp.


2013/3/6 aaron morton 

> Otherwise, it means the version conflict solving strong depends on global
> sequence id (timestamp) which need provide by client ?
>
> Yes.
> If you have an  area of your data model that has a high degree of
> concurrency C* may not be the right match.
>
> In 1.1 we have atomic updates so clients see either the entire write or
> none of it. And sometimes you can design a data model that does mutate
> shared values, but writes ledger entries instead. See Matt Denis talk here
> http://www.datastax.com/events/cassandrasummit2012/presentations or this
> post http://thelastpickle.com/2012/08/18/Sorting-Lists-For-Humans/
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/03/2013, at 4:30 PM, Jason Tang  wrote:
>
> Hi
>
> The timestamp provided by my client is unix timestamp (with ntp), and as I
> said, due to the ntp drift, the local unix timestamp is not accurately
> synchronized (compare to my case).
>
> So for short, client can not provide global sequence number to indicate
> the event order.
>
> But I wonder, I configured Cassandra consistency level as write QUORUM. So
> for one record, I suppose Cassandra has the ability to decide the final
> update results.
>
> Otherwise, it means the version conflict solving strong depends on global
> sequence id (timestamp) which need provide by client ?
>
>
> //Tang
>
>
> 2013/3/4 Sylvain Lebresne 
>
>> The problem is, what is the sequence number you are talking about is
>> exactly?
>>
>> Or let me put it another way: if you do have a sequence number that
>> provides a total ordering of your operation, then that is exactly what you
>> should use as your timestamp. What Cassandra calls the timestamp, is
>> exactly what you call seqID, it's the number Cassandra uses to decide the
>> order of operation.
>>
>> Except that in real life, provided you have more than one client talking
>> to Cassandra, then providing a total ordering of operation is hard, and in
>> fact not doable efficiently. So in practice, people use unix timestamp
>> (with ntp) which provide a very good while cheap approximation of the real
>> life order of operations.
>>
>> But again, if you do know how to assign a more precise "timestamp",
>> Cassandra let you use that: you can provid your own timestamp (using unix
>> timestamp is just the default). The point being, unix timestamp is the
>> better approximation we have in practice.
>>
>> --
>> Sylvain
>>
>>
>> On Mon, Mar 4, 2013 at 9:26 AM, Jason Tang  wrote:
>>
>>> Hi
>>>
>>>   Previous I met a consistency problem, you can refer the link below for
>>> the whole story.
>>>
>>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3CCAFb+LUxna0jiY0V=AvXKzUdxSjApYm4zWk=ka9ljm-txc04...@mail.gmail.com%3E
>>>
>>>   And after check the code, seems I found some clue of the problem.
>>> Maybe some one can check this.
>>>
>>>   For short, I have Cassandra cluster (1.0.3), The consistency level is
>>> read/write quorum, replication_factor is 3.
>>>
>>>   Here is event sequence:
>>>
>>> seqID   NodeA   NodeB   NodeC
>>> 1. New  New   New
>>> 2. Update  Update   Update
>>> 3. Delete   Delete
>>>
>>> When try to read from NodeB and NodeC, "Digest mismatch" exception
>>> triggered, so Cassandra try to resolve this version conflict.
>>> But the result is value "Update".
>>>
>>> Here is the suspect root cause, the version conflict resolved based
>>> on time stamp.
>>>
>>> Node C local time is a bit earlier then node A.
>>>
>>> "Update" requests sent from node C with time stamp 00:00:00.050,
>>> "Delete" sent from node A with time stamp 00:00:00.020, which is not same
>>> as the event sequence.
>>>
>>> So the version conflict resolved incorrectly.
>>>
>>> It is true?
>>>
>>> If Yes, then it means, consistency level can secure the conflict been
>>> found, but to solve it correctly, dependence one time synchronization's
>>> accuracy, e.g. NTP ?
>>>
>>>
>>>
>>
>
>


Re: Consistent problem when solve Digest mismatch

2013-03-04 Thread Jason Tang
Hi

The timestamp provided by my client is a Unix timestamp (with NTP), and as I
said, due to NTP drift the local Unix timestamps are not synchronized
accurately enough (compared to what my case needs).

So, in short, the client cannot provide a global sequence number to indicate
the event order.

But I wonder: I configured the Cassandra consistency level as write QUORUM, so
for one record I would suppose Cassandra has the ability to decide the final
update result.

Otherwise, it means the version conflict resolution strongly depends on a
global sequence id (timestamp) which has to be provided by the client?


//Tang


2013/3/4 Sylvain Lebresne 

> The problem is, what is the sequence number you are talking about is
> exactly?
>
> Or let me put it another way: if you do have a sequence number that
> provides a total ordering of your operation, then that is exactly what you
> should use as your timestamp. What Cassandra calls the timestamp, is
> exactly what you call seqID, it's the number Cassandra uses to decide the
> order of operation.
>
> Except that in real life, provided you have more than one client talking
> to Cassandra, then providing a total ordering of operation is hard, and in
> fact not doable efficiently. So in practice, people use unix timestamp
> (with ntp) which provide a very good while cheap approximation of the real
> life order of operations.
>
> But again, if you do know how to assign a more precise "timestamp",
> Cassandra let you use that: you can provid your own timestamp (using unix
> timestamp is just the default). The point being, unix timestamp is the
> better approximation we have in practice.
>
> --
> Sylvain
>
>
> On Mon, Mar 4, 2013 at 9:26 AM, Jason Tang  wrote:
>
>> Hi
>>
>>   Previous I met a consistency problem, you can refer the link below for
>> the whole story.
>>
>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3CCAFb+LUxna0jiY0V=AvXKzUdxSjApYm4zWk=ka9ljm-txc04...@mail.gmail.com%3E
>>
>>   And after check the code, seems I found some clue of the problem. Maybe
>> some one can check this.
>>
>>   For short, I have Cassandra cluster (1.0.3), The consistency level is
>> read/write quorum, replication_factor is 3.
>>
>>   Here is event sequence:
>>
>> seqID   NodeA   NodeB   NodeC
>> 1. New  New   New
>> 2. Update  Update   Update
>> 3. Delete   Delete
>>
>> When try to read from NodeB and NodeC, "Digest mismatch" exception
>> triggered, so Cassandra try to resolve this version conflict.
>> But the result is value "Update".
>>
>> Here is the suspect root cause, the version conflict resolved based
>> on time stamp.
>>
>> Node C local time is a bit earlier then node A.
>>
>> "Update" requests sent from node C with time stamp 00:00:00.050, "Delete"
>> sent from node A with time stamp 00:00:00.020, which is not same as the
>> event sequence.
>>
>> So the version conflict resolved incorrectly.
>>
>> It is true?
>>
>> If Yes, then it means, consistency level can secure the conflict been
>> found, but to solve it correctly, dependence one time synchronization's
>> accuracy, e.g. NTP ?
>>
>>
>>
>


Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Jason Tang
Yes, Sylvain, you are correct.
When I say "A comes before B", it means the client secures the order;
actually, B is only sent after the response to request A has been received.

And yes, A and B do not update the same record, so it is not the typical
Cassandra consistency problem.

And yes, the column name is provided by the client, and right now I use the
local timestamp as the column name; the local times of A and B are not
synchronized well, so I have the problem.

So what I want is for Cassandra to provide some information to the client to
indicate that A was stored before B, e.g. a globally unique timestamp, or the
row order.
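
(One common pattern for this, sketched below with Hector: use TimeUUID column
names in a column family whose comparator is TimeUUIDType, so the read-back
order is defined by the column comparator rather than by write timestamps.
This is only an illustration; it still relies on whichever process generates
the TimeUUIDs having a consistent clock, so it does not remove the time
synchronization requirement by itself. The keyspace, column family and key
names are made up.)

import java.util.UUID;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.cassandra.utils.TimeUUIDUtils;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class TimeUuidQueueWrite {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "127.0.0.1:9160");
        Keyspace ks = HFactory.createKeyspace("bus", cluster);

        // the column name carries the ordering, so a slice over this row returns
        // the requests in TimeUUID order regardless of their internal timestamps
        UUID columnName = TimeUUIDUtils.getUniqueTimeUUIDinMillis();

        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        m.addInsertion("queue-partition-1", "messages",
            HFactory.createColumn(columnName, "request payload",
                                  UUIDSerializer.get(), StringSerializer.get()));
        m.execute();
    }
}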




2013/1/17 Sylvain Lebresne 

> I'm not sure I fully understand your problem. You seem to be talking of
> ordering the requests, in the order they are generated. But in that case,
> you will rely on the ordering of columns within whatever row you store
> request A and B in, and that order depends on the column names, which in
> turns is client provided and doesn't depend at all of the time
> synchronization of the cluster nodes. And since you are able to say that
> request A comes before B, I suppose this means said requests are generated
> from the same source. In which case you just need to make sure that the
> column names storing each request respect the correct ordering.
>
> The column timestamps Cassandra uses are here to which update *to the same
> column* is the more recent one. So it only comes into play if you requests
> A and B update the same column and you're interested in knowing which one
> of the update will "win" when you read. But even if that's your case (which
> doesn't sound like it at all from your description), the column timestamp
> is only generated server side if you use CQL. And even in that latter case,
> it's a convenience and you can force a timestamp client side if you really
> wish. In other words, Cassandra dependency on time synchronization is not a
> strong one even in that case. But again, that doesn't seem at all to be the
> problem you are trying to solve.
>
> --
> Sylvain
>
>
> On Thu, Jan 17, 2013 at 2:56 AM, Jason Tang  wrote:
>
>> Hi
>>
>> I am using Cassandra in a message bus solution, the major responsibility
>> of cassandra is recording the incoming requests for later consumming.
>>
>> One strategy is First in First out (FIFO), so I need to get the stored
>> request in reversed order.
>>
>> I use NTP to synchronize the system time for the nodes in the cluster. (4
>> nodes).
>>
>> But the local time of each node are still have some inaccuracy, around 40
>> ms.
>>
>> The consistency level is write all and read one, and replicate factor is
>> 3.
>>
>> But here is the problem:
>> A request come to node One at local time PM 10:00:01.000
>> B request come to node Two at local time PM 10:00:00.980
>>
>> The correct order is A --> B
>> But the timestamp is B --> A
>>
>> So is there any way for Cassandra to keep the correct order for read
>> operation? (e.g. logical timestamp ?)
>>
>> Or Cassandra strong depence on time synchronization solution?
>>
>> BRs
>> //Tang
>>
>>
>>
>>
>>
>


Re: Cassandra Consistency problem with NTP

2013-01-16 Thread Jason Tang
A delayed read is acceptable, but the problem is still there:
Request A comes to node One at local time 10:00:01.000 PM
Request B comes to node Two at local time 10:00:00.980 PM

The correct order is A --> B
I am not sure how node C will handle the data: although A came before B, B's
timestamp is earlier than A's?



2013/1/17 Russell Haering 

> One solution is to only read up to (now - 1 second). If this is a public
> API where you want to guarantee full consistency (ie, if you have added a
> message to the queue, it will definitely appear to be there) you can
> instead delay requests for 1 second before reading up to the moment that
> the request was received.
>
> In either of these approaches you can tune the time offset based on how
> closely synchronized you believe you can keep your clocks. The tradeoff of
> course, will be increased latency.
>
>
> On Wed, Jan 16, 2013 at 5:56 PM, Jason Tang  wrote:
>
>> Hi
>>
>> I am using Cassandra in a message bus solution, the major responsibility
>> of cassandra is recording the incoming requests for later consumming.
>>
>> One strategy is First in First out (FIFO), so I need to get the stored
>> request in reversed order.
>>
>> I use NTP to synchronize the system time for the nodes in the cluster. (4
>> nodes).
>>
>> But the local time of each node are still have some inaccuracy, around 40
>> ms.
>>
>> The consistency level is write all and read one, and replicate factor is
>> 3.
>>
>> But here is the problem:
>> A request come to node One at local time PM 10:00:01.000
>> B request come to node Two at local time PM 10:00:00.980
>>
>> The correct order is A --> B
>> But the timestamp is B --> A
>>
>> So is there any way for Cassandra to keep the correct order for read
>> operation? (e.g. logical timestamp ?)
>>
>> Or Cassandra strong depence on time synchronization solution?
>>
>> BRs
>> //Tang
>>
>>
>>
>>
>>
>


Re: is it possible to disable compaction per CF ?

2012-07-27 Thread Jason Tang
You can disable minor compactions for a column family by setting both
thresholds to zero:
setMaxCompactionThreshold(0)
setMinCompactionThreshold(0)
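
(The same thing can be done from the command line; if I remember correctly the
nodetool form is roughly the following, with the exact argument order depending
on the version:)

nodetool -h <host> setcompactionthreshold <keyspace> <column_family> 0 0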

2012/7/27 Илья Шипицин 

> Hello!
>
> if we are dealing with append-only data model, so what if I disable
> compaction on certain CF ?
> any side effect ?
>
> can I do it with
>
> "update column family  with compaction_strategy = null " ?
>
> Cheers,
> Ilya Shipitsin
>


Compaction not remove the deleted data from secondary index when use TTL

2012-07-19 Thread Jason Tang
Hi

 Because of a consistency problem we cannot use a direct delete to remove a
row, so instead we set a TTL on each column of the row.

 We use Cassandra as the central storage of a stateful system. Every request
is stored in Cassandra and marked with status:NEW, then changed to
status:EXECUTING, and finally deleted (via TTL).

 We use a secondary index on the column 'status', and after processing 4
million requests, most of the requests have been deleted from Cassandra.

 After executing compact from nodetool, the size of the CF Requests SSTable
decreases to about 20M, but Requests.idxStatus keeps growing and is now about
1.6G.

 From the system log I found that the compact command from nodetool does not
trigger compaction of the secondary index, but under traffic, when a
compaction of the CF Requests is triggered, a compaction of the index is
started as well.

 However, the size of the index SSTable does not decrease as expected; it
seems the data in the secondary index is not deleted. And since we only have 3
status values, I can find logs such as:
 INFO [CompactionExecutor:31] 2012-07-20 10:30:50,532
CompactionController.java (line 129) Compacting large row demo/
Requests.idxStatus:EXECUTING (264045300 bytes) incrementally

So why does the secondary index not compact down to a small size as expected?
Is it related to TTL?

And is it possible to rebuild the index?

BRs


Re: Replication factor - Consistency Questions

2012-07-17 Thread Jason Tang
Yes, with ALL a write fails if any replica node is down, so it is not good for
HA. We also met problems when using QUORUM; our current solution is to switch
to Write:QUORUM / Read:QUORUM when we get an "UnavailableException".

2012/7/18 Jay Parashar 

> Thanks..but write ALL will fail for any downed nodes. I am thinking of
> QUORAM.
>
> From: Jason Tang [mailto:ares.t...@gmail.com]
> Sent: Tuesday, July 17, 2012 8:24 PM
> To: user@cassandra.apache.org
> Subject: Re: Replication factor - Consistency Questions
>
> Hi
>
> I am starting using Cassandra for not a long time, and also have problems
> in consistency.
>
> Here is some thinking.
>
> If you have Write:Any / Read:One, it will have consistency problem, and if
> you want to repair, check your schema, and check the parameter "Read repair
> chance: "
>
> http://wiki.apache.org/cassandra/StorageConfiguration 
>
> And if you want to get consistency result, my suggestion is to have
> Write:ALL / Read:One, since for Cassandra, write is more faster then read.
>
> For performance impact, you need to test your traffic, and if your memory
> can not cache all your data, or your network is not fast enough, then yes,
> it will impact to write one more node.
>
> BRs
>
> 2012/7/18 Jay Parashar 
>
> Hello all,
>
> There is a lot of material on Replication factor and Consistency level but
> I
> am a little confused by what is happening on my setup. (Cassandra 1.1.2). I
> would appreciate any answers.
>
> My Setup: A cluster of 2 nodes evenly balanced. My RF =2, Consistency
> Level;
> Write = ANY and Read = 1
>
> I know that my consistency is Weak but since my RF = 2, I thought data
> would
> be just duplicated in both the nodes but sometimes, querying does not give
> me the correct (or gives partial) results. In other times, it gives me the
> right results
> Is the Read Repair going on after the first query? But as RF = 2, data is
> duplicated then why the repair?
> Note: My query is done a while after the Writes so data should have been in
> both the nodes. Or is this not the case (flushing not happening etc)?
>
> I am thinking of making the Write as 1 and Read as QUORAM so R + W > RF (1
> +
> 2 > 2) to give strong consistency. Will that affect performance a lot
> (generally speaking)?
>
> Thanks in advance
> Regards
>
> Jay
>


Re: Replication factor - Consistency Questions

2012-07-17 Thread Jason Tang
Hi

I have only been using Cassandra for a short time, and I also have consistency
problems.

Here is some thinking.
If you have Write:ANY / Read:ONE you will have consistency problems; if you
want them repaired, check your schema and look at the parameter "read repair
chance":
http://wiki.apache.org/cassandra/StorageConfiguration

If you want consistent results, my suggestion is to use Write:ALL / Read:ONE,
since in Cassandra writes are much faster than reads.

As for the performance impact, you need to test with your own traffic; if your
memory cannot cache all your data, or your network is not fast enough, then
yes, writing to one more node will have an impact.
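
(To make that concrete, a rough Hector sketch of pinning write ALL / read ONE
on the keyspace handle; the cluster and keyspace names are illustrative.)

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class ConsistencySetup {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "127.0.0.1:9160");

        // write to every replica, read from one: reads stay cheap and still see the latest write
        ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
        ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.ALL);
        ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);

        Keyspace ks = HFactory.createKeyspace("myks", cluster, ccl);
        // every query and mutator created from this keyspace now uses ALL/ONE
    }
}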

BRs


2012/7/18 Jay Parashar 

> Hello all,
>
> There is a lot of material on Replication factor and Consistency level but
> I
> am a little confused by what is happening on my setup. (Cassandra 1.1.2). I
> would appreciate any answers.
>
> My Setup: A cluster of 2 nodes evenly balanced. My RF =2, Consistency
> Level;
> Write = ANY and Read = 1
>
> I know that my consistency is Weak but since my RF = 2, I thought data
> would
> be just duplicated in both the nodes but sometimes, querying does not give
> me the correct (or gives partial) results. In other times, it gives me the
> right results
> Is the Read Repair going on after the first query? But as RF = 2, data is
> duplicated then why the repair?
> Note: My query is done a while after the Writes so data should have been in
> both the nodes. Or is this not the case (flushing not happening etc)?
>
> I am thinking of making the Write as 1 and Read as QUORAM so R + W > RF (1
> +
> 2 > 2) to give strong consistency. Will that affect performance a lot
> (generally speaking)?
>
> Thanks in advance
> Regards
>
> Jay
>
>
>


What does "Replicate on write" mean?

2012-07-17 Thread Jason Tang
Hi

   I have a 4-node Cassandra cluster with replication factor 3 and write
consistency level ALL, so each write is supposed to go to at least 3 nodes,
right?

   Checking the schema, I found the parameter "Replicate on write: false".
What does this parameter mean?

   How does it affect the write behavior and the consistency level?

BRs


Re: Cassandra take 100% CPU for 2~3 minutes every half an hour and mutation lost

2012-07-12 Thread Jason Tang
Hi

After changing the concurrent compactors parameter (concurrent_compactors: 1),
we can limit Cassandra to 100% of one core at those moments.

And I got the stack of the "crazy" thread; it stays on the same stack for the
whole 2~3 minutes.

Any clue about this issue?

Thread 18114: (state = IN_JAVA)

 - java.util.AbstractList$Itr.hasNext() @bci=8, line=339 (Compiled frame;
information may be imprecise)

 -
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(org.apache.cassandra.db.ColumnFamily,
int) @bci=6, line=841 (Compiled frame)

 -
org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(org.apache.cassandra.db.ColumnFamily,
int) @bci=17, line=835 (Compiled frame)

 -
org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(org.apache.cassandra.db.ColumnFamily,
int) @bci=8, line=826 (Compiled frame)

 -
org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(org.apache.cassandra.db.DecoratedKey,
org.apache.cassandra.db.compaction.CompactionController,
org.apache.cassandra.db.ColumnFamily) @bci=38, line=77 (Compiled frame)

 -
org.apache.cassandra.db.compaction.PrecompactedRow.(org.apache.cassandra.db.compaction.CompactionController,
java.util.List) @bci=33, line=102 (Compiled frame)

 -
org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(java.util.List)
@bci=223, line=133 (Compiled frame)

 -
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced()
@bci=44, line=102 (Compiled frame)

 -
org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced()
@bci=1, line=87 (Compiled frame)

 - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88,
line=116 (Compiled frame)

 - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5,
line=99 (Compiled frame)

 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)

 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)

 - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614
(Compiled frame)

 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)

 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)

 -
org.apache.cassandra.db.compaction.CompactionTask.execute(org.apache.cassandra.db.compaction.CompactionManager$CompactionExecutorStatsCollector)
@bci=542, line=141 (Compiled frame)

 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=117,
line=134 (Interpreted frame)

 - org.apache.cassandra.db.compaction.CompactionManager$1.call() @bci=1,
line=114 (Interpreted frame)

 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=30, line=303
(Interpreted frame)

 - java.util.concurrent.FutureTask.run() @bci=4, line=138 (Interpreted
frame)

 -
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(java.lang.Runnable)
@bci=59, line=886 (Compiled frame)

 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=28, line=908
(Compiled frame)

 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)



BRs

//Jason



2012/7/11 Jason Tang 

> Hi
>
> I encounter the High CPU problem, Cassandra 1.0.3, happened on both
> sized and leveled compaction, 6G heap, 64bit Oracle java. For normal
> traffic, Cassandra will use 15% CPU.
>
> But every half a hour, Cassandra will use almost 100% total cpu (SUSE,
> 12 Core).
>
> And here is the top information for that moment.
>
> #top -H -p 12451
>
> top - 12:30:14 up 15 days, 12:49,  6 users,  load average: 10.52, 8.92,
> 8.14
> Tasks: 706 total,  21 running, 685 sleeping,   0 stopped,   0 zombie
> Cpu(s): 25.7%us, 14.0%sy, 48.9%ni,  6.5%id,  0.0%wa,  0.0%hi,  4.9%si,
>  0.0%st
> Mem: 24150M total,12218M used,11932M free,  142M buffers
> Swap:0M total,0M used,0M free, 3714M cached
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
> 20291 casadm24   4 8003m 5.4g 167m R   92 22.7   0:42.46 java
> 20276 casadm24   4 8003m 5.4g 167m R   88 22.7   0:43.88 java
> 20181 casadm24   4 8003m 5.4g 167m R   86 22.7   0:52.97 java
> 20213 casadm24   4 8003m 5.4g 167m R   85 22.7   0:49.21 java
> 20188 casadm24   4 8003m 5.4g 167m R   82 22.7   0:54.34 java
> 20268 casadm24   4 8003m 5.4g 167m R   81 22.7   0:46.25 java
> 20269 casadm24   4 8003m 5.4g 167m R   41 22.7   0:15.11 java
> 20316 casadm24   4 8003m 5.4g 167m S   20 22.7   0:02.35 java
> 20191 casadm24   4 8003m 5.4g 167m R   15 22.7   0:16.85 java
> 12500 casadm20   0 8003m 5.4g 167m R6 22.7   1:07.86 java
> 15245 casadm20   0 8003m 5.4g 167m D5 22.7   0:36.45 java
>
> Jstack can not print the stack.
> Thread 20291: (state = IN_JAVA)
> Error occurred during stack walking:
> ...
> Thread 20276: (state 

Cassandra take 100% CPU for 2~3 minutes every half an hour and mutation lost

2012-07-10 Thread Jason Tang
Hi

I am encountering a high CPU problem with Cassandra 1.0.3. It happens with
both size-tiered and leveled compaction, 6G heap, 64-bit Oracle Java. For
normal traffic, Cassandra uses 15% CPU.

But every half an hour, Cassandra uses almost 100% of the total CPU (SUSE,
12 cores).

And here is the top information for that moment.

#top -H -p 12451

top - 12:30:14 up 15 days, 12:49,  6 users,  load average: 10.52, 8.92, 8.14
Tasks: 706 total,  21 running, 685 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.7%us, 14.0%sy, 48.9%ni,  6.5%id,  0.0%wa,  0.0%hi,  4.9%si,
 0.0%st
Mem: 24150M total,12218M used,11932M free,  142M buffers
Swap:0M total,0M used,0M free, 3714M cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
20291 casadm24   4 8003m 5.4g 167m R   92 22.7   0:42.46 java
20276 casadm24   4 8003m 5.4g 167m R   88 22.7   0:43.88 java
20181 casadm24   4 8003m 5.4g 167m R   86 22.7   0:52.97 java
20213 casadm24   4 8003m 5.4g 167m R   85 22.7   0:49.21 java
20188 casadm24   4 8003m 5.4g 167m R   82 22.7   0:54.34 java
20268 casadm24   4 8003m 5.4g 167m R   81 22.7   0:46.25 java
20269 casadm24   4 8003m 5.4g 167m R   41 22.7   0:15.11 java
20316 casadm24   4 8003m 5.4g 167m S   20 22.7   0:02.35 java
20191 casadm24   4 8003m 5.4g 167m R   15 22.7   0:16.85 java
12500 casadm20   0 8003m 5.4g 167m R6 22.7   1:07.86 java
15245 casadm20   0 8003m 5.4g 167m D5 22.7   0:36.45 java

Jstack can not print the stack.
Thread 20291: (state = IN_JAVA)
Error occurred during stack walking:
...
Thread 20276: (state = IN_JAVA)
Error occurred during stack walking:

After it comes back, the stack shows:
Thread 20291: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information
may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long)
@bci=20, line=196 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferStack$SNode,
boolean, long) @bci=174, line=424 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object,
boolean, long) @bci=102, line=323 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.poll(long,
java.util.concurrent.TimeUnit) @bci=11, line=874 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=62, line=945
(Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=18, line=907
(Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame

And after this happened, the data was not correct: some
large columns which were supposed to be deleted came back again.
Here is the suspect thread while it was using up 100%:
Thread 20191: (state = IN_VM)
 - sun.misc.Unsafe.unpark(java.lang.Object) @bci=0 (Compiled frame;
information may be imprecise)
 - java.util.concurrent.locks.LockSupport.unpark(java.lang.Thread) @bci=8,
line=122 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack$SNode.tryMatch(java.util.concurrent.SynchronousQueue$TransferStack$SNode)
@bci=34, line=242 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object,
boolean, long) @bci=268, line=344 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.offer(java.lang.Object) @bci=19,
line=846 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.execute(java.lang.Runnable)
@bci=43, line=653 (Compiled frame)
 -
java.util.concurrent.AbstractExecutorService.submit(java.util.concurrent.Callable)
@bci=20, line=92 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getCompactedRow(java.util.List)
@bci=86, line=190 (Compiled frame) -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced()
@bci=31, line=164 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced()
@bci=1, line=144 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88,
line=116 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5,
line=99 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext()
@bci=4, line=103 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext()
@bci=1, line=90 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)
 - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614
(Compiled frame)
 - com.google.common.collect.AbstractIterator.tr

Re: Failed to solve Digest mismatch

2012-07-01 Thread Jason Tang
For the create/update/deleteColumn/deleteRow test case, with QUORUM
consistency level, 6 nodes and replication factor 3, I can reproduce this in
roughly 1 out of 100 rounds with a single thread.

And if I run the test client with 20 threads, the ratio is bigger.

Each test group is executed by one thread, and the client timestamps are
unique and sequential, guaranteed by Hector.

And the client only accesses the data from the local Cassandra node.

And the queries only use the row key, which is unique. The column names are
not unique; in my case, e.g. "status".

And each row has around 7 columns, all of them small, e.g. "status:true",
"userName:Jason" ...

BRs
//Ares

2012/7/1 Jonathan Ellis 

> Is this Cassandra 1.1.1?
>
> How often do you observe this?  How many columns are in the row?  Can
> you reproduce when querying by column name, or only when "slicing" the
> row?
>
> On Thu, Jun 28, 2012 at 7:24 AM, Jason Tang  wrote:
> > Hi
> >
> >First I delete one column, then I delete one row. Then try to read all
> > columns from the same row, all operations from same client app.
> >
> >The consistency level is read/write quorum.
> >
> >Check the Cassandra log, the local node don't perform the delete
> > operation but send the mutation to other nodes (192.168.0.6, 192.168.0.1)
> >
> >After delete, I try to read all columns from the row, I found the node
> > found "Digest mismatch" due to Quorum consistency configuration, but the
> > result is not correct.
> >
> >From the log, I can see the delete mutation already accepted
> > by 192.168.0.6, 192.168.0.1,  but when 192.168.0.5 read response from 0.6
> > and 0.1, and then it merge the data, but finally 0.5 shows the result
> which
> > is the dirty data.
> >
> >Following logs shows the change of column "737461747573" , 192.168.0.5
> > try to read from 0.1 and 0.6, it should be deleted, but finally it shows
> it
> > has the data.
> >
> > log:
> > 192.168.0.5
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653)
> > Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc',
> > key=7878323239537570657254616e67307878,
> > columnParent='QueryPath(columnFamilyName='queue', superColumnName='null',
> > columnName='null')',
> >
> columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,737461747573,757365724e616d65,])/QUORUM
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79)
> > Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674)
> > reading data from /192.168.0.6
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694)
> > reading digest from /192.168.0.1
> > DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 6556@/192.168.0.6
> > DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
> > AbstractRowResolver.java (line 66) Preprocessed data response
> > DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 6557@/192.168.0.1
> > DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
> > AbstractRowResolver.java (line 66) Preprocessed digest response
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line
> 65)
> > resolving 2 responses
> > DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733)
> > Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
> > Mismatch for key DecoratedKey(100572974179274741747356988451225858264,
> > 7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs
> > d41d8cd98f00b204e9800998ecf8427e)
> > DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 6558@/192.168.0.6
> > DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
> > ResponseVerbHandler.java (line 44) Processing response on a callback from
> > 6559@/192.168.0.1
> > DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
> > AbstractRowResolver.java (line 66) Preprocessed data response
> > DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
> > AbstractRowResolver.java (line 66) Preprocessed data respons

Failed to solve Digest mismatch

2012-06-28 Thread Jason Tang
Hi

   First I delete one column, then I delete one row, and then I try to read
all columns from the same row; all operations are from the same client app.

   The consistency level is read/write QUORUM.

   Checking the Cassandra log, the local node does not perform the delete
operation itself but sends the mutation to the other nodes (192.168.0.6,
192.168.0.1).

   After the delete, I try to read all columns from the row, and I found the
node detected a "Digest mismatch" due to the QUORUM consistency configuration,
but the result is not correct.

   From the log I can see the delete mutation was already accepted
by 192.168.0.6 and 192.168.0.1, but when 192.168.0.5 reads the responses from
0.6 and 0.1 and merges the data, 0.5 finally shows a result which is the dirty
data.

   The following logs show the change of column "737461747573"; 192.168.0.5
tries to read from 0.1 and 0.6, where it should be deleted, but finally it
shows that it has the data.

log:
192.168.0.5
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 653)
Command/ConsistencyLevel is SliceByNamesReadCommand(table='drc',
key=7878323239537570657254616e67307878,
columnParent='QueryPath(columnFamilyName='queue', superColumnName='null',
columnName='null')',
columns=[6578656375746554696d65,6669726554696d65,67726f75705f6964,696e517565756554696d65,6c6f67526f6f744964,6d6f54797065,706172746974696f6e,7265636569766554696d65,72657175657374,7265747279,7365727669636550726f7669646572,
737461747573,757365724e616d65,])/QUORUM
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 ReadCallback.java (line 79)
Blockfor is 2; setting up requests to /192.168.0.6,/192.168.0.1
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 674)
reading data from /192.168.0.6
DEBUG [Thrift:17] 2012-06-28 15:59:42,198 StorageProxy.java (line 694)
reading digest from /192.168.0.1
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
ResponseVerbHandler.java (line 44) Processing response on a callback from
6556@/192.168.0.6
DEBUG [RequestResponseStage:2] 2012-06-28 15:59:42,199
AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
ResponseVerbHandler.java (line 44) Processing response on a callback from
6557@/192.168.0.1
DEBUG [RequestResponseStage:6] 2012-06-28 15:59:42,199
AbstractRowResolver.java (line 66) Preprocessed digest response
DEBUG [Thrift:17] 2012-06-28 15:59:42,199 RowDigestResolver.java (line 65)
resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,200 StorageProxy.java (line 733)
Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
Mismatch for key DecoratedKey(100572974179274741747356988451225858264,
7878323239537570657254616e67307878) (b725ab25696111be49aaa7c4b7afa52d vs
d41d8cd98f00b204e9800998ecf8427e)
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
ResponseVerbHandler.java (line 44) Processing response on a callback from
6558@/192.168.0.6
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
ResponseVerbHandler.java (line 44) Processing response on a callback from
6559@/192.168.0.1
DEBUG [RequestResponseStage:9] 2012-06-28 15:59:42,201
AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [RequestResponseStage:7] 2012-06-28 15:59:42,201
AbstractRowResolver.java (line 66) Preprocessed data response
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 RowRepairResolver.java (line 63)
resolving 2 responses
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 6669726554696d65:false:13@1340870382109004
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
collecting 1 of 2147483647: 67726f75705f6964:false:10@1340870382109014
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
collecting 2 of 2147483647: 696e517565756554696d65:false:13@1340870382109005
DEBUG [Thrift:17] 2012-06-28 15:59:42,201 SliceQueryFilter.java (line 123)
collecting 3 of 2147483647: 6c6f67526f6f744964:false:7@1340870382109015
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 4 of 2147483647: 6d6f54797065:false:6@1340870382109009
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 5 of 2147483647: 706172746974696f6e:false:2@1340870382109001
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 6 of 2147483647: 7265636569766554696d65:false:13@1340870382109003
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 7 of 2147483647: 72657175657374:false:300@1340870382109013
DEBUG [RequestResponseStage:5] 2012-06-28 15:59:42,202
ResponseVerbHandler.java (line 44) Processing response on a callback from
6552@/192.168.0.1
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 8 of 2147483647: 7265747279:false:1@1340870382109006
DEBUG [Thrift:17] 2012-06-28 15:59:42,202 SliceQueryFilter.java (line 123)
collecting 9 of 2147483647:
7365727669636550726f7669646572:false:4@1340870382109007
DEBUG [Thrift

Re: Consistency Problem with Quorum consistencyLevel configuration

2012-06-26 Thread Jason Tang
Hi
  After enabling the Cassandra debug log, I got the following log; it shows
the delete mutation was sent to the other two nodes rather than the local node.
  Then the read command came to the local node.
  And the local node found the mismatch.
  But I don't know why the local node returns the local dirty data. Isn't it
supposed to repair the data and return the correct one?

192.168.0.6:
DEBUG [MutationStage:61] 2012-06-26 23:09:00,036
RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc',
key='33323130537570657254616e6730', modifications=[ColumnFamily(queue
-deleted at 1340723340044000- [])]) applied.  Sending response to 3555@/
192.168.0.5

192.168.0.4:
DEBUG [MutationStage:40] 2012-06-26 23:09:00,041
RowMutationVerbHandler.java (line 60) RowMutation(keyspace='drc',
key='33323130537570657254616e6730', modifications=[ColumnFamily(queue
-deleted at 1340723340044000- [])]) applied.  Sending response to 3556@/
192.168.0.5

192.168.0.5 (local one):
DEBUG [pool-2-thread-20] 2012-06-26 23:09:00,105 StorageProxy.java (line
705) Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
Mismatch for key DecoratedKey(7649972972837658739074639933581556,
33323130537570657254616e6730) (b20ac6ec0d29393d70e200027c094d13 vs
d41d8cd98f00b204e9800998ecf8427e)



2012/6/25 Jason Tang 

> Hi
>
> I met the consistency problem when we have Quorum for both read and
> write.
>
> I use MultigetSubSliceQuery to query rows from super column limit size
> 100, and then read it, then delete it. And start another around.
>
> But I found, the row which should be delete by last query, it still
> shown from next around query.
>
> And also form normal Column Family, I updated the value of one column
> from status='FALSE' to status='TURE', and next time I query it, the status
> still 'FALSE'.
>
> More detail:
>
>- It not happened not every time (1/10,000)
>- The time between two round query is around 500 ms (but we found two
>query which 2 seconds happened later then the first one, still have this
>consistency problem)
>- We use ntp as our cluster time synchronization solution.
>- We have 6 nodes, and replication factor is 3
>
> Some body say, Cassandra suppose to have such problem, because read
> may not happen before write inside Cassandra. But for two seconds?! And if
> so, it meaningless to have Quorum or other consistency level configuration.
>
>So first of all, is it the correct behavior of Cassandra, and if not,
> what data we need to analyze for further investment.
>
> BRs
> Ares
>


Consistency Problem with Quorum consistencyLevel configuration

2012-06-24 Thread Jason Tang
Hi

I ran into a consistency problem when using Quorum for both reads and
writes.

I use MultigetSubSliceQuery to query rows from a super column with a limit of
100, then read them, then delete them, and then start another round.
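
For reference, the client side of one round looks roughly like the sketch below. This is a minimal sketch assuming the Hector client (ConfigurableConsistencyLevel / MultigetSubSliceQuery as in Hector 1.0.x); the keyspace, column family, row key and super column names are made up for illustration:

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.MultigetSubSliceQuery;

public class QuorumRoundSketch {
    public static void main(String[] args) {
        StringSerializer ss = StringSerializer.get();
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "192.168.0.5:9160");

        // QUORUM for both reads and writes, as described above.
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        Keyspace ks = HFactory.createKeyspace("drc", cluster, policy);

        // One round: read up to 100 sub-columns of one super column ...
        MultigetSubSliceQuery<String, String, String, String> query =
                HFactory.createMultigetSubSliceQuery(ks, ss, ss, ss, ss);
        query.setColumnFamily("queue");
        query.setKeys("someRowKey");              // hypothetical row key
        query.setSuperColumn("someSuperColumn");  // hypothetical super column
        query.setRange("", "", false, 100);       // limit 100
        query.execute();                          // process the result here

        // ... then delete the row that was just processed.
        Mutator<String> mutator = HFactory.createMutator(ks, ss);
        mutator.addDeletion("someRowKey", "queue");
        mutator.execute();
    }
}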

But I found that a row which should have been deleted by the last round still
shows up in the next round's query.

Also, in a normal Column Family, I updated the value of one column from
status='FALSE' to status='TRUE', but the next time I queried it the status was
still 'FALSE'.

More detail:

   - It does not happen every time (roughly 1 in 10,000).
   - The time between two rounds of queries is around 500 ms (but we also found
   queries issued 2 seconds after the first one that still showed this
   consistency problem).
   - We use ntp as our cluster time synchronization solution.
   - We have 6 nodes, and the replication factor is 3.

Some people say Cassandra is expected to have this problem, because a read may
be processed before the write inside Cassandra. But for two seconds?! If so, it
would be meaningless to have Quorum or any other consistency level configuration.

   So, first of all, is this the expected behavior of Cassandra, and if not,
what data should we collect for further investigation?

BRs
Ares


Re: GCInspector works every 10 seconds!

2012-06-18 Thread Jason Tang
Hi

After I enabled the key cache and row cache, the problem went away. I guess it
is because we have a lot of data in SSTables, and it takes more time, memory
and CPU to search the data.

BRs
//Tang Weiqiang

2012/6/18 aaron morton 

>   It is also strange that although no data in Cassandra can fulfill the
> query conditions, but it takes more time if we have more data in Cassandra.
>
>
> These log messages:
>
> DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 7fff0137f63408920e049c22:true:4@1339865451865018
> DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 7fff0137f63408a0eeab052a:true:4@1339865451866000
>
> Say that the slice query read columns from the disk that were deleted.
>
> Have you tried your test with a clean (no files on disk) database ?
>
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/06/2012, at 12:36 AM, Jason Tang wrote:
>
> Hi
>
>After I change log level to DEBUG, I found some log.
>
>   Although we don't have traffic to Cassandra, but we have scheduled the
> task to perform the sliceQuery.
>
>   We use time-stamp as the index, we will perform the query by every
> second to check if we have tasks to do.
>
>   After 24 hours, we have 40G data in Cassandra, and we configure
> Casssandra as Max JVM Heap 6G, memtable 1G, disk_access_mode:
> mmap_index_only.
>
>   It is also strange that although no data in Cassandra can fulfill the
> query conditions, but it takes more time if we have more data in Cassandra.
>
>   Because we total have 20 million records in Cassandra which has time
> stamp as the index, and we query by MultigetSubSliceQuery, and set the
> range the value which not match any data in Cassnadra, So it suppose to
> return fast, but as we have 20 million data, it takes 2 seconds to get the
> query result.
>
>   Is the GC caused by the scheduled query operation, and why it takes so
> many memory. Could we improve it?
>
> System.log:
>  INFO [ScheduledTasks:1] 2012-06-17 20:17:13,574 GCInspector.java (line
> 123) GC for ParNew: 559 ms for 1 collections, 3258240912 used; max
> is 6274678784
> DEBUG [ReadStage:99] 2012-06-17 20:17:25,563 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 0138ad1035880137f3372f3e0e28e3b6:false:36@1339815309124015
> DEBUG [ReadStage:99] 2012-06-17 20:17:25,565 ReadVerbHandler.java (line
> 60) Read key 3331; sending response to 158060445@/192.168.0.3
> DEBUG [ReadStage:96] 2012-06-17 20:17:25,845 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 0138ad1035880137f33a80cf6cb5d383:false:36@1339815526613007
> DEBUG [ReadStage:96] 2012-06-17 20:17:25,847 ReadVerbHandler.java (line
> 60) Read key 3233; sending response to 158060447@/192.168.0.3
> DEBUG [ReadStage:105] 2012-06-17 20:17:25,952 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 0138ad1035880137f330cd70c86690cd:false:36@1339814890872015
> DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
> 75) digest is d41d8cd98f00b204e9800998ecf8427e
> DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
> 60) Read key 3139; sending response to 158060448@/192.168.0.3
> DEBUG [ReadStage:89] 2012-06-17 20:17:25,959 CollationController.java
> (line 191) collectAllData
> DEBUG [ReadStage:108] 2012-06-17 20:17:25,959 CollationController.java
> (line 191) collectAllData
> DEBUG [ReadStage:107] 2012-06-17 20:17:25,959 CollationController.java
> (line 191) collectAllData
> DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 7fff0137f63408920e049c22:true:4@1339865451865018
> DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 7fff0137f63408a0eeab052a:true:4@1339865451866000
> DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 7fff0137f63408b1319577c9:true:4@1339865451867003
> DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 7fff0137f63408c081e0b8a3:true:4@1339865451867004
> DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 7fff0137f6340deefb8a0627:true:4@1339865451920001
> DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
> 123) collecting 0 of 5000:
> 7fff0137f6340df9c21e9979:true:4@1339865451923002
> DEBUG [ReadStage:89] 2012-06-17 20:17:26,9

Re: GCInspector works every 10 seconds!

2012-06-17 Thread Jason Tang
Hi

   After I changed the log level to DEBUG, I found the following log output.

  Although we have no external traffic to Cassandra, we do have a scheduled
task that performs the slice query.

  We use a timestamp as the index, and we run the query every second to check
whether there are tasks to execute.

  After 24 hours, we have 40 GB of data in Cassandra, and we configure
Cassandra with a max JVM heap of 6 GB, a 1 GB memtable, and disk_access_mode:
mmap_index_only.

  It is also strange that although no data in Cassandra matches the query
conditions, the query takes longer the more data we have in Cassandra.

  We have 20 million records in total in Cassandra, with the timestamp as the
index, and we query with MultigetSubSliceQuery, setting a range that matches no
data in Cassandra. The query should return fast, but with 20 million records it
takes 2 seconds to get the result.
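
For context, the per-second check is roughly the sketch below (again assuming the Hector client; the column family, row key, super column name and the use of Long timestamps as sub-column names are illustrative assumptions):

import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.MultigetSubSliceQuery;

public class DueTaskCheck {
    // Called once per second by a scheduler: scan for tasks whose timestamp is <= now.
    static void checkDueTasks(Keyspace ks, long now) {
        StringSerializer ss = StringSerializer.get();
        LongSerializer ls = LongSerializer.get();

        MultigetSubSliceQuery<String, String, Long, String> query =
                HFactory.createMultigetSubSliceQuery(ks, ss, ss, ls, ss);
        query.setColumnFamily("queue");        // hypothetical CF name
        query.setKeys("timeIndexRow");         // hypothetical index row key
        query.setSuperColumn("dueTasks");      // hypothetical super column
        // Slice from 0 up to 'now'. Even when no live column falls in this range,
        // the read still walks over columns that were deleted earlier (the
        // "collecting 0 of 5000" DEBUG lines above), which is why it gets slower
        // as more data accumulates on disk.
        query.setRange(0L, now, false, 5000);
        query.execute();
    }
}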

  Is the GC caused by the scheduled query operation, and why does it use so
much memory? Can we improve it?

System.log:
 INFO [ScheduledTasks:1] 2012-06-17 20:17:13,574 GCInspector.java (line
123) GC for ParNew: 559 ms for 1 collections, 3258240912 used; max
is 6274678784
DEBUG [ReadStage:99] 2012-06-17 20:17:25,563 SliceQueryFilter.java (line
123) collecting 0 of 5000:
0138ad1035880137f3372f3e0e28e3b6:false:36@1339815309124015
DEBUG [ReadStage:99] 2012-06-17 20:17:25,565 ReadVerbHandler.java (line 60)
Read key 3331; sending response to 158060445@/192.168.0.3
DEBUG [ReadStage:96] 2012-06-17 20:17:25,845 SliceQueryFilter.java (line
123) collecting 0 of 5000:
0138ad1035880137f33a80cf6cb5d383:false:36@1339815526613007
DEBUG [ReadStage:96] 2012-06-17 20:17:25,847 ReadVerbHandler.java (line 60)
Read key 3233; sending response to 158060447@/192.168.0.3
DEBUG [ReadStage:105] 2012-06-17 20:17:25,952 SliceQueryFilter.java (line
123) collecting 0 of 5000:
0138ad1035880137f330cd70c86690cd:false:36@1339814890872015
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
75) digest is d41d8cd98f00b204e9800998ecf8427e
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
60) Read key 3139; sending response to 158060448@/192.168.0.3
DEBUG [ReadStage:89] 2012-06-17 20:17:25,959 CollationController.java (line
191) collectAllData
DEBUG [ReadStage:108] 2012-06-17 20:17:25,959 CollationController.java
(line 191) collectAllData
DEBUG [ReadStage:107] 2012-06-17 20:17:25,959 CollationController.java
(line 191) collectAllData
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408920e049c22:true:4@1339865451865018
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408a0eeab052a:true:4@1339865451866000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408b1319577c9:true:4@1339865451867003
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408c081e0b8a3:true:4@1339865451867004
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340deefb8a0627:true:4@1339865451920001
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340df9c21e9979:true:4@1339865451923002
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340e095ead1498:true:4@1339865451928000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340e1af16cf151:true:4@1339865451935000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340e396cfdc9fa:true:4@133986545195


BRs
//Ares

2012/6/17 Jason Tang 

> Hi
>
>After running load testing for 24 hours(insert, update and delete), now
> no new traffic to Cassandra, but Cassnadra shows still have high load(CPU
> usage), from the system.log, it shows it always perform GC. I don't know
> why it work as that, seems memory is not low.
>
> Here is some configuration and log, where I can find the clue why
> Cassandra works as this?
>
> cassandra.yaml
> disk_access_mode: mmap_index_only
>
>  #  /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 tpstats
> Pool NameActive   Pending  Completed   Blocked
>  All time blocked
> ReadStage 0 045387558
> 0 0
> RequestResponseStage  0 096568347 0
>   0
> MutationStage0 060215102 0
> 0
> ReadRepairStage0 0  0
&

GCInspector works every 10 seconds!

2012-06-17 Thread Jason Tang
Hi

   After running load testing for 24 hours (insert, update and delete), there
is no new traffic to Cassandra, but Cassandra still shows a high load (CPU
usage), and system.log shows it is constantly performing GC. I don't know why
it behaves like this; memory does not seem to be low.

Here is some configuration and log output. Where can I find a clue as to why
Cassandra behaves like this?

cassandra.yaml
disk_access_mode: mmap_index_only

#  /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 tpstats
Pool Name                    Active   Pending    Completed   Blocked  All time blocked
ReadStage                         0         0     45387558         0                 0
RequestResponseStage              0         0     96568347         0                 0
MutationStage                     0         0     60215102         0                 0
ReadRepairStage                   0         0            0         0                 0
ReplicateOnWriteStage             0         0            0         0                 0
GossipStage                       0         0       399012         0                 0
AntiEntropyStage                  0         0            0         0                 0
MigrationStage                    0         0           30         0                 0
MemtablePostFlusher               0         0          279         0                 0
StreamStage                       0         0            0         0                 0
FlushWriter                       0         0         1846         0              1052
MiscStage                         0         0            0         0                 0
InternalResponseStage             0         0            0         0                 0
HintedHandoff                     0         0            5         0                 0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ 1
MUTATION  1390
REQUEST_RESPONSE 0


 # /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 info
Token: 56713727820156410577229101238628035242
Gossip active: true
Load : 37.57 GB
Generation No: 1339813956
Uptime (seconds) : 120556
Heap Memory (MB) : 3261.14 / 5984.00
Data Center  : datacenter1
Rack : rack1
Exceptions   : 0


 INFO [ScheduledTasks:1] 2012-06-17 19:47:36,633 GCInspector.java (line
123) GC for ParNew: 222 ms for 1 collections, 2046077640 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:48:41,714 GCInspector.java (line
123) GC for ParNew: 262 ms for 1 collections, 2228128408 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:48:49,717 GCInspector.java (line
123) GC for ParNew: 237 ms for 1 collections, 2390412728 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:48:57,719 GCInspector.java (line
123) GC for ParNew: 223 ms for 1 collections, 2508702896 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:50:01,988 GCInspector.java (line
123) GC for ParNew: 232 ms for 1 collections, 2864574832 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:50:10,075 GCInspector.java (line
123) GC for ParNew: 208 ms for 1 collections, 2964629856 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:50:21,078 GCInspector.java (line
123) GC for ParNew: 258 ms for 1 collections, 3149127368 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:51:26,095 GCInspector.java (line
123) GC for ParNew: 213 ms for 1 collections, 3421495400 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:51:34,097 GCInspector.java (line
123) GC for ParNew: 218 ms for 1 collections, 3543978312 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:37,229 GCInspector.java (line
123) GC for ParNew: 221 ms for 1 collections, 375229 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:37,230 GCInspector.java (line
123) GC for ConcurrentMarkSweep: 206 ms for 1 collections, 3752313400 used;
max is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:46,507 GCInspector.java (line
123) GC for ParNew: 243 ms for 1 collections, 3663162192 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:54,510 GCInspector.java (line
123) GC for ParNew: 283 ms for 1 collections, 1582282248 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:54:01,704 GCInspector.java (line
123) GC for ParNew: 235 ms for 1 collections, 1935534800 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:55:13,747 GCInspector.java (line
123) GC for ParNew: 233 ms for 1 collections, 2356975504 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:55:21,749 GCInspector.java (line
123) GC for ParNew: 264 ms for 1 collections, 2530976328 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:55:29,794 GCInspector.java (line
123) GC for ParNew: 224 ms for 1 collections, 2592311336 used; max
is 6274678784


BRs
//Ares


Re: Much more native memory used by Cassandra then the configured JVM heap size

2012-06-13 Thread Jason Tang
We assumed the cached memory would be released by the OS, but according to
/proc/meminfo the cached memory is in "Active" status, so I am not sure it will
be released by the OS.

As for low memory: we found "Unable to reduce heap usage since there are no
dirty column families" in system.log, and then Cassandra on this node was
marked as "down".

And because we configure a 6 GB JVM heap and a 1 GB memtable, I don't know why
we get OOM errors.
So we wonder whether the Cassandra outage was caused by:

   1. low OS memory
   2. the impact of our configuration: memtable_flush_writers=32,
   memtable_flush_queue_size=12
   3. delete operations (the data in our traffic is dynamic, which means each
   request may be deleted within one hour while new ones are inserted)
   https://issues.apache.org/jira/browse/CASSANDRA-3741

So we want to find out why Cassandra went down after the 24-hour load test
(RCA of the OOM).

2012/6/12 aaron morton 

> see http://wiki.apache.org/cassandra/FAQ#mmap
>
>  which cause the OS low memory.
>>>>
>>> If the memory is used for mmapped access the os can get it back later.
>
> Is the low free memory causing a problem ?
>
> Cheers
>
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12/06/2012, at 5:52 PM, Jason Tang wrote:
>
> Hi
>
> I found some information of this issue
> And seems we can have other strategy for data access to reduce mmap usage,
> in order to use less memory.
>
> But I didn't find the document to describe the parameters for Cassandra
> 1.x, is it a good way to use this parameter to reduce shared memory usage
> and what's the impact? (btw, our data model is dynamical, which means the
> although the through put is high, but the life cycle of the data is short,
> one hour or less).
>
> "
> # Choices are auto, standard, mmap, and mmap_index_only.
> disk_access_mode: auto
> "
>
> http://comments.gmane.org/gmane.comp.db.cassandra.user/7390
>
> 2012/6/12 Jason Tang 
>
>> See my post, I limit the HVM heap 6G, but actually Cassandra will use
>> more memory which is not calculated in JVM heap.
>>
>> I use top to monitor total memory used by Cassandra.
>>
>> =
>> -Xms6G -Xmx6G -Xmn1600M
>>
>> 2012/6/12 Jeffrey Kesselman 
>>
>>> Btw.  I suggest you spin up JConsole as it will give you much more detai
>>> kon what your VM is actually doing.
>>>
>>>
>>>
>>> On Mon, Jun 11, 2012 at 9:14 PM, Jason Tang  wrote:
>>>
>>>> Hi
>>>>
>>>>  We have some problem with Cassandra memory usage, we configure the
>>>> JVM HEAP 6G, but after runing Cassandra for several hours (insert, update,
>>>> delete). The total memory used by Cassandra go up to 15G, which cause the
>>>> OS low memory.
>>>>  So I wonder if it is normal to have so many memory used by cassandra?
>>>>
>>>> And how to limit the native memory used by Cassandra?
>>>>
>>>>
>>>> ===
>>>> Cassandra 1.0.3, 64 bit jdk.
>>>>
>>>> Memory ocupied by Cassandra 15G
>>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>>>>  9567 casadm20   0 28.3g  15g 9.1g S  269 65.1 385:57.65 java
>>>>
>>>> =
>>>> -Xms6G -Xmx6G -Xmn1600M
>>>>
>>>>  # ps -ef | grep  9567
>>>> casadm9567 1 55 Jun11 ?05:59:44
>>>> /opt/jdk1.6.0_29/bin/java -ea
>>>> -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar
>>>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G
>>>> -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
>>>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>>>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>>>> -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
>>>> -Dcom.sun.management.jmxremote.port=6080
>>>> -Dcom.sun.management.jmxremote.ssl=false
>>>> -Dcom.sun.management.jmxremote.authenticate=false
>>>> -Daccess.properties=/opt/dve/cassandra/conf/access.properties
>>>> -Dpasswd.properties=/opt/dve/cassandra/conf/passwd.properties
>>>> -Dpasswd.mode=MD5 -Dlog4j.configuration=log4j-server.properties
>>>> -Dlog4j.defaultInitOverride=true -cp
>>>> /opt/dve/cassandra/bin/../conf:/opt/dve/cassandra/bin/../build/classes/main:/opt/dve/cassandra/bin/../b

Re: Much more native memory used by Cassandra then the configured JVM heap size

2012-06-11 Thread Jason Tang
Hi

I found some information about this issue, and it seems we can use a different
data access strategy to reduce mmap usage, in order to use less memory.

But I didn't find documentation describing this parameter for Cassandra 1.x.
Is this parameter a good way to reduce shared memory usage, and what is the
impact? (By the way, our data model is dynamic: although the throughput is
high, the life cycle of the data is short, one hour or less.)

"
# Choices are auto, standard, mmap, and mmap_index_only.
disk_access_mode: auto
"

http://comments.gmane.org/gmane.comp.db.cassandra.user/7390

2012/6/12 Jason Tang 

> See my post, I limit the HVM heap 6G, but actually Cassandra will use more
> memory which is not calculated in JVM heap.
>
> I use top to monitor total memory used by Cassandra.
>
> =
> -Xms6G -Xmx6G -Xmn1600M
>
> 2012/6/12 Jeffrey Kesselman 
>
>> Btw.  I suggest you spin up JConsole as it will give you much more detai
>> kon what your VM is actually doing.
>>
>>
>>
>> On Mon, Jun 11, 2012 at 9:14 PM, Jason Tang  wrote:
>>
>>> Hi
>>>
>>>  We have some problem with Cassandra memory usage, we configure the JVM
>>> HEAP 6G, but after runing Cassandra for several hours (insert, update,
>>> delete). The total memory used by Cassandra go up to 15G, which cause the
>>> OS low memory.
>>>  So I wonder if it is normal to have so many memory used by cassandra?
>>>
>>> And how to limit the native memory used by Cassandra?
>>>
>>>
>>> ===
>>> Cassandra 1.0.3, 64 bit jdk.
>>>
>>> Memory ocupied by Cassandra 15G
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>>>  9567 casadm20   0 28.3g  15g 9.1g S  269 65.1 385:57.65 java
>>>
>>> =
>>> -Xms6G -Xmx6G -Xmn1600M
>>>
>>>  # ps -ef | grep  9567
>>> casadm9567 1 55 Jun11 ?05:59:44
>>> /opt/jdk1.6.0_29/bin/java -ea
>>> -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar
>>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G
>>> -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>>> -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
>>> -Dcom.sun.management.jmxremote.port=6080
>>> -Dcom.sun.management.jmxremote.ssl=false
>>> -Dcom.sun.management.jmxremote.authenticate=false
>>> -Daccess.properties=/opt/dve/cassandra/conf/access.properties
>>> -Dpasswd.properties=/opt/dve/cassandra/conf/passwd.properties
>>> -Dpasswd.mode=MD5 -Dlog4j.configuration=log4j-server.properties
>>> -Dlog4j.defaultInitOverride=true -cp
>>> /opt/dve/cassandra/bin/../conf:/opt/dve/cassandra/bin/../build/classes/main:/opt/dve/cassandra/bin/../build/classes/thrift:/opt/dve/cassandra/bin/../lib/Cassandra-Extensions-1.0.0.jar:/opt/dve/cassandra/bin/../lib/antlr-3.2.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-clientutil-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-thrift-1.0.3.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/dve/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/dve/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/dve/cassandra/bin/../lib/commons-lang-2.4.jar:/opt/dve/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/opt/dve/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/dve/cassandra/bin/../lib/guava-r08.jar:/opt/dve/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/opt/dve/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar:/opt/dve/cassandra/bin/../lib/jline-0.9.94.jar:/opt/dve/cassandra/bin/../lib/json-simple-1.1.jar:/opt/dve/cassandra/bin/../lib/libthrift-0.6.jar:/opt/dve/cassandra/bin/../lib/log4j-1.2.16.jar:/opt/dve/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/opt/dve/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/opt/dve/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/dve/cassandra/bin/../lib/snakeyaml-1.6.jar:/opt/dve/cassandra/bin/../lib/snappy-java-1.0.4.1.jar
>>> org.apache.cassandra.thrift.CassandraDaemon
>>>
>>> ==
>>> # nodetool -h 127.0.0.1 -p 6080 info
>>> Token: 85070591730234615865843651857942052864
>>> Gossip active: true
>>> Load

Re: Much more native memory used by Cassandra then the configured JVM heap size

2012-06-11 Thread Jason Tang
See my post: I limit the JVM heap to 6 GB, but Cassandra actually uses more
memory that is not counted in the JVM heap.

I use top to monitor the total memory used by Cassandra.

=
-Xms6G -Xmx6G -Xmn1600M

2012/6/12 Jeffrey Kesselman 

> Btw.  I suggest you spin up JConsole as it will give you much more detai
> kon what your VM is actually doing.
>
>
>
> On Mon, Jun 11, 2012 at 9:14 PM, Jason Tang  wrote:
>
>> Hi
>>
>> We have some problem with Cassandra memory usage, we configure the JVM
>> HEAP 6G, but after runing Cassandra for several hours (insert, update,
>> delete). The total memory used by Cassandra go up to 15G, which cause the
>> OS low memory.
>>  So I wonder if it is normal to have so many memory used by cassandra?
>>
>> And how to limit the native memory used by Cassandra?
>>
>>
>> ===
>> Cassandra 1.0.3, 64 bit jdk.
>>
>> Memory ocupied by Cassandra 15G
>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>>  9567 casadm20   0 28.3g  15g 9.1g S  269 65.1 385:57.65 java
>>
>> =
>> -Xms6G -Xmx6G -Xmn1600M
>>
>>  # ps -ef | grep  9567
>> casadm9567 1 55 Jun11 ?05:59:44 /opt/jdk1.6.0_29/bin/java
>> -ea -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar
>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G
>> -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
>> -Dcom.sun.management.jmxremote.port=6080
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Daccess.properties=/opt/dve/cassandra/conf/access.properties
>> -Dpasswd.properties=/opt/dve/cassandra/conf/passwd.properties
>> -Dpasswd.mode=MD5 -Dlog4j.configuration=log4j-server.properties
>> -Dlog4j.defaultInitOverride=true -cp
>> /opt/dve/cassandra/bin/../conf:/opt/dve/cassandra/bin/../build/classes/main:/opt/dve/cassandra/bin/../build/classes/thrift:/opt/dve/cassandra/bin/../lib/Cassandra-Extensions-1.0.0.jar:/opt/dve/cassandra/bin/../lib/antlr-3.2.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-clientutil-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-thrift-1.0.3.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/dve/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/dve/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/dve/cassandra/bin/../lib/commons-lang-2.4.jar:/opt/dve/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/opt/dve/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/dve/cassandra/bin/../lib/guava-r08.jar:/opt/dve/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/opt/dve/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar:/opt/dve/cassandra/bin/../lib/jline-0.9.94.jar:/opt/dve/cassandra/bin/../lib/json-simple-1.1.jar:/opt/dve/cassandra/bin/../lib/libthrift-0.6.jar:/opt/dve/cassandra/bin/../lib/log4j-1.2.16.jar:/opt/dve/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/opt/dve/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/opt/dve/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/dve/cassandra/bin/../lib/snakeyaml-1.6.jar:/opt/dve/cassandra/bin/../lib/snappy-java-1.0.4.1.jar
>> org.apache.cassandra.thrift.CassandraDaemon
>>
>> ==
>> # nodetool -h 127.0.0.1 -p 6080 info
>> Token: 85070591730234615865843651857942052864
>> Gossip active: true
>> Load : 20.59 GB
>> Generation No: 1339423322
>> Uptime (seconds) : 39626
>> Heap Memory (MB) : 3418.42 / 5984.00
>> Data Center  : datacenter1
>> Rack : rack1
>> Exceptions   : 0
>>
>> =
>> All row cache and key cache are disabled by default
>>
>> Key cache: disabled
>> Row cache: disabled
>>
>>
>> ==
>>
>> # pmap 9567
>> 9567: java
>> START   SIZE RSS PSS   DIRTYSWAP PERM MAPPING
>> 4000 36K 36K 36K  0K  0K r-xp
>> /opt/jdk1.6.0_29/bin/java
>> 40108000  8K  8K  8K  8K  0K rwxp
>> /opt/jdk1.6.0_29/bin/java
>> 4010a000  18040K  17988K  17988K  17988K  0K rwxp [heap]
&

Much more native memory used by Cassandra then the configured JVM heap size

2012-06-11 Thread Jason Tang
Hi

We have a problem with Cassandra memory usage. We configure the JVM heap to
6 GB, but after running Cassandra for several hours (insert, update, delete),
the total memory used by Cassandra goes up to 15 GB, which leaves the OS low
on memory.
 So I wonder: is it normal for Cassandra to use this much memory?

And how to limit the native memory used by Cassandra?


===
Cassandra 1.0.3, 64 bit jdk.

Memory occupied by Cassandra: 15 GB
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 9567 casadm20   0 28.3g  15g 9.1g S  269 65.1 385:57.65 java

=
-Xms6G -Xmx6G -Xmn1600M

 # ps -ef | grep  9567
casadm9567 1 55 Jun11 ?05:59:44 /opt/jdk1.6.0_29/bin/java
-ea -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G
-Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=6080
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Daccess.properties=/opt/dve/cassandra/conf/access.properties
-Dpasswd.properties=/opt/dve/cassandra/conf/passwd.properties
-Dpasswd.mode=MD5 -Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true -cp
/opt/dve/cassandra/bin/../conf:/opt/dve/cassandra/bin/../build/classes/main:/opt/dve/cassandra/bin/../build/classes/thrift:/opt/dve/cassandra/bin/../lib/Cassandra-Extensions-1.0.0.jar:/opt/dve/cassandra/bin/../lib/antlr-3.2.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-clientutil-1.0.3.jar:/opt/dve/cassandra/bin/../lib/apache-cassandra-thrift-1.0.3.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/opt/dve/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/dve/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/dve/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/dve/cassandra/bin/../lib/commons-lang-2.4.jar:/opt/dve/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/opt/dve/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/dve/cassandra/bin/../lib/guava-r08.jar:/opt/dve/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/opt/dve/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar:/opt/dve/cassandra/bin/../lib/jline-0.9.94.jar:/opt/dve/cassandra/bin/../lib/json-simple-1.1.jar:/opt/dve/cassandra/bin/../lib/libthrift-0.6.jar:/opt/dve/cassandra/bin/../lib/log4j-1.2.16.jar:/opt/dve/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/opt/dve/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/opt/dve/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/dve/cassandra/bin/../lib/snakeyaml-1.6.jar:/opt/dve/cassandra/bin/../lib/snappy-java-1.0.4.1.jar
org.apache.cassandra.thrift.CassandraDaemon

==
# nodetool -h 127.0.0.1 -p 6080 info
Token: 85070591730234615865843651857942052864
Gossip active: true
Load : 20.59 GB
Generation No: 1339423322
Uptime (seconds) : 39626
Heap Memory (MB) : 3418.42 / 5984.00
Data Center  : datacenter1
Rack : rack1
Exceptions   : 0

=
All row cache and key cache are disabled by default

Key cache: disabled
Row cache: disabled


==

# pmap 9567
9567: java
START   SIZE RSS PSS   DIRTYSWAP PERM MAPPING
4000 36K 36K 36K  0K  0K r-xp
/opt/jdk1.6.0_29/bin/java
40108000  8K  8K  8K  8K  0K rwxp
/opt/jdk1.6.0_29/bin/java
4010a000  18040K  17988K  17988K  17988K  0K rwxp [heap]
00067ae0 6326700K 6258664K 6258664K 6258664K  0K rwxp [anon]
0007fd06b000  48724K  0K  0K  0K  0K rwxp [anon]
7fbed153 1331104K  0K  0K  0K  0K r-xs
/var/cassandra/data/drc/queue-hb-219-Data.db
7fbf22918000 2097152K  0K  0K  0K  0K r-xs
/var/cassandra/data/drc/queue-hb-219-Data.db
7fbfa2918000 2097148K 1124464K 1124462K  0K  0K r-xs
/var/cassandra/data/drc/queue-hb-219-Data.db
7fc022917000 2097156K 2096496K 2096492K  0K  0K r-xs
/var/cassandra/data/drc/queue-hb-219-Data.db
7fc0a2918000 2097148K 2097148K 2097146K  0K  0K r-xs
/var/cassandra/data/drc/queue-hb-219-Data.db
7fc1a2917000 733584K   6444K   6444K  0K  0K r-xs
/var/cassandra/data/drc/queue-hb-109-Data.db
7fc1cf57b000 2097148K  20980K  20980K  0K  0K r-xs
/var/cassandra/data/drc/queue-hb-109-Data.db
7fc24f57a000 2097152K 456480K 456478K  0K  0K r-xs
/var/cassandra/data/drc/queue-hb-109-Data.db
7fc2cf57a000 2097156K 1168320K 1168318K  

TimedOutException caused by "Stop the world" activity

2012-05-27 Thread Jason Tang
Hi

My system is a 4-node, 64-bit Cassandra cluster with a 6 GB heap per node and
the default configuration (which means 1/3 of the heap for memtables),
replication factor 3, write ALL, read ONE.
When I run stress load testing, I get this TimedOutException, some operations
fail, and all traffic hangs for a while.
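
For reference, the write-all / read-one policy is configured on the client side roughly as in the sketch below (assuming the Hector client; the cluster, host and keyspace names are illustrative):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class WriteAllReadOneSketch {
    static Keyspace keyspace() {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "192.168.0.1:9160");
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.ALL); // write all
        policy.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);  // read one
        // With CL.ALL every batch_mutate waits for all three replicas, so a long
        // GC pause or a blocked flush on any single node surfaces on the client
        // as a TimedOutException.
        return HFactory.createKeyspace("drc", cluster, policy);
    }
}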

When I ran a standalone 32-bit Cassandra with 1 GB of memory, I did not see
such frequent "stop the world" behavior.

So I wonder what kind of operation can hang the Cassandra system.

How should I collect information for tuning?

From the system log and documentation, I guess there are three types of operations:
1) Flushing a memtable when it reaches its max size
2) Compacting SSTables (why?)
3) Java GC

system.log:
 INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688)
Enqueuing flush of Memtable-LocationInfo@1229893321(53/66 serialized/live
bytes, 2 ops)
 INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239)
Writing Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
 INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275)
Completed flushing
/var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-Data.db (163 bytes)
...

 INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java
(line 112) Compacting
[SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'),
SSTableReader(path='/var/proclog/raw/cassandra/data/
myks /queue-hb-32-Data.db'),
SSTableReader(path='/var/proclog/raw/cassandra/data/
myks /queue-hb-37-Data.db'),
SSTableReader(path='/var/proclog/raw/cassandra/data/
myks /queue-hb-53-Data.db')]
...

 WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line
146) Heap is 0.7993011015621736 full.  You may need to reduce memtable
and/or cache sizes.  Cassandra will now flush up to the two largest
memtables to free up memory.  Adjust flush_largest_memtables_at threshold
in cassandra.yaml if you don't want Cassandra to do this automatically
 INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line
123) GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used;
max is 6274678784
 INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line
123) GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is
6274678784
 INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line
123) GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is
6274678784
 INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line
123) GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280
used; max is 6274678784


Timeout Exception:
Caused by: org.apache.cassandra.thrift.TimedOutException: null
at
org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495)
~[na:na]
at
org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
~[na:na]
at
org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
~[na:na]
at
me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
~[na:na]
... 64 common frames omitted

BRs
//Tang Weiqiang


Re: Cassandra search performance

2012-05-12 Thread Jason Tang
I tried searching on one column that stores a time as a Long. 1,000,000
records are equally distributed over 24 hours, and I only want to search a
certain time range, e.g. from 01:30 to 01:50 or 08:00 to 12:00, but something
strange happened.

Search 00:00 to 23:59 limit 100:
it took less than 1 second and scanned 100 records.

Search 00:00 to 00:20 limit 100:
it took more than one minute and scanned around 2,400 records.

So the result suggests that Cassandra scans records one by one to match the
condition, and that the data is not ordered in sequence.

One more thing: to get an equality condition, I added a redundant column whose
value is the same for all records.
The search condition is something like: get records where equal='equal'
and time > 00:00 and time < 00:20.
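
In code, the combined condition looks roughly like the sketch below (assuming the Hector IndexedSlicesQuery API; the column names "equal" and "fireTime", the values, and the keyspace handle ks are illustrative):

import java.nio.ByteBuffer;
import me.prettyprint.cassandra.model.IndexedSlicesQuery;
import me.prettyprint.cassandra.serializers.ByteBufferSerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class TimeRangeQuerySketch {
    static void query(Keyspace ks, long start, long end) {
        StringSerializer ss = StringSerializer.get();
        IndexedSlicesQuery<String, String, ByteBuffer> query =
                HFactory.createIndexedSlicesQuery(ks, ss, ss, ByteBufferSerializer.get());
        query.setColumnFamily("queue");
        // The KEYS index can only be entered through an equality expression, here
        // the redundant 'equal' column; the range expressions on the time column
        // are then applied as a filter over whatever the chosen index returns.
        query.addEqualsExpression("equal", ss.toByteBuffer("equal"));
        query.addGteExpression("fireTime", LongSerializer.get().toByteBuffer(start));
        query.addLteExpression("fireTime", LongSerializer.get().toByteBuffer(end));
        query.setStartKey("");
        query.setRowCount(100);                  // limit 100 rows
        query.setRange("", "", false, 100);      // columns to return per row
        query.execute();
    }
}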

Is this the expected behavior of the secondary index, or am I not using it correctly?

In an earlier test I had a string column where most values are 'true', and I
added 100 'false' values among 1,000,000 'true' ones; that query only scanned
100 records.

So how can I examine what happens inside Cassandra, and where can I find the
details of how the secondary index works?

On Tuesday, May 8, 2012, Maxim Potekhin wrote:

>  Thanks for the comments, much appreciated.
>
> Maxim
>
>
> On 5/7/2012 3:22 AM, David Jeske wrote:
>
> On Sun, Apr 29, 2012 at 4:32 PM, Maxim Potekhin 
> 
> > wrote:
>
>> Looking at your example,as I think you understand, you forgo indexes by
>> combining two conditions in one query, thinking along the lines of what is
>> often done in RDBMS. A scan is expected in this case, and there is no
>> magic to avoid it.
>>
>
>  This sounds like a mis-understanding of how RDBMSs work. If you combine
> two conditions in a single SQL query, the SQL execution optimizer looks at
> the cardinality of any indicies. If it can successfully predict that one of
> the conditions significantly reduces the set of rows that would be
> considered (such as a status match having 200 hits vs 1M rows in the
> table), then it selects this index for the first-iteration, and each index
> hit causes a record lookup which is then tested for the other conditions.
>  (This is one of several query-execution types RDBMS systems use)
>
>  I'm no Cassandra expert, so I don't know what it does WRT
> index-selection, but from the page written on secondary indicies, it seems
> like if you just query on status, and do the other filtering yourself it'll
> probably do what you want...
>
>  http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
>
>
>>  However, if this query is important, you can easily index on two
>> conditions,
>> using a composite type (look it up), or string concatenation for quick and
>> easy solution.
>>
>
>  This is not necessarily a good idea. Creating a composite index explodes
> the index size unnecessarily. If a condition can reduce a query to 200
> records, there is no need to have a composite index including another
> condition.
>
>
>


Re: Cassandra search performance

2012-04-25 Thread Jason Tang
1.0.8

On April 25, 2012, at 10:38 PM, Philip Shon wrote:

> what version of cassandra are you using.  I found a big performance hit
> when querying on the secondary index.
>
> I came across this bug in versions prior to 1.1
>
> https://issues.apache.org/jira/browse/CASSANDRA-3545
>
> Hope that helps.
>
> 2012/4/25 Jason Tang 
>
>> And I found, if I only have the search condition "status", it only scan
>> 200 records.
>>
>> But if I combine another condition "partition" then it scan all records
>> because "partition" condition match all records.
>>
>> But combine with other condition such as "userName", even all "userName"
>> is same in the 1,000,000 records, it only scan 200 records.
>>
>> So it impacted by scan execution plan, if we have several search
>> conditions, how it works? Do we have the similar execution plan in
>> Cassandra?
>>
>>
>> On April 25, 2012, at 9:18 PM, Jason Tang wrote:
>>
>> Hi
>>>
>>>We have the such CF, and use secondary index to search for simple
>>> data "status", and among 1,000,000 row records, we have 200 records with
>>> status we want.
>>>
>>>   But when we start to search, the performance is very poor, and check
>>> with the command "./bin/nodetool -h localhost -p 8199 cfstats" , Cassandra
>>> read 1,000,000 records, and "Read Latency" is 0.2 ms, so totally it used
>>> 200 seconds.
>>>
>>>   It use lots of CPU, and check the stack, all thread in Cassandra is
>>> read from socket.
>>>
>>>   So I wonder, how to really use index to find the 200 records instead
>>> of scan all rows. (Supper Column?)
>>>
>>> *ColumnFamily: queue*
>>> *  Key Validation Class: org.apache.cassandra.db.marshal.BytesType*
>>> *  Default column value validator:
>>> org.apache.cassandra.db.marshal.BytesType*
>>> *  Columns sorted by: org.apache.cassandra.db.marshal.BytesType*
>>> *  Row cache size / save period in seconds / keys to save :
>>> 0.0/0/all*
>>> *  Row Cache Provider:
>>> org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider*
>>> *  Key cache size / save period in seconds: 0.0/0*
>>> *  GC grace seconds: 0*
>>> *  Compaction min/max thresholds: 4/32*
>>> *  Read repair chance: 0.0*
>>> *  Replicate on write: false*
>>> *  Bloom Filter FP chance: default*
>>> *  Built indexes: [queue.idxStatus]*
>>> *  Column Metadata:*
>>> *Column Name: status (737461747573)*
>>> *  Validation Class: org.apache.cassandra.db.marshal.AsciiType*
>>> *  Index Name: idxStatus*
>>> *  Index Type: KEYS*
>>> *
>>> *
>>> BRs
>>>  //Jason
>>>
>>
>>
>


Re: Cassandra search performance

2012-04-25 Thread Jason Tang
And I found that if I only have the search condition "status", it only scans
200 records.

But if I combine it with another condition "partition", it scans all records,
because the "partition" condition matches all records.

But combined with another condition such as "userName", even though all
"userName" values are the same across the 1,000,000 records, it only scans
200 records.

So it is affected by the scan execution plan. If we have several search
conditions, how does this work? Does Cassandra have a similar execution plan?


On April 25, 2012, at 9:18 PM, Jason Tang wrote:

> Hi
>
>We have the such CF, and use secondary index to search for simple data
> "status", and among 1,000,000 row records, we have 200 records with status
> we want.
>
>   But when we start to search, the performance is very poor, and check
> with the command "./bin/nodetool -h localhost -p 8199 cfstats" , Cassandra
> read 1,000,000 records, and "Read Latency" is 0.2 ms, so totally it used
> 200 seconds.
>
>   It use lots of CPU, and check the stack, all thread in Cassandra is read
> from socket.
>
>   So I wonder, how to really use index to find the 200 records instead of
> scan all rows. (Supper Column?)
>
> *ColumnFamily: queue*
> *  Key Validation Class: org.apache.cassandra.db.marshal.BytesType*
> *  Default column value validator:
> org.apache.cassandra.db.marshal.BytesType*
> *  Columns sorted by: org.apache.cassandra.db.marshal.BytesType*
> *  Row cache size / save period in seconds / keys to save : 0.0/0/all*
> *  Row Cache Provider:
> org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider*
> *  Key cache size / save period in seconds: 0.0/0*
> *  GC grace seconds: 0*
> *  Compaction min/max thresholds: 4/32*
> *  Read repair chance: 0.0*
> *  Replicate on write: false*
> *  Bloom Filter FP chance: default*
> *  Built indexes: [queue.idxStatus]*
> *  Column Metadata:*
> *Column Name: status (737461747573)*
> *  Validation Class: org.apache.cassandra.db.marshal.AsciiType*
> *  Index Name: idxStatus*
> *  Index Type: KEYS*
> *
> *
> BRs
> //Jason
>


Cassandra search performance

2012-04-25 Thread Jason Tang
Hi

   We have the following CF and use a secondary index to search on a simple
column "status"; among 1,000,000 rows, 200 records have the status we want.

  But when we start searching, the performance is very poor. Checking with the
command "./bin/nodetool -h localhost -p 8199 cfstats", Cassandra read 1,000,000
records, and the "Read Latency" is 0.2 ms, so in total it took 200 seconds.

  It uses a lot of CPU, and checking the stack, all threads in Cassandra are
reading from sockets.

  So I wonder how to really use the index to find the 200 records instead of
scanning all rows. (Super column?)
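
For comparison, the intended query through the idxStatus index looks roughly like the sketch below (assuming the Hector IndexedSlicesQuery API; the status value and the keyspace handle ks are illustrative):

import me.prettyprint.cassandra.model.IndexedSlicesQuery;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class StatusQuerySketch {
    static void findByStatus(Keyspace ks) {
        StringSerializer ss = StringSerializer.get();
        IndexedSlicesQuery<String, String, String> query =
                HFactory.createIndexedSlicesQuery(ks, ss, ss, ss);
        query.setColumnFamily("queue");
        // Only an equality expression on the indexed 'status' column is served by
        // the KEYS index; any additional expression is applied afterwards as a
        // filter on the rows the index returns.
        query.addEqualsExpression("status", "PENDING");  // hypothetical status value
        query.setStartKey("");
        query.setRowCount(200);
        query.setRange("", "", false, 100);  // columns to return for each matching row
        query.execute();
    }
}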

*ColumnFamily: queue*
*  Key Validation Class: org.apache.cassandra.db.marshal.BytesType*
*  Default column value validator:
org.apache.cassandra.db.marshal.BytesType*
*  Columns sorted by: org.apache.cassandra.db.marshal.BytesType*
*  Row cache size / save period in seconds / keys to save : 0.0/0/all*
*  Row Cache Provider:
org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider*
*  Key cache size / save period in seconds: 0.0/0*
*  GC grace seconds: 0*
*  Compaction min/max thresholds: 4/32*
*  Read repair chance: 0.0*
*  Replicate on write: false*
*  Bloom Filter FP chance: default*
*  Built indexes: [queue.idxStatus]*
*  Column Metadata:*
*Column Name: status (737461747573)*
*  Validation Class: org.apache.cassandra.db.marshal.AsciiType*
*  Index Name: idxStatus*
*  Index Type: KEYS*
*
*
BRs
//Jason


Consistence for node shutdown and startup

2011-12-11 Thread Jason Tang
Hi

   Here is the case: we have only two nodes which share the data (write one,
read one):
   time      node One          node Two
    |        stopped           continues working and updates the data
    |        stopped           stopped
    |        starts working    stopped
    |        updates data      stopped
    |        started           starts working
    v

 What happens to the conflicting data written while the two nodes were online
separately? How is it synchronized between the two nodes when they both
finally come back online?

BRs
//Tang Weiqiang