Re: Tuning cassandra (compactions overall)

2012-05-24 Thread Alain RODRIGUEZ
I already ran into this kind of trouble while repairing a month ago. I
seem to be the only one having these problems, so I guess something is
wrong either in the configuration of my nodes or in my data that makes
them go wrong after a restart/repair.

I am planning to deploy an EC2 cluster with the DataStax AMI to be
sure of the integrity of the servers. I will load my data into it and
try all the maintenance operations (repair, cleanup, rolling restart
and so on).

If you are interested in my startup logs, here they are:

https://gist.github.com/2762493
https://gist.github.com/2762495

Thanks for your help, Aaron.

Alain





Re: Tuning cassandra (compactions overall)

2012-05-23 Thread aaron morton
I've not heard of anything like that in the recent versions. There were some 
issues in the early 0.8 releases: 
https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L383

If you are on a recent version, can you please create a JIRA ticket 
(https://issues.apache.org/jira/browse/CASSANDRA) describing what you think 
happened.

If you have kept the startup logs and can make them available, please 
do.

Thanks

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com




Re: Tuning cassandra (compactions overall)

2012-05-22 Thread aaron morton
Not sure what you mean by:
 And after restarting the second one I have lost all the consistency of
 my data. All my statistics since September are totally false now in
 production

Can you give some examples?
Counters are not idempotent, so if the client app retries TimedOut requests 
you can get an over-count. That should not result in lost data.

 As a reminder, I'm using a 2-node cluster, RF=2, CL.ONE

Have you been running repair?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

 



Re: Tuning cassandra (compactions overall)

2012-05-22 Thread Alain RODRIGUEZ
not sure what you mean by
And after restarting the second one I have lost all the consistency of
my data. All my statistics since September are totally false now in
production

Can you give some examples?

After restarting my 2 nodes (one after the other), all my counters
have become wrong; the counter values were modified by the restart.
Let's say I had a counter column called 20120101#click whose value was
569; after the restart the value became 751. I think that all the
values have increased (I'm not sure), but the counters have increased
in different ways: some values increased a lot, others just a bit.

Counters are not idempotent, so if the client app retries TimedOut
requests you can get an over-count. That should not result in lost
data.
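The over-count scenario Aaron describes can be illustrated with a toy model (all names here are hypothetical; this only mimics the client-retry behaviour, not Cassandra itself):

```python
# Toy model of a non-idempotent counter increment whose acknowledgement
# times out: the write is applied on the server, but the client never
# hears back, so a retry applies the increment a second time.
class Server:
    def __init__(self):
        self.counter = 0
        self.fail_ack_once = True

    def increment(self):
        self.counter += 1          # the write is applied...
        if self.fail_ack_once:
            self.fail_ack_once = False
            raise TimeoutError     # ...but the ack is lost

def client_add_one(server, retries=1):
    for _ in range(retries + 1):
        try:
            server.increment()
            return
        except TimeoutError:
            continue               # retrying is only safe for idempotent writes

server = Server()
client_add_one(server)             # the client intended a single +1
print(server.counter)              # counter is now 2: an over-count
```

This is why a retried timed-out increment can inflate a counter even though no data was lost.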

Some of these counters haven't been written to since September, yet
they were still modified by the restart.

Have you been running repair?

Yes, and repair didn't help. I have the feeling that repair doesn't
work on counters.

I have restored the data now, but I am afraid of restarting any node.
I can't remain in this position for too long...


Re: Tuning cassandra (compactions overall)

2012-05-21 Thread Alain RODRIGUEZ
Hi Aaron.

I wanted to try the new config. After doing a rolling restart, all my
counters are false, with wrong values. I stopped my servers with the
following:

nodetool -h localhost disablegossip
nodetool -h localhost disablethrift
nodetool -h localhost drain
kill the Cassandra process with SIGTERM (15) via htop
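For reference, that shutdown sequence can be wrapped in a small helper (a sketch; the dry-run flag and the pid placeholder are assumptions, and `kill -15` stands in for the htop step):

```python
import subprocess

# The drain must come after gossip and thrift are disabled, so no new
# writes arrive while the commitlog is flushed to SSTables.
def shutdown_commands(host="localhost"):
    return [
        ["nodetool", "-h", host, "disablegossip"],
        ["nodetool", "-h", host, "disablethrift"],
        ["nodetool", "-h", host, "drain"],
        ["kill", "-15", "<cassandra-pid>"],  # SIGTERM, as done via htop
    ]

def run(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))     # just show what would be executed
        else:
            subprocess.check_call(cmd)

run(shutdown_commands())  # dry run: prints the sequence without executing it
```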

And after restarting the second one I have lost all the consistency of
my data. All my statistics since September are totally false now in
production.

As a reminder, I'm using a 2-node cluster with RF=2 and CL.ONE.

1 - How can I fix it? (I have a backup from this morning, but I will
lose all the data written after that point if I restore it.)
2 - What happened? How can I avoid it?

Any idea would be greatly appreciated; I'm quite desperate.

Alain





Re: Tuning cassandra (compactions overall)

2012-05-16 Thread Alain RODRIGUEZ
Using c1.medium, we are currently able to deliver the service.

What is the benefit of having more memory? I mean, I don't
understand why having 1, 2, 4, 8 or 16 GB of memory is so different.
In my mind, Cassandra will fill the heap and, from then on, start to
flush and compact to avoid OOMing, then fill it again. The memory used
inside the heap will remain close to the maximum available, so having
more or less memory shouldn't matter.

I'm pretty sure I misunderstand or forget something about how the
memory is used, but I'm not sure what.

Can you enlighten me on this point?

If I understand why memory size is that important, I will probably be
able to argue for having more memory, and my boss will probably allow
me to spend more money on better servers.

There are some changes you can make to mitigate things (let me know
if you need help), but this is essentially a memory problem.

I'm very interested in learning about configuration I can use to
reach better performance/stability, as well as in learning how
Cassandra works.

Thanks for the help you give people and for sharing your knowledge
with us. I greatly appreciate the Cassandra community and the most
active people keeping it alive. It's worth saying :).

Alain


Re: Tuning cassandra (compactions overall)

2012-05-16 Thread aaron morton
 What is the benefit of having more memory ? I mean, I don't
 understand why having 1, 2, 4, 8 or 16 GB of memory is so different.
Less frequent and less aggressive garbage collection frees up CPU resources to 
run the database. 

Less memory results in frequent and aggressive (i.e. stop-the-world) GC and 
increases IO pressure, which reduces read performance and in the extreme can 
block writes. 

 The memory used inside
 the heap will remain close to the max memory available, therefore
 having more or less memory doesn't matter.
Not an ideal situation. It becomes difficult to find a contiguous region of 
memory to allocate.

 Can you enlighten me about this point ?
It's a database server; it is going to work better with more memory. Also, it's 
Java, and it's designed to run on multiple machines with many GBs of RAM 
available. There are better arguments here: 
http://wiki.apache.org/cassandra/CassandraHardware

 
 I'm very interested in learning about configuration I can use to
 reach better performance/stability, as well as in learning how
 Cassandra works.

Turn off all caches.

In the schema, increase the bloom filter false positive rate (see the cli help 
for "create column family").

In the yaml experiment with these changes:
* reduce sliced_buffer_size_in_kb
* reduce column_index_size_in_kb 
* reduce in_memory_compaction_limit_in_mb
* increase index_interval
* set concurrent_compactors to 2
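In cassandra.yaml terms, the suggestions above would look something like this (the concrete numbers are illustrative assumptions for a memory-constrained node, not values given in this thread; the "default" comments assume the defaults of the 1.0-era yaml):

```yaml
# Directions suggested above; exact values must be tuned per workload.
sliced_buffer_size_in_kb: 32          # reduced (assumed default: 64)
column_index_size_in_kb: 32           # reduced (assumed default: 64)
in_memory_compaction_limit_in_mb: 32  # reduced (assumed default: 64)
index_interval: 512                   # increased (assumed default: 128)
concurrent_compactors: 2              # set explicitly instead of the default
```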

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/05/2012, at 12:40 AM, Alain RODRIGUEZ wrote:

 Using c1.medium, we are currently able to deliver the service.
 
 What is the the benefit of having more memory ? I mean, I don't
 understand why having 1, 2, 4, 8 or 16 GB of memory is so different.
 In my mind, Cassandra will fill the heap and from then, start to flush
 and compact to avoid OOMing and fill it again. The memory used inside
 the heap will remains close to the max memory available, therefore
 having more or less memory doesn't matter.
 
 I'm pretty sure I misunderstand or forget something about how the
 memory is used but not sure about what.
 
 Can you enlighten me about this point ?
 
 If I understand why the memory size is that important I will probably
 be able to argue about the importance of having more memory and my
 boss will probably allow me to spend more money to get better servers.
 
 There are some changes you can make to mitigate things (let me know
 if you need help), but this is essentially a memory problem.
 
 I'm interested a lot in learning about some configuration I can use to
 reach better peformance/stability as well as in learning about how
 Cassandra works.
 
 Thanks for the help you give to people and for sharing your knowledge
 with us. I appreciate a lot the Cassandra community and the most
 active people keeping it alive. It's worth being said :).
 
 Alain



Tuning cassandra (compactions overall)

2012-05-15 Thread Alain RODRIGUEZ
Hi,

I'm using a 2-node cluster in production (2 EC2 c1.medium, CL.ONE,
RF=2, using the RandomPartitioner).

1 - I get this kind of message quite often (say every 30 seconds):

WARN [ScheduledTasks:1] 2012-05-15 15:44:53,083 GCInspector.java (line
145) Heap is 0.8081418550931491 full.  You may need to reduce memtable
and/or cache sizes.  Cassandra will now flush up to the two largest
memtables to free up memory.  Adjust flush_largest_memtables_at
threshold in cassandra.yaml if you don't want Cassandra to do this
automatically
 WARN [ScheduledTasks:1] 2012-05-15 15:44:53,084 StorageService.java
(line 2645) Flushing CFS(Keyspace='xxx', ColumnFamily='yyy') to
relieve memory pressure

Is that a problem?
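The warning itself is just a threshold check on heap occupancy; roughly (a sketch of the behaviour described in the log, not Cassandra's actual code):

```python
# Rough model of the check behind the GCInspector warning above: when
# heap usage crosses flush_largest_memtables_at, Cassandra flushes the
# largest memtables to free heap.
FLUSH_LARGEST_MEMTABLES_AT = 0.75  # value from the cassandra.yaml below

def should_emergency_flush(heap_used_bytes, heap_max_bytes,
                           threshold=FLUSH_LARGEST_MEMTABLES_AT):
    usage = heap_used_bytes / heap_max_bytes
    return usage > threshold, usage

# With the heap ~0.808 full, as in the warning, the flush triggers:
flush, usage = should_emergency_flush(0.808e9, 1.0e9)
print(flush, round(usage, 3))
```

Seeing this every 30 seconds means the heap is constantly sitting above that threshold.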

2 - I shared 2 screenshots: the cluster performance (via OpsCenter) and
the hardware metrics (via AWS).

http://img337.imageshack.us/img337/6812/performance.png
http://img256.imageshack.us/img256/9644/aws.png

What do you think of these metrics? Are frequent compactions normal?
What about having 60-70% CPU load for 600 reads/writes per second on
this hardware? Is there a way to optimize my cluster?

Here are the main points of my cassandra.yaml:

flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 32
commitlog_total_space_in_mb: 4096
rpc_server_type: sync (I am going to switch to hsha, because we are
using Ubuntu)
#concurrent_compactors: 1 (commented, so I use default)
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
rpc_timeout_in_ms: 1

Other tuning options (like many of the ones above) are at their defaults.

Any advice or comment would be appreciated :).

Alain