Re: Failing operations repair

2012-06-12 Thread crypto five
It would be really great to look at your slides. Do you have any plans to
share your presentation?

On Sat, Jun 9, 2012 at 1:14 AM, Віталій Тимчишин tiv...@gmail.com wrote:

 Thanks a lot. I was not sure if the coordinator somehow tries to roll back
 transactions that failed to reach their consistency level.
 (Yet I could not imagine a method to do this without 2-phase commit :) )


 2012/6/8 aaron morton aa...@thelastpickle.com

 I am making some cassandra presentations in Kyiv and would like to check
 that I am telling people the truth :)

 Thanks for spreading the word :)

 1) A failed (from the client-side view) operation may still be applied to the
 cluster

 Yes.
 If you fail with UnavailableException it's because, from the coordinator's
 view of the cluster, there are fewer than CL nodes available. So retry.
 It's a somewhat similar story with TimedOutException.
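 To make point (1) concrete, here is a minimal sketch against the 1.x
 Thrift-generated client. The column family name, consistency level and
 retry policy are illustrative, not something from this thread:

 import java.nio.ByteBuffer;
 import org.apache.cassandra.thrift.Cassandra;
 import org.apache.cassandra.thrift.Column;
 import org.apache.cassandra.thrift.ColumnParent;
 import org.apache.cassandra.thrift.ConsistencyLevel;
 import org.apache.cassandra.thrift.TimedOutException;
 import org.apache.cassandra.thrift.UnavailableException;

 public class IdempotentRetry {
     // Re-send a normal column write on UnavailableException or
     // TimedOutException. This is safe because the write carries an explicit
     // client timestamp: replaying the same (name, value, timestamp) on
     // replicas that already applied it is a no-op, so the worst case is
     // wasted work, never wrong data.
     static void insertWithRetry(Cassandra.Client client, ByteBuffer key,
                                 Column column, int maxAttempts) throws Exception {
         ColumnParent parent = new ColumnParent("MyCF"); // hypothetical CF
         for (int attempt = 1; ; attempt++) {
             try {
                 client.insert(key, parent, column, ConsistencyLevel.QUORUM);
                 return; // acknowledged by CL replicas
             } catch (UnavailableException e) {
                 // coordinator saw fewer than CL replicas up; nothing applied
                 if (attempt >= maxAttempts) throw e;
             } catch (TimedOutException e) {
                 // the write may or may not have landed on some replicas
                 if (attempt >= maxAttempts) throw e;
             }
         }
     }
 }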

 2) The coordinator does not try anything to roll back an operation that failed
 because it was processed by fewer than the consistency-level number of nodes.

 Correct.

 3) Hinted handoff works only for successful operations.

 HH will be stored if the coordinator proceeds with the request.
 In 1.X a hint is stored on the coordinator if a replica is down when the
 request starts, or if the node does not reply within rpc_timeout.
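 (From memory: a hint is also only created while the replica has been down
 for less than max_hint_window_in_ms from cassandra.yaml; beyond that
 window only repair will bring the node back in sync.)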

 4) Counters are not reliable because of (1)

 If you get a TimedOutException when writing a counter you should not
 re-send the request: the increment may already have been applied, and
 replaying it would count it twice.
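 Unlike a normal write, a counter add carries no client-supplied timestamp
 to make a retry idempotent. A sketch with the same Thrift client as above
 (the CF and column names are again illustrative):

 import java.nio.ByteBuffer;
 import org.apache.cassandra.thrift.Cassandra;
 import org.apache.cassandra.thrift.ColumnParent;
 import org.apache.cassandra.thrift.ConsistencyLevel;
 import org.apache.cassandra.thrift.CounterColumn;
 import org.apache.cassandra.thrift.TimedOutException;

 public class CounterWrite {
     // If the timed-out add was in fact applied by some replica, re-sending
     // it counts the delta twice, so the error must be surfaced rather than
     // retried blindly.
     static void addOnce(Cassandra.Client client, ByteBuffer key, long delta)
             throws Exception {
         ColumnParent parent = new ColumnParent("CounterCF"); // hypothetical
         CounterColumn col =
                 new CounterColumn(ByteBuffer.wrap("hits".getBytes()), delta);
         try {
             client.add(key, parent, col, ConsistencyLevel.QUORUM);
         } catch (TimedOutException e) {
             // Do NOT re-send; reconcile at the application level instead,
             // e.g. by re-reading the counter.
             throw e;
         }
     }
 }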

 5) Read repair may help to propagate an operation that failed its
 consistency level but was persisted to some nodes.

 Yes. It works in the background and by default is only enabled on 10% of
 requests.
 Note that RR is not the same as the consistency level for a read. If you
 work at a CL > ONE, the results from CL nodes are always compared and
 differences resolved. RR is concerned with the replicas not involved in the
 CL read.
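 The 10% is the per-CF read_repair_chance, which you can tune per column
 family, e.g. with "update column family MyCF with read_repair_chance =
 1.0;" from cassandra-cli (the CF name here is illustrative).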

 6) Manual repair is still needed because of (2) and (3)

 Manual repair is *the* way to achieve consistency of data on disk. HH and
 RR are optimisations designed to reduce the chance of a Digest Mismatch
 during a read with CL > ONE.
 It is also essential for distributing tombstones before they are purged
 by compaction.
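 A practical rule of thumb is to run nodetool repair on each node at least
 once every gc_grace_seconds (864000 seconds, i.e. 10 days, by default): if
 a replica misses a tombstone for longer than that, the deleted data can
 come back to life.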

 P.S. If some points apply only to some cassandra versions, I will be
 happy to know this too.

 Assume everything above is for version 1.X

 Thanks

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com






 --
 Best regards,
  Vitalii Tymchyshyn



Re: cassandra read latency help

2012-05-31 Thread crypto five
You may also consider disabling the key/row cache entirely.
1mm rows * 400 bytes = 400MB of data, which can easily sit in the fs cache, and you
will access your hot keys at thousands of qps without hitting disk at all.
Enabling compression can make the situation even better.

On Thu, May 31, 2012 at 12:01 PM, Gurpreet Singh gurpreet.si...@gmail.com wrote:

 Aaron,
 Thanks for your email. The test kinda resembles how the actual application
 will be.
 It is going to be a simple key-value store with 500 million keys per node.
 The traffic will be read-heavy in steady state, and there will be some keys
 that will have a lot more traffic than others. The expected hot rows are
 estimated to be anywhere between 50 and 1 million keys.

 I have already populated this test system with 500 million keys and compacted
 it all to 1 file to check the size of the bloom filter and the index.

 This is how i am estimating my memory for 500 million keys. Please correct me
 if i am wrong or if i am missing any step.

 bloom filter: 1 gig
 index samples: the index file is 8.5 gig. I believe this index file is for all
 keys. The index interval is 128. Hence in RAM this would be (8.5g / 128) * 10
 (a factor for data-structure overhead) = 664 mb (let's say 1 gig)

 key cache size (3 million): 3 gigs
 memtable_total_space_mb : 2 gigs

 This totals 7 gig.
 my heap size is 8 gigs.
 Is there anything else that i am missing here?
 When i do top right now, it shows java at 96% memory; that's a concern
 because there is no write load. Should i be looking at any other number
 here?
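 (One caveat when reading top here: with the default mmap disk access mode
 the resident size of the java process also counts memory-mapped sstables,
 so a high memory percentage does not by itself mean heap pressure. The
 heap numbers from JMX or nodetool info are a better guide.)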

 Off-heap row cache: 500,000 - 750,000 rows, between 3 and 5 gigs (avg row size =
 250-500 bytes)

 My test system has 16 gigs RAM; the production system will mostly have 32 gigs
 RAM and 12 spindles instead of the 6 i am testing with.

 I changed the underlying filesystem from xfs to ext2, and i am seeing
 better results, though not the best.
 The cfstats latency is down to 20 ms for a 35 qps read load. The row cache hit
 rate is 0.21, key cache = 0.75.
 Measuring from the client side, i am seeing roughly 10-15 ms per key; i
 would want even less though, so any tips would greatly help.
 In production, i am hoping the row cache hit rate will be higher.


 The biggest thing that is affecting my system right now is the "Invalid
 frame size of 0" error that the cassandra server seems to be printing. It's
 causing read timeouts every minute or two. I haven't been able to
 figure out a way to fix this one. I see someone else also reported seeing
 this, but I am not sure where the problem is: hector, cassandra or thrift.

 Thanks
 Gurpreet






 On Wed, May 30, 2012 at 4:38 PM, aaron morton aa...@thelastpickle.com wrote:

 80 ms per request

 sounds high.

 I'm doing some guessing here; i am guessing memory usage is the problem.

 * I assume you are no longer seeing excessive GC activity.
 * The key cache will not get used when you hit the row cache. I would
 disable the row cache if you have a random workload, which it looks like
 you do.
 * 500 million is a lot of keys to have on a single node. At the default
 index sample of every 128 keys it will have about 4 million samples, which
 is probably taking up a lot of memory.

 Is this testing a real world scenario or an abstract benchmark? IMHO you
 will get more insight from testing something that resembles your
 application.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 26/05/2012, at 8:48 PM, Gurpreet Singh wrote:

 Hi Aaron,
 Here is the latest on this..
 i switched to a node with 6 disks and am running some read tests, and i am
 seeing something weird.

 setup:
 1 node, cassandra 1.0.9, 8 cpu, 16 gig RAM, 6 7200 rpm SATA data disks
 striped 512 kb, commitlog mirrored.
 1 keyspace with just 1 column family
 random partitioner
 total number of keys: 500 million (the keys are just longs from 1 to 500
 million)
 avg key size: 8 bytes
 bloom filter size: 1 gig
 total disk usage: 70 gigs, compacted to 1 sstable
 mean compacted row size: 149 bytes
 heap size: 8 gigs
 keycache size: 2 million (takes around 2 gigs in RAM)
 rowcache size: 1 million (off-heap)
 memtable_total_space_mb : 2 gigs

 test:
 Trying to do 5 reads per second. Each read is a multigetslice query for
 just 1 key, 2 columns.

 observations:
 row cache hit rate: 0.4
 key cache hit rate: 0.0 (this will increase later on as the system moves to
 steady state)
 cfstats read latency: 80 ms

 iostat (every 5 seconds):

 r/s: 400
 %util: 20% (all disks at equal utilization)
 await: 65-70 ms (for each disk)
 svctm: 2.11 ms (for each disk)
 rkB/s: 35000

 why this is weird is because:
 5 reads per second is causing a latency of 80 ms per request (according
 to cfstats). isn't this too high?
 35 MB/s is being read from the disk. That is again very weird. This
 number is way too high; the avg row size is just 149 bytes. Even index reads
 should not cause this much data to be read from the disk.
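 (A quick sanity check on those iostat numbers: 35,000 kB/s over 400 disk
 reads/s is roughly 87 kB per disk read, against a 149-byte mean row. So
 nearly all of the volume looks like read-ahead/stripe overhead rather than
 row data, and lowering the device read-ahead is one knob worth trying for
 a random-read workload.)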

 what i understand is that each read request translates to 2 disk accesses
 

Re: cassandra read latency help

2012-05-31 Thread crypto five
But I think it's a bad idea, since hot data will be evenly distributed
across multiple sstables and filesystem pages.

On Thu, May 31, 2012 at 1:08 PM, crypto five cryptof...@gmail.com wrote:

 You may also consider disabling the key/row cache entirely.
 1mm rows * 400 bytes = 400MB of data, which can easily sit in the fs cache, and you
 will access your hot keys at thousands of qps without hitting disk at all.
 Enabling compression can make the situation even better.



Re: Cassandra dying when gets many deletes

2012-04-25 Thread crypto five
I agree with your observations.
On the other hand, I found that ColumnFamily.size() doesn't calculate the object
size correctly. It doesn't count the sizes of two object fields, and it returns 0
if there is no object in the columns container.
I increased the initial size variable value to 24, which is the size of the two
objects (I didn't know what the correct value is), and cassandra started
calculating the live ratio correctly, increasing the throughput value and
flushing memtables.
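A rough, self-contained sketch of the kind of change I mean (NOT the actual
Cassandra 1.0 source; 24 is just my guess at the overhead of the two
uncounted fields):

import java.util.List;

class SizedColumnFamily {
    interface SizedColumn { int size(); }         // stand-in for IColumn
    private static final int FIELD_OVERHEAD = 24; // assumed size of the two objects
    private final List<SizedColumn> columns;

    SizedColumnFamily(List<SizedColumn> columns) { this.columns = columns; }

    // Seed the estimate with the overhead that was not being counted, so a
    // ColumnFamily that holds only a row-level deletion no longer reports
    // size 0 and the memtable flusher can account for it.
    int size() {
        int size = FIELD_OVERHEAD; // was 0 before the change
        for (SizedColumn c : columns) {
            size += c.size();
        }
        return size; // non-zero even when 'columns' is empty
    }
}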

On Tue, Apr 24, 2012 at 2:00 AM, Vitalii Tymchyshyn tiv...@gmail.com wrote:

 Hello.

 For me, "there are no dirty column families" in your message suggests it's
 possibly the same problem.
 The issue is that column families that get only full-row deletes do not get
 ANY SINGLE dirty byte accounted, and so can't be picked by the flusher. Any
 ratio can't help simply because it is multiplied by 0. Check your cfstats.





   --
 Best regards,
  Vitalii Tymchyshyn






Re: Cassandra dying when gets many deletes

2012-04-24 Thread crypto five
Thank you Vitalii.

Looking at Jonathan's answer to your patch, I think it's probably not my
case. I see that LiveRatio is calculated in my case, but the calculations look
strange:

WARN [MemoryMeter:1] 2012-04-23 23:29:48,430 Memtable.java (line 181)
setting live ratio to maximum of 64 instead of Infinity
 INFO [MemoryMeter:1] 2012-04-23 23:29:48,432 Memtable.java (line 186)
CFS(Keyspace='lexems', ColumnFamily='countersCF') liveRatio is 64.0
(just-counted was 64.0).  calculation took 63355ms for 0 columns

Looking at the comment in the code ("If it gets higher than 64 something
is probably broken."), it looks like that's probably the problem.
Not sure how to investigate it.
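(For context: the live ratio is the measured heap size of the memtable
divided by the serialized bytes written to it. The "calculation took
63355ms for 0 columns" part suggests the denominator was 0, which yields
Infinity and trips the cap at 64.)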

2012/4/23 Віталій Тимчишин tiv...@gmail.com

 See https://issues.apache.org/jira/browse/CASSANDRA-3741
 I did post a fix there that helped me.






 --
 Best regards,
  Vitalii Tymchyshyn



Cassandra dying when gets many deletes

2012-04-23 Thread crypto five
Hi,

I have 50 million rows in a column family on a 4G RAM box. I allocated 2GB
to cassandra.
I have a program which traverses this CF and cleans some data there; it
generates about 20k delete statements per second.
After about 3 million deletions cassandra stops responding to queries:
it doesn't react to CLI, nodetool, etc.
I see in the logs that it tries to free some memory but can't, even if I
wait a whole day.
Also I see the following in the logs:

INFO [ScheduledTasks:1] 2012-04-23 18:38:13,333 StorageService.java (line
2647) Unable to reduce heap usage since there are no dirty column families

When I look at a memory dump I see that memory goes to
ConcurrentSkipListMap(10%), HeapByteBuffer(13%), DecoratedKey(6%),
int[](6%), BigInteger(8.2%), ConcurrentSkipListMap$HeadIndex(7.2%),
ColumnFamily(6.5%), ThreadSafeSortedColumns(13.7%), long[](5.9%).

What can I do to make cassandra stop dying?
Why can't it free the memory?
Any ideas?

Thank you.