Re: Cassandra stress test and max vs. average read/write latency.

2011-12-23 Thread Peter Fales
Peter,

Thanks for your response. I'm looking into some of the ideas in your
other recent mail, but I had another followup question on this one...

Is there any way to control the CPU load when using the stress benchmark?
I have some control over that with our home-grown benchmark, but I
thought it made sense to use the official benchmark tool as people might
more readily believe those results and/or be able to reproduce them.  But
offhand, I don't see any way to throttle back the load created by the
stress test.
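
For what it's worth, the only knob I've found so far is the client thread
count, which isn't a real throttle but does reduce the offered load. A rough
sketch (node names are placeholders, and I'm assuming the stress tool shipped
with your release accepts a -t/--threads option -- check its --help first):

    # Run the bundled stress tool with fewer client threads to lower the
    # offered load (and CPU usage) on the cluster under test.
    tools/stress/bin/stress --help
    tools/stress/bin/stress -d node1,node2 -o insert -n 1000000 -t 5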

On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote:
  I'm trying to understand if this is expected or not, and if there is
 
 Without careful tuning, outliers around a couple of hundred ms are
 definitely expected in general (not *necessarily*, depending on
 workload) as a result of garbage collection pauses. The impact will be
 worsened a bit if you are running under high CPU load (or even maxing
 it out with stress) because post-pause, if you are close to max CPU
 usage you will take considerably longer to catch up.
 
 Personally, I would just log each response time and feed it to gnuplot
 or something. It should be pretty obvious whether or not the latencies
 are due to periodic pauses.
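 
 For example, a minimal sketch (this assumes the client logs one
 "epoch-seconds latency-ms" pair per request to latencies.log; adjust the
 columns to whatever your client actually records):
 
     # Plot request latency over time so periodic pauses stand out visually.
     gnuplot -e "set terminal png; set output 'latency.png'; \
                 set xlabel 'time (s)'; set ylabel 'latency (ms)'; \
                 plot 'latencies.log' using 1:2 with points"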
 
 If you are concerned with eliminating or reducing outliers, I would:
 
 (1) Make sure that when you're benchmarking, that you're putting
 Cassandra under a reasonable amount of load. Latency benchmarks are
 usually useless if you're benchmarking against a saturated system. At
 least, start by achieving your latency goals at 25% or less CPU usage,
 and then go from there if you want to up it.
 
 (2) One can affect GC pauses, but it's non-trivial to eliminate the
 problem completely. For example, the length of frequent young-gen
 pauses can typically be decreased by decreasing the size of the young
 generation, leading to more frequent shorter GC pauses. But that
 instead causes more promotion into the old generation, which will
 result in more frequent very long pauses (relative to normal; they
 would still be infrequent relative to young gen pauses) - IF your
 workload is such that you are suffering from fragmentation and
 eventually seeing Cassandra fall back to full compacting GC:s
 (stop-the-world) for the old generation.
 
 I would start by adjusting young gen so that your frequent pauses are
 at an acceptable level, and then see whether or not you can sustain
 that in terms of old-gen.
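 
 For example, as a starting point (illustrative values only -- set these via
 JVM_OPTS in cassandra-env.sh and adjust against your own GC logs):
 
     # Illustrative only; tune against your GC logs.
     # A smaller young generation gives shorter but more frequent minor pauses.
     JVM_OPTS="$JVM_OPTS -Xmn400M"                    # young generation size
     JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"         # eden vs. survivor spaces
     JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"  # promotion threshold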
 
 Start with this in any case: Run Cassandra with -XX:+PrintGC
 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
 
 -- 
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031


Re: Cassandra stress test and max vs. average read/write latency.

2011-12-22 Thread Peter Fales
Peter,

Thanks for your input.  Can you tell me more about what we should be
looking for in the gc log?   We've already got the gc logging turned
on and, and we've already done the plotting to show that in most 
cases the outliers are happening periodically (with a period of 
10s of seconds to a few minutes, depnding on load and tuning)

I've tried to correlate the times of the outliers with messages either
in the system log or the gc log.   There seems to be some (but not
complete) correlation between the outliers and system log messages about
memtable flushing.   I cannot find anything in the gc log that 
seems to be an obvious problem, or that matches up with the times 
of the outliers.
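
In case it helps, this is roughly how I've been scanning the GC log for long
stop-the-world pauses to line up against the outlier timestamps (a sketch; the
exact log format depends on the JVM version and flags, so the pattern may need
adjusting):

    # Print pauses longer than 200 ms from the GC log
    # (relies on the "[Times: user=... sys=... real=... secs]" lines
    # produced by -XX:+PrintGCDetails).
    grep -o 'real=[0-9.]*' gc.log | awk -F= '$2 > 0.2 {print "pause:", $2, "s"}'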


On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote:
  I'm trying to understand if this is expected or not, and if there is
 
 Without careful tuning, outliers around a couple of hundred ms are
 definitely expected in general (not *necessarily*, depending on
 workload) as a result of garbage collection pauses. The impact will be
 worsened a bit if you are running under high CPU load (or even maxing
 it out with stress) because post-pause, if you are close to max CPU
 usage you will take considerably longer to catch up.
 
 Personally, I would just log each response time and feed it to gnuplot
 or something. It should be pretty obvious whether or not the latencies
 are due to periodic pauses.
 
 If you are concerned with eliminating or reducing outliers, I would:
 
 (1) Make sure that when you're benchmarking, that you're putting
 Cassandra under a reasonable amount of load. Latency benchmarks are
 usually useless if you're benchmarking against a saturated system. At
 least, start by achieving your latency goals at 25% or less CPU usage,
 and then go from there if you want to up it.
 
 (2) One can affect GC pauses, but it's non-trivial to eliminate the
 problem completely. For example, the length of frequent young-gen
 pauses can typically be decreased by decreasing the size of the young
 generation, leading to more frequent shorter GC pauses. But that
 instead causes more promotion into the old generation, which will
 result in more frequent very long pauses (relative to normal; they
 would still be infrequent relative to young gen pauses) - IF your
 workload is such that you are suffering from fragmentation and
 eventually seeing Cassandra fall back to full compacting GC:s
 (stop-the-world) for the old generation.
 
 I would start by adjusting young gen so that your frequent pauses are
 at an acceptable level, and then see whether or not you can sustain
 that in terms of old-gen.
 
 Start with this in any case: Run Cassandra with -XX:+PrintGC
 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
 
 -- 
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031


Re: servers for cassandra

2010-09-08 Thread Peter Fales
Have you looked into Amazon's EC2?   That's worked quite well for us.
I haven't looked into other alternatives enough to know how it would 
compare for a 24/7 production application, but Amazon's pay-as-you-go
system is really nice when you just need a bunch of machines for a 
few hours of testing.


On Sat, Sep 04, 2010 at 01:35:36AM -0500, vineet daniel wrote:
 Hi
 
 I am just curious to know if there is any hosting company that provides 
 servers at a very low cost, wherein I can install cassandra on WAN. I have 
 a cassandra setup in my LAN and want to test it in real conditions; taking 
 dedicated servers just for testing purposes is not at all feasible for me, not 
 even pay-as-you-go types. I'd really appreciate it if anybody can share 
 information on such hosting providers.
 
 Vineet Daniel
 Cell  : +918106217121
 Websites:
 Blog: http://vinetedaniel.blogspot.com   |   
 Linkedin: http://in.linkedin.com/in/vineetdaniel  |  
 Twitter: https://twitter.com/vineetdaniel
 
 

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031


Re: Data Center Move

2010-09-02 Thread Peter Fales
Anthony,

I'm just getting my feet wet with Cassandra, so I'm far from an
expert, but I'm curious whether you saw my posting a few days ago
about using the EC2 public IP addresses with cassandra:
http://www.mail-archive.com/user@cassandra.apache.org/msg05692.html

*If* I understand the problem correctly, it seems like you could create
some new EC2 nodes using this patched version of the code, then 
migrate your existing nodes to those new EC2 nodes, giving each new node a 
public IP.   Once your entire EC2 cluster was up and running on the
public addresses, you should be able to use those public addresses 
to migrate to some other site outside of EC2.  

Am I missing something obvious?   (Quite possible, since I haven't actually
tested this.)

On Thu, Sep 02, 2010 at 01:09:46PM -0500, Anthony Molinaro wrote:
 Hi,
 
   We're running cassandra 0.6.4, and need to do a data center move of
 a cluster (from EC2 to our own data center).   Because of the way the
 networks are set up we can't actually connect these boxes directly, so
 the original plan of adding some nodes in the new colo, letting them bootstrap,
 then decommissioning nodes in the old colo until the data is all transferred,
 will not work.
 
 So I'm wondering if the following will work
 
 1. take a snapshot on the source cluster
 2. rsync all the files from the old machines to the new machines (we'd most
likely be reducing the total number of machines, so would do things like
take 4-5 machines worth of data and put it onto 1 machine)
 3. bring up the new machines in the new colo
 4. run cleanup on all new nodes?
 5. run repair on all new nodes?
 
 So will this work?  If so, are steps 4 and 5 correct?
 
 I realize we will miss any new data that happens between the snapshot
 and turning on writes on the new cluster, but I think we might be able
 to just tune compaction such that it doesn't happen, then just sync
 the files that change while the data transfers happen?
 
 Thanks,
 
 -Anthony
 
 -- 
 
 Anthony Molinaro   antho...@alumni.caltech.edu
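
For anyone who ends up trying the snapshot/rsync route, the sequence sketched
above would look roughly like this (host names and paths are placeholders, the
snapshot directory layout depends on the Cassandra version, and I have not
tested it -- treat it as an outline only):

    # 1. snapshot on each source node
    nodetool -h old-node1 snapshot
    # 2. copy the snapshot SSTables into the new node's data directory
    rsync -av old-node1:/var/lib/cassandra/data/Keyspace1/snapshots/ \
              new-node1:/var/lib/cassandra/data/Keyspace1/
    # 3. start Cassandra on the new nodes, then
    # 4. drop data each new node no longer owns, and 5. repair
    nodetool -h new-node1 cleanup
    nodetool -h new-node1 repair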

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: peter.fa...@alcatel-lucent.com
Phone: 630 979 8031