Re: How to measure the write amplification of C*?

2016-03-10 Thread Jeff Ferland
Compaction logs show the number of bytes written and the level written to. Base write load = table flushed to L0. Write amplification = sum of all compactions written to disk for the table. On Thu, Mar 10, 2016 at 9:44 AM, Dikang Gu wrote: > Hi Matt, > > Thanks for the
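
A minimal sketch of the arithmetic described in that reply, assuming the per-table byte counts have already been pulled out of the flush and compaction log lines by hand. The function name, its arguments, and the sample numbers are hypothetical, and expressing the result as a ratio of all bytes written to bytes flushed is an assumption layered on top of the reply, which only says to sum the compaction writes.

```python
def write_amplification(flushed_bytes, compaction_bytes):
    """Return (total bytes written by compactions, ratio of all writes to flushed bytes).

    flushed_bytes: bytes flushed to L0 for the table (the base write load).
    compaction_bytes: bytes written by each compaction of that table.
    """
    if flushed_bytes <= 0:
        raise ValueError("flushed_bytes must be positive")
    compacted = sum(compaction_bytes)
    # Assumed definition: every byte that reached disk divided by the base load.
    ratio = (flushed_bytes + compacted) / flushed_bytes
    return compacted, ratio


# Made-up numbers for illustration only, not taken from any real log.
compacted, ratio = write_amplification(100, [100, 150, 250])
print(compacted, ratio)
```

Parsing the log lines themselves is left out on purpose, since their exact format differs between Cassandra versions.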

System block cache vs. disk access and metrics

2016-02-04 Thread Jeff Ferland
We struggled for a while to upgrade due to an out of order SStables bug. During this time, load continued to increase and we were eventually accessing the disk a lot. When we could finally expand the cluster, the went down by an order of magnitude. This leads me to conclude that we had blown

Re: compaction throughput

2016-01-15 Thread Jeff Ferland
Compaction is generally CPU bound and relatively slow. Exactly why that is I’m uncertain. > On Jan 15, 2016, at 12:53 PM, Kai Wang wrote: > > Hi, > > I am trying to figure out the bottleneck of compaction on my node. The node > is CentOS 7 and has SSDs installed. The table

Re: New node has high network and disk usage.

2016-01-06 Thread Jeff Ferland
What’s your output of `nodetool compactionstats`? > On Jan 6, 2016, at 7:26 AM, Vickrum Loi wrote: > > Hi, > > We recently added a new node to our cluster in order to replace a node that > died (hardware failure we believe). For the next two weeks it had high

Unable to add nodes / awaiting patch.

2015-12-02 Thread Jeff Ferland
Looks like we’re hit by https://issues.apache.org/jira/browse/CASSANDRA-10012 . Not knowing a better place to ask, when will the next version of 2.1.x Cassandra be cut and the following DSE fix cut from there? Could DSE cut an in-between

Re: Cassandra stalls and dropped messages not due to GC

2015-11-02 Thread Jeff Ferland
Having caught a node in an undesirable state, many of my threads are reading like this: "SharedPool-Worker-5" #875 daemon prio=5 os_prio=0 tid=0x7f3e14196800 nid=0x96ce waiting on condition [0x7f3ddb835000] java.lang.Thread.State: WAITING (parking) at

Re: Cassandra stalls and dropped messages not due to GC

2015-11-02 Thread Jeff Ferland
> On Nov 2, 2015, at 11:35 AM, Nate McCall wrote: > Forgive me, but what is CMS? > > Sorry - ConcurrentMarkSweep garbage collector. Ah, my brain was trying to think in terms of something Cassandra specific. I have full GC logging on and since moving to G1, I haven’t

Cassandra stalls and dropped messages not due to GC

2015-10-29 Thread Jeff Ferland
Using DSE 4.8.1 / 2.1.11.872, Java version 1.8.0_66 We upgraded our cluster this weekend and have been having issues with dropped mutations since then. Intensely investigating a single node and toying with settings has revealed that GC stalls don’t make up enough time to explain the 10 seconds

Re: Cassandra stalls and dropped messages not due to GC

2015-10-29 Thread Jeff Ferland
TATIONS under load > and your machines are overloaded, you’d be doing more READ_REPAIR than usual > probably) > >> On Oct 29, 2015, at 8:12 PM, Jeff Ferland <j...@tubularlabs.com> wrote: >> >> Using DSE 4.8.1 / 2.1.11.

Re: Snapshots - Backup/Restore requirements

2015-10-12 Thread Jeff Ferland
I have a semi-hacky Python script I’ve written up. It needs refining for public use, but I’ll put it in Github later today and send you a link as I work on it. It uses boto to do concurrent multi-part uploads to S3 with retry and resume recording function if it gets interrupted while uploading

Re: A number of questions on LeveledCompactionStrategy

2015-10-12 Thread Jeff Ferland
> On Oct 10, 2015, at 9:24 AM, San Luoji wrote: > > Hi, > > I've got a number of questions when looking into LCS in Cassandra. Could > somebody help to enlighten me? > > 1. Will LCS always strive to clean up L0 sstable? i.e. whenever a new L0 > sstable shows up, it will

Re: Snapshots - Backup/Restore requirements

2015-10-12 Thread Jeff Ferland
n, Oct 12, 2015 at 9:41 AM, Jeff Ferland <j...@tubularlabs.com> wrote: > I have a semi-hacky Python script I’ve written up. It needs refining for > public use, but I’ll put it in Github later today and send you a link as I > work on it. I

Re: when a node is dead in Cassandra cluster

2015-09-21 Thread Jeff Ferland
A dead node should exist in the ring until it is replaced. If you remove a node without a replacement, you’ll end up with that replica’s ownership being placed onto another node without the data having been transferred, and queries against that range will falsely return empty records until a repair is

Re: Reduced write performance when reading

2015-07-23 Thread Jeff Ferland
...@liveramp.com wrote: I set up RAID0 after experiencing highly imbalanced disk usage with a JBOD setup so my transaction logs are indeed on the same media as the sstables. Is there any alternative to setting up RAID0 that doesn't have this issue? On Thu, Jul 23, 2015 at 4:03 PM, Jeff Ferland j

Re: Reduced write performance when reading

2015-07-23 Thread Jeff Ferland
My immediate guess: your transaction logs are on the same media as your sstables and your OS prioritizes read requests. -Jeff On Jul 23, 2015, at 2:51 PM, Soerian Lieve sli...@liveramp.com wrote: Hi, I am currently performing benchmarks on Cassandra. Independently from each other I am

Re: Compaction issues, 2.0.12

2015-07-06 Thread Jeff Ferland
I’ve seen the same thing: https://issues.apache.org/jira/browse/CASSANDRA-9577 I’ve had cases where a restart clears the old tables, and I’ve had cases where a restart considers the old tables to be live. On Jul 6, 2015, at 1:51 PM, Robert

Re: Files not removed after compaction

2015-06-10 Thread Jeff Ferland
Cassandra 2.0.12-200 / DSE 4.6.1 https://issues.apache.org/jira/browse/CASSANDRA-9577 -Jeff On Jun 10, 2015, at 4:36 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jun 10, 2015 at 4:15 PM, Jeff Ferland j...@tubularlabs.com

Files not removed after compaction

2015-06-10 Thread Jeff Ferland
Compaction finished yesterday, but I still have this going on: Space used (live), bytes: 878681716067 Space used (total), bytes: 2227857083852 jbf@ip-10-0-2-98:/ebs/cassandra/data/trends/trends$ sudo lsof *-Data.db COMMAND PID USER FD TYPE DEVICE

Offline Compaction and Token Splitting

2015-05-07 Thread Jeff Ferland
I have an ideal for backups in my mind with Cassandra to dump each columnfamily to a directory and use an offline process to compact them all into one sstable (or max sstable size set). I have an ideal for restoration which involves a streaming read of an sstable set and output based on whether

Re: Adding New Node Issue

2015-04-23 Thread Jeff Ferland
Sounds to me like your stream throughput value is too high. `nodetool getstreamthroughput` and `nodetool setstreamthroughput` let you read and update this value live. Limit it to something lower so that the system isn’t overloaded by streaming. The bottleneck that slows things down is most likely to be disk or
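
A rough sketch of scripting the two commands named in that reply, assuming `nodetool` is on the PATH of the node being throttled. The helper names and the `run` parameter are hypothetical, and the unit of the limit is whatever your `nodetool` version expects, so check `nodetool help setstreamthroughput` before picking a number.

```python
import subprocess


def stream_throughput_commands(new_limit):
    """Build the argument lists for reading, then lowering, the stream throughput cap."""
    if new_limit < 0:
        raise ValueError("new_limit must not be negative")
    return [
        ["nodetool", "getstreamthroughput"],
        ["nodetool", "setstreamthroughput", str(new_limit)],
    ]


def apply_stream_throughput(new_limit, run=subprocess.run):
    """Run both commands in order; `run` can be swapped out for a stub when testing."""
    for cmd in stream_throughput_commands(new_limit):
        # check=True raises if nodetool exits with a non-zero status.
        run(cmd, check=True)
```

Calling `apply_stream_throughput(50)` on a node would report the current value and then set the new one; nothing is executed until that call is made.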

Re: Do I need to run repair and compaction every node?

2015-04-14 Thread Jeff Ferland
is getting a little bit of love on improving repairs and communications / logging about them. -Jeff On Apr 13, 2015, at 3:45 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland j...@tubularlabs.com wrote: Nodetool repair -par

Re: nodetool cleanup error

2015-03-30 Thread Jeff Ferland
Code problem that was patched in https://issues.apache.org/jira/browse/CASSANDRA-8716. Upgrade to 2.0.13 On Mar 30, 2015, at 1:12 PM, Amlan Roy amlan@cleartrip.com wrote: Hi, I have added new nodes to an existing cluster and ran