Re: Compaction issues, 2.0.12

2015-07-06 Thread Jeff Ferland
I’ve seen the same thing: https://issues.apache.org/jira/browse/CASSANDRA-9577 I’ve had cases where a restart clears the old tables, and I’ve had cases where a restart considers the old tables to be live. > On Jul 6, 2015, at 1:51 PM, Rober

Re: Reduced write performance when reading

2015-07-23 Thread Jeff Ferland
My immediate guess: your transaction logs are on the same media as your sstables and your OS prioritizes read requests. -Jeff > On Jul 23, 2015, at 2:51 PM, Soerian Lieve wrote: > > Hi, > > I am currently performing benchmarks on Cassandra. Independently from each > other I am seeing ~100k w

Re: Reduced write performance when reading

2015-07-23 Thread Jeff Ferland
ve wrote: > > I set up RAID0 after experiencing highly imbalanced disk usage with a JBOD > setup so my transaction logs are indeed on the same media as the sstables. > Is there any alternative to setting up RAID0 that doesn't have this issue? > > On Thu, Jul 23, 2015 at

Re: when a node is dead in Cassandra cluster

2015-09-21 Thread Jeff Ferland
A dead node should exist in the ring until it is replaced. If you remove a node without a replacement, you’ll end up with that replica’s ownership being placed onto another node without the data having been transferred, and queries against that range will falsely return empty records until a repair is

Re: Snapshots - Backup/Restore requirements

2015-10-12 Thread Jeff Ferland
I have a semi-hacky Python script I’ve written up. It needs refining for public use, but I’ll put it in Github later today and send you a link as I work on it. It uses boto to do concurrent multi-part uploads to S3 with retry, and a resume-recording function in case it gets interrupted while uploading t
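The script itself isn’t public yet, but the resume bookkeeping described above can be sketched roughly like this (a minimal illustration; `part_ranges` and `remaining_parts` are hypothetical names, and the real script would drive boto’s multipart API with these):

```python
def part_ranges(size, part_size=64 * 1024 * 1024):
    """Split a file of `size` bytes into (part_number, offset, length) tuples,
    matching how a multipart upload carves the object."""
    parts, offset, n = [], 0, 1
    while offset < size:
        length = min(part_size, size - offset)
        parts.append((n, offset, length))
        offset += length
        n += 1
    return parts

def remaining_parts(all_parts, completed):
    """Parts still to upload, given the part numbers recorded as done in the
    resume file before the interruption."""
    return [p for p in all_parts if p[0] not in completed]
```

On resume, only the parts not in the completed set are re-uploaded, so an interrupted multi-hundred-GB snapshot upload doesn’t start over.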

Re: A number of questions on LeveledCompactionStrategy

2015-10-12 Thread Jeff Ferland
> On Oct 10, 2015, at 9:24 AM, San Luoji wrote: > > Hi, > > I've got a number of questions when looking into LCS in Cassandra. Could > somebody help to enlighten me? > > 1. Will LCS always strive to clean up L0 sstable? i.e. whenever a new L0 > sstable shows up, it will trigger LCS compacti

Re: Snapshots - Backup/Restore requirements

2015-10-12 Thread Jeff Ferland
and I’ll focus what time I would have spent enhancing my code to test it, put up a minor diff for a single-shot flag, and get some documentation / examples on snapshot and backup directories. -Jeff > On Oct 12, 2015, at 2:30 PM, Robert Coli wrote: > > On Mon, Oct 12, 2015 at 9:41

Cassandra stalls and dropped messages not due to GC

2015-10-29 Thread Jeff Ferland
Using DSE 4.8.1 / 2.1.11.872, Java version 1.8.0_66 We upgraded our cluster this weekend and have been having issues with dropped mutations since then. Intensely investigating a single node and toying with settings has revealed that GC stalls don’t make up enough time to explain the 10 seconds

Re: Cassandra stalls and dropped messages not due to GC

2015-10-29 Thread Jeff Ferland
> and your machines are overloaded, you’d be doing more READ_REPAIR than usual > probably) > >> On Oct 29, 2015, at 8:12 PM, Jeff Ferland > <mailto:j...@tubularlabs.com>> wrote: >> >> Using DSE 4.8.1 / 2.1.11.872, Java version 1.8.0_66 >> >> We upgr

Re: Cassandra stalls and dropped messages not due to GC

2015-11-02 Thread Jeff Ferland
Having caught a node in an undesirable state, many of my threads are reading like this: "SharedPool-Worker-5" #875 daemon prio=5 os_prio=0 tid=0x7f3e14196800 nid=0x96ce waiting on condition [0x7f3ddb835000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Nativ
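When sifting a dump full of parked SharedPool workers, a quick tally of thread states helps show whether the pool is broadly stuck; a minimal sketch (the regex assumes standard jstack output):

```python
import re
from collections import Counter

STATE = re.compile(r'java\.lang\.Thread\.State: (\w+)')

def thread_state_counts(jstack_output):
    """Tally thread states in a jstack dump to spot pools stuck WAITING."""
    return Counter(STATE.findall(jstack_output))
```

A node whose SharedPool workers are overwhelmingly WAITING rather than RUNNABLE points away from CPU and toward a blocked resource.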

Re: Cassandra stalls and dropped messages not due to GC

2015-11-02 Thread Jeff Ferland
> On Nov 2, 2015, at 11:35 AM, Nate McCall wrote: > Forgive me, but what is CMS? > > Sorry - ConcurrentMarkSweep garbage collector. Ah, my brain was trying to think in terms of something Cassandra specific. I have full GC logging on and since moving to G1, I haven’t had any >500ms GC cycles
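One way to back the “no >500ms GC cycles” claim is to scan the GC log for long pauses; a rough sketch, assuming JDK 8-style `real=N.NN secs` pause lines (the exact format depends on your GC logging flags):

```python
import re

# Assumes JDK 8 -verbose:gc style lines ending in "... real=0.12 secs]"
PAUSE = re.compile(r"real=([\d.]+)\s*secs")

def long_pauses(log_lines, threshold=0.5):
    """Return pause durations over `threshold` seconds from a GC log."""
    out = []
    for line in log_lines:
        m = PAUSE.search(line)
        if m and float(m.group(1)) > threshold:
            out.append(float(m.group(1)))
    return out
```

An empty result over the incident window supports ruling out GC as the cause of the stalls.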

Unable to add nodes / awaiting patch.

2015-12-02 Thread Jeff Ferland
Looks like we’re hit by https://issues.apache.org/jira/browse/CASSANDRA-10012 . Not knowing a better place to ask, when will the next version of 2.1.x Cassandra be cut and the following DSE fix cut from there? Could DSE cut an in-between ver

Re: New node has high network and disk usage.

2016-01-06 Thread Jeff Ferland
What’s your output of `nodetool compactionstats`? > On Jan 6, 2016, at 7:26 AM, Vickrum Loi wrote: > > Hi, > > We recently added a new node to our cluster in order to replace a node that > died (hardware failure we believe). For the next two weeks it had high disk > and network activity. We r

Re: compaction throughput

2016-01-15 Thread Jeff Ferland
Compaction is generally CPU bound and relatively slow. Exactly why that is I’m uncertain. > On Jan 15, 2016, at 12:53 PM, Kai Wang wrote: > > Hi, > > I am trying to figure out the bottleneck of compaction on my node. The node > is CentOS 7 and has SSDs installed. The table is configured to us

System block cache vs. disk access and metrics

2016-02-04 Thread Jeff Ferland
We struggled for a while to upgrade due to an out-of-order SSTables bug. During this time, load continued to increase and we were eventually accessing the disk a lot. When we could finally expand the cluster, disk access went down by an order of magnitude. This leads me to conclude that we had blown out

Re: How to measure the write amplification of C*?

2016-03-10 Thread Jeff Ferland
Compaction logs show the number of bytes written and the level written to. Base write load = table flushed to L0. Write amplification = sum of all compactions written to disk for the table. On Thu, Mar 10, 2016 at 9:44 AM, Dikang Gu wrote: > Hi Matt, > > Thanks for the detailed explanation! Yes,
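A rough sketch of that computation as a log parser (the line shapes below are assumptions about the flush/compaction log format and should be checked against your Cassandra version’s actual output):

```python
import re

# Assumed log line shapes:
#   flush:   "Completed flushing /data/ks/mytable-ka-12-Data.db (1,234 bytes) ..."
#   compact: "Compacted 4 sstables to [...]. 9,000 bytes to 7,500 (~83% of original) ..."
FLUSH = re.compile(r"Completed flushing \S*/(?P<table>\w+)-\S+ \((?P<bytes>[\d,]+) bytes\)")
COMPACT = re.compile(r"Compacted \d+ sstables .*?[\d,]+ bytes to (?P<bytes>[\d,]+)")

def write_amplification(lines):
    """Total bytes written (flushes + compaction output) divided by bytes of
    new data flushed — the write amplification factor for the log window."""
    flushed = compacted = 0
    for line in lines:
        m = FLUSH.search(line)
        if m:
            flushed += int(m.group("bytes").replace(",", ""))
            continue
        m = COMPACT.search(line)
        if m:
            compacted += int(m.group("bytes").replace(",", ""))
    return (flushed + compacted) / flushed if flushed else float("nan")
```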

Re: nodetool cleanup error

2015-03-30 Thread Jeff Ferland
Code problem that was patched in https://issues.apache.org/jira/browse/CASSANDRA-8716 . Upgrade to 2.0.13 > On Mar 30, 2015, at 1:12 PM, Amlan Roy wrote: > > Hi, > > I have added new nodes to an existing cluster and ran the “nodetool clea

Re: How much disk is needed to compact Leveled compaction?

2015-04-07 Thread Jeff Ferland
Check the size of your individual files. If your largest file is already more than half then you can’t compact it using leveled compaction either. You can take the system offline, split the largest file (I believe there is an sstablesplit utility and I imagine it allows you to take off the tail
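Finding that largest Data file is worth a few lines of script rather than eyeballing `ls`; a minimal sketch (illustrative only — compare the result against your free space):

```python
import os

def largest_data_file(data_dir):
    """Return the path of the largest *-Data.db under data_dir, or None.
    If its size exceeds free space, a compaction including it cannot fit."""
    paths = [
        os.path.join(root, f)
        for root, _, files in os.walk(data_dir)
        for f in files
        if f.endswith("-Data.db")
    ]
    return max(paths, key=os.path.getsize) if paths else None
```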

Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Jeff Ferland
Nodetool repair: The basic default sequential repair covers all nodes, computes merkle trees in sequence one node at a time. You only need to run the command on one node. Nodetool repair -par: covers all nodes, computes merkle trees for each node at the same time. Much higher IO load as every copy of a
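The reason `-pr` (primary range) repairs must run on every node, while a plain repair need not, is that `-pr` limits each node to the range it owns directly; on a single-token ring that is simply the span from the previous token up to the node’s own, e.g.:

```python
def primary_ranges(tokens):
    """Map each node's token to the (start, end] range it is primary for,
    on a single-token ring sorted by token (the first node wraps around)."""
    ts = sorted(tokens)
    return {t: (ts[i - 1], t) for i, t in enumerate(ts)}
```

Since each token's primary range is disjoint from the others, running `repair -pr` on only some nodes leaves the remaining primary ranges unrepaired.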

Re: Do I need to run repair and compaction every node?

2015-04-13 Thread Jeff Ferland
getting a little bit of love on improving repairs and communications / logging about them. -Jeff > On Apr 13, 2015, at 3:45 PM, Robert Coli wrote: > > On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland <mailto:j...@tubularlabs.com>> wrote: > Nodetool repair -par: covers all n

Re: Adding New Node Issue

2015-04-23 Thread Jeff Ferland
Sounds to me like your stream throughput value is too high. `nodetool getstreamthroughput` and `nodetool setstreamthroughput` will update this value live. Limit it to something lower so that the system isn’t overloaded by streaming. The bottleneck that slows things down is most likely to be disk or
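If you script this check, note that the output wording of `getstreamthroughput` varies between versions; a hedged parsing sketch (the pattern is an assumption, verify against your nodetool’s output):

```python
import re

def parse_stream_throughput(output):
    """Extract the Mb/s figure from `nodetool getstreamthroughput` output,
    or None if the expected pattern isn't found."""
    m = re.search(r"([\d.]+)\s*(?:Mb|MB)/s", output)
    return float(m.group(1)) if m else None
```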

Confirming Repairs

2015-04-24 Thread Jeff Ferland
The short answer is I used a logstash query to get a list of all repair ranges started and all ranges completed. I then matched the UUID of the start message to the end message and printed out all the ranges that didn't succeed. Then one needs to go a step further than I've coded and match the rema
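The UUID start/end matching described above reduces to a small set operation; a sketch with illustrative names (not the actual script):

```python
def unfinished_repairs(events):
    """events: (uuid, phase, token_range) tuples with phase 'started' or
    'finished'. Return the ranges whose start has no matching finish —
    the repairs that never completed."""
    started, finished = {}, set()
    for uuid, phase, rng in events:
        if phase == "started":
            started[uuid] = rng
        else:
            finished.add(uuid)
    return [rng for uuid, rng in started.items() if uuid not in finished]
```

The unmatched ranges are the ones to re-run (or, going the further step mentioned, to diff against the full set of ranges the node owns).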

Offline Compaction and Token Splitting

2015-05-07 Thread Jeff Ferland
I have an idea for backups in my mind with Cassandra: dump each columnfamily to a directory and use an offline process to compact them all into one sstable (or max sstable size set). I have an idea for restoration which involves a streaming read of an sstable set and output based on whether the

Files not removed after compaction

2015-06-10 Thread Jeff Ferland
Compaction finished yesterday, but I still have this going on:

Space used (live), bytes: 878681716067
Space used (total), bytes: 2227857083852

jbf@ip-10-0-2-98:/ebs/cassandra/data/trends/trends$ sudo lsof *-Data.db
COMMAND PID USER FD TYPE DEVICE S
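A live/total gap like that usually means compacted-away sstables are still held open, and summing the `(deleted)` entries in lsof output quantifies the stuck space. A sketch, assuming the usual lsof column layout (SIZE/OFF is the seventh field):

```python
def held_deleted_bytes(lsof_lines):
    """Sum the SIZE/OFF column for files lsof marks '(deleted)' — space that
    won't be reclaimed until the holding process drops its file handles.
    Adjust the field index if your lsof prints different columns."""
    total = 0
    for line in lsof_lines:
        if "(deleted)" in line:
            fields = line.split()
            try:
                total += int(fields[6])
            except (IndexError, ValueError):
                pass  # header line or non-numeric SIZE/OFF
    return total
```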

Re: Files not removed after compaction

2015-06-10 Thread Jeff Ferland
Cassandra 2.0.12-200 / DSE 4.6.1 https://issues.apache.org/jira/browse/CASSANDRA-9577 <https://issues.apache.org/jira/browse/CASSANDRA-9577> -Jeff > On Jun 10, 2015, at 4:36 PM, Robert Coli wrote: > > On Wed, Jun 10, 2015 at 4:15 PM, Jeff Ferland <mailto:j...@tubul