Re: Node failure Due To Very high GC pause time

2017-07-03 Thread Bryan Cheng
This is a very antagonistic use case for Cassandra :P I assume you're familiar with Cassandra and deletes? (eg. http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html, http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html ) That being said, are you

Re: Does Too many GC pauses can cause cassandra service DOWN.

2017-02-14 Thread Bryan Cheng
GC can absolutely cause a server to get marked down by a peer. See https://support.datastax.com/hc/en-us/articles/204226199-Common-Causes-of-GC-pauses As for tuning, again, we use CMS, but this thread has some good G1 info that I looked at while evaluating it:

Re: inconsistent results

2017-02-14 Thread Bryan Cheng
Change your consistency levels in the cqlsh shell while you query, from ONE to QUORUM to ALL. If you see your results change that's a consistency issue. (Assuming these are simple inserts, if there's deletes, potentially update collections, etc. in the mix then things get a bit more complex.) To

Re: Metric to monitor partition size

2017-01-13 Thread Bryan Cheng
We're on 2.X so this information may not apply to your version, but you should see: 1) A log statement upon compaction, like "Writing large partition", including the primary partition key (see https://issues.apache.org/jira/browse/CASSANDRA-9643). Configurable threshold in cassandra.yaml 2)

Re: Backup restore with a different name

2016-11-02 Thread Bryan Cheng
Hi Jens, When you refer to restoring a snapshot for a developer to look at, do you mean restoring the cluster to that state, or just exposing that state for reference while keeping the (corrupt) current state in the live cluster? You may find these useful:

Re: Incremental repairs in 3.0

2016-09-06 Thread Bryan Cheng
> Thanks for answer! >> >> >It may still be a good idea to manually migrate if you have a sizable >> amount of data >> No, it would be brand new ;-) 3.0 cluster >> >> >> >> On Tuesday, June 21, 2016 1:21 AM, Bryan Cheng <br...@blockcypher.co

Re: Corrupt SSTABLE over and over

2016-08-15 Thread Bryan Cheng
fresh, and still I am getting corruption. > > and Still nothing that indicate there is a HW issue? > All other nodes are fine > > Regards, > Alaa > > > On Fri, Aug 12, 2016 at 12:00 PM, Bryan Cheng <br...@blockcypher.com> > wrote: > >> Should also

Re: Corrupt SSTABLE over and over

2016-08-12 Thread Bryan Cheng
Should also add that if the scope of corruption is _very_ large, and you have a good, aggressive repair policy (read: you are confident in the consistency of the data elsewhere in the cluster), you may just want to decommission and rebuild that node. On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng

Re: Corrupt SSTABLE over and over

2016-08-12 Thread Bryan Cheng
Looks like you're doing the offline scrub- have you tried online? Here's my typical process for corrupt SSTables. With disk_failure_policy set to stop, examine the failing sstables. If they are very small (in the range of kbs), it is unlikely that there is any salvageable data there. Just delete

Re: Debugging high tail read latencies (internal timeout)

2016-07-07 Thread Bryan Cheng
Hi Nimi, My suspicions would probably lie somewhere between GC and large partitions. The first tool would probably be a trace but if you experience full client timeouts from dropped messages you may find it hard to find the issue. You can try running the trace with cqlsh's timeouts cranked all

Re: Cluster not working after upgrade from 2.1.12 to 3.5.0

2016-06-21 Thread Bryan Cheng
Hi Oskar, I know this won't help you as quickly as you would like but please consider updating the JIRA issue with details of your environment as it may help move the investigation along. Good luck! On Tue, Jun 21, 2016 at 12:21 PM, Julien Anguenot wrote: > You could try

Re: Incremental repairs in 3.0

2016-06-20 Thread Bryan Cheng
Sorry, meant to say "therefore manual migration procedure should be UNnecessary" On Mon, Jun 20, 2016 at 3:21 PM, Bryan Cheng <br...@blockcypher.com> wrote: > I don't use 3.x so hopefully someone with operational experience can chime > in, however my understanding is:

Re: Incremental repairs in 3.0

2016-06-20 Thread Bryan Cheng
I don't use 3.x so hopefully someone with operational experience can chime in, however my understanding is: 1) Incremental repairs should be the default in the 3.x release branch and 2) sstable repairedAt is now properly set in all sstables as of 2.2.x for standard repairs and therefore manual

Re: OOM under high write throughputs on 2.2.5

2016-05-24 Thread Bryan Cheng
Hi Zhiyan, Silly question but are you sure your heap settings are actually being applied? "697,236,904 (51.91%)" would represent a sub-2GB heap. What's the real memory usage for Java when this crash happens? Other thing to look into might be memtable_heap_space_in_mb, as it looks like you're
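
The sub-2GB estimate comes from simple arithmetic: if 697,236,904 bytes is 51.91% of the heap, the total heap is that figure divided by 0.5191. A minimal sketch of the calculation, assuming the percentage in the heap report is relative to the whole heap (the function name is illustrative and not from the thread):

```python
def implied_heap_bytes(retained_bytes: int, percent_of_heap: float) -> float:
    """Estimate total heap size from one entry's size and its share of the heap."""
    return retained_bytes / (percent_of_heap / 100.0)


# Figures quoted in the message above.
heap = implied_heap_bytes(697_236_904, 51.91)
print(f"{heap / 2**30:.2f} GiB")
```

This works out to roughly 1.25 GiB, well under 2 GB, which is why the configured heap settings look like they are not being applied.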

Re: Increasing replication factor and repair doesn't seem to work

2016-05-24 Thread Bryan Cheng
Hi Luke, I've never found nodetool status' load to be useful beyond a general indicator. You should expect some small skew, as this will depend on your current compaction status, tombstones, etc. IIRC repair will not provide consistency of intermediate states nor will it remove tombstones, it

Re: Limit 1

2016-04-21 Thread Bryan Cheng
As far as I know, the answer is yes, however it is unlikely that the cursor will have to probe very far to find a valid row unless your data is highly bursty. The key cache (assuming you have it enabled) will allow the query to skip unrelated rows in its search. However I would caution against

Re: Cassandra Golang Driver and Support

2016-04-13 Thread Bryan Cheng
Hi Yawei, While you're right that there's no first-party driver, we've had good luck using gocql (https://github.com/gocql/gocql) in production at moderate scale. What features in particular are you looking for that are missing? --Bryan On Tue, Apr 12, 2016 at 10:06 PM, Yawei Li

Re: Unable to connect to CQLSH or Launch SparkContext

2016-04-11 Thread Bryan Cheng
Check your environment variables, looks like JAVA_HOME is not properly set On Mon, Apr 11, 2016 at 9:07 AM, Lokesh Ceeba - Vendor < lokesh.ce...@walmart.com> wrote: > Hi Team, > > Help required > > > > cassandra:/app/cassandra $ nodetool status > > > > Cassandra 2.0 and later

Re: Large primary keys

2016-04-11 Thread Bryan Cheng
While large primary keys (within reason) should work, IMO anytime you're doing equality testing you are really better off minimizing the size of the key. Huge primary keys will also have very negative impacts on your key cache. I would err on the side of the digest, but I've never had a need for

Re: Cassandra sstable to Mysql

2016-04-02 Thread Bryan Cheng
You have SSTables and you want to get importable data? You could use a tool like sstabletojson to get json formatted data directly from the sstables; however, unless they've been perfectly compacted, there will be duplicates and updates interleaved that will be properly ordered. If this is a

Re: cassandra disks cache on SSD

2016-04-02 Thread Bryan Cheng
Hi Vincent, have you already tried the more common tuning operations like row cache? I haven't done any disk level caching like this (we use SSD's exclusively), but you may see some benefit from putting your commitlog on a separate conventional HDD if you haven't tried this already. This may push

Re: Multi DC setup for analytics

2016-03-31 Thread Bryan Cheng
I'm jumping into this thread late, so sorry if this has been covered before. But am I correct in reading that you have two different Cassandra rings, not talking to each other at all, and you want to have a shared DC with a third Cassandra ring? I'm not sure what you want to do is possible. If I

Re: Cassandra Upgrade 3.0.x vs 3.x (Tick-Tock Release)

2016-03-14 Thread Bryan Cheng
Hi Kathir, The specific version will depend on your needs (eg. libraries) and risk/stability profile. Personally, I generally go with the oldest branch with still active maintenance (which would be 2.2.x or 2.1.x if you only need critical fixes), but there's lots of good stuff in 3.x if you're

Re: Unexplainably large reported partition sizes

2016-03-07 Thread Bryan Cheng
Hi Tom, Do you use any collections on this column family? We've had issues in the past with unexpectedly large partitions reported on data models with collections, which can also generate tons of tombstones on UPDATE ( https://issues.apache.org/jira/browse/CASSANDRA-10547) --Bryan On Mon, Mar

Re: Modeling transactional messages

2016-03-04 Thread Bryan Cheng
I think most people will tell you what Sean did- queues are considered an anti-pattern for many reasons in Cassandra, and while it's possible, you may want to consider something more suited for the job (RabbitMQ, redis queues are just a few ideas that come to mind). If you're sold on the idea of

Re: Lot of GC on two nodes out of 7

2016-03-03 Thread Bryan Cheng
Hi Anishek, In addition to the good advice others have given, do you notice any abnormally large partitions? What does cfhistograms report for 99% partition size? A few huge partitions will cause very disproportionate load on your cluster, including high GC. --Bryan On Wed, Mar 2, 2016 at 9:28

Re: Checking replication status

2016-03-01 Thread Bryan Cheng
seems to indicate repairing nodes within a datacenter, but for across DC > network outage, we want to repair nodes across DCs right? > > thanks > > > > On Fri, Feb 26, 2016 at 3:38 PM, Bryan Cheng <br...@blockcypher.com> > wrote: > >> Hi Jimmy, >> >

Re: Checking replication status

2016-02-26 Thread Bryan Cheng
Hi Jimmy, If you sustain a long downtime, repair is almost always the way to go. It seems like you're asking to what extent a cluster is able to recover/resync a downed peer. A peer will not attempt to reacquire all the data it has missed while being down. Recovery happens in a few ways: 1)

Re: Cassandra Multi DC (Active-Active) Setup - Measuring latency & throughput performance

2016-02-26 Thread Bryan Cheng
Hi Chandra, For write latency, etc. the tools are still largely the same set of tools you'd use for single-DC- stuff like tracing, cfhistograms, cassandra-stress come to mind. The exact results are going to differ based on your consistency tuning (can you get away with LOCAL_QUORUM vs QUORUM?)

Re: "Not enough replicas available for query" after reboot

2016-02-04 Thread Bryan Cheng
Hey Flavien! Did your reboot come with any other changes (schema, configuration, topology, version)? On Thu, Feb 4, 2016 at 2:06 PM, Flavien Charlon wrote: > I'm using the C# driver 2.5.2. I did try to restart the client > application, but that didn't make any

Re: EC2 storage options for C*

2016-02-03 Thread Bryan Cheng
make any difference? >>>>>>> >>>>>>> What info is available on EBS performance at peak times, when >>>>>>> multiple AWS customers have spikes of demand? >>>>>>> >>>>>>> Is RAID much of a factor or help at all

Re: Any tips on how to track down why Cassandra won't cluster?

2016-02-03 Thread Bryan Cheng
> On Wed, 3 Feb 2016 at 11:49 Richard L. Burton III > wrote: > >> >> Any suggestions on how to track down what might trigger this problem? I'm >> not receiving any exceptions. >> > You're not getting "Unable to gossip with any seeds" on the second node? What does nodetool

Re: EC2 storage options for C*

2016-01-30 Thread Bryan Cheng
Yep, that motivated my question "Do you have any idea what kind of disk performance you need?". If you need the performance, it's hard to beat ephemeral SSD in RAID 0 on EC2, and it's a solid, battle tested configuration. If you don't, though, EBS GP2 will save a _lot_ of headache. Personally, on

Re: Session timeout

2016-01-29 Thread Bryan Cheng
To throw my (unsolicited) 2 cents into the ring, Oleg, you work for a well-funded and fairly large company. You are certainly free to continue using the list and asking for community support (I am definitely not in any position to tell you otherwise, anyway), but that community support is by

Re: EC2 storage options for C*

2016-01-29 Thread Bryan Cheng
Do you have any idea what kind of disk performance you need? Cassandra with RAID 0 is a fairly common configuration (Al's awesome tuning guide has a blurb on it https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html), so if you feel comfortable with the operational overhead it seems

Help debugging a very slow query

2016-01-13 Thread Bryan Cheng
Hi list, Would appreciate some insight into some irregular performance we're seeing. We have a column family that has become problematic recently. We've noticed a few queries take enormous amounts of time, and seem to clog up read resources on the machine (read pending tasks pile up, then

Re: max connection per user

2016-01-13 Thread Bryan Cheng
Are you actively exposing your database to users outside of your organization, or are you just asking about security best practices? If you mean the former, this isn't really a common use case and there isn't a huge amount out of the box that Cassandra will do to help. If you're just asking

Re: Rebuilding a new Cassandra node at 100Mb/s

2015-12-03 Thread Bryan Cheng
Jonathan: Have you changed stream_throughput_outbound_megabits_per_sec in cassandra.yaml? # Throttles all outbound streaming file transfers on this node to the # given total throughput in Mbps. This is necessary because Cassandra does # mostly sequential IO when streaming data during bootstrap or
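
To get a feel for what a streaming throttle means in practice, here is a rough back-of-the-envelope sketch. It assumes the throttle is the only bottleneck and ignores compression and protocol overhead; the 500 GB node size is a made-up example, not a figure from the thread.

```python
def stream_time_hours(data_gb: float, throttle_megabits_per_sec: float) -> float:
    """Hours needed to stream data_gb gigabytes at the given throttle (decimal units)."""
    megabits = data_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    return megabits / throttle_megabits_per_sec / 3600


# Hypothetical node holding 500 GB, streamed at 100 Mb/s.
print(round(stream_time_hours(500, 100), 1), "hours")
```

Doubling the value of stream_throughput_outbound_megabits_per_sec halves this estimate, provided the network and disks can actually keep up.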

Re: Transitioning to incremental repair

2015-12-02 Thread Bryan Cheng
Ah Marcus, that looks very promising- unfortunately we have already switched back to full repairs and our test cluster has been re-purposed for other tasks atm. I will be sure to apply the patch/try a fixed version of Cassandra if we attempt to migrate to incremental repair again.

Re: Issues on upgrading from 2.2.3 to 3.0

2015-12-02 Thread Bryan Cheng
Has your configuration changed? This is a new check- https://issues.apache.org/jira/browse/CASSANDRA-10242. It seems likely either your snitch changed, your properties changed, or something caused Cassandra to think one of the two happened... What's your node layout? On Fri, Nov 27, 2015 at

Re: Transitioning to incremental repair

2015-12-01 Thread Bryan Cheng
Sorry if I misunderstood, but are you asking about the LCS case? Based on our experience, I would absolutely recommend you continue with the migration procedure. Even if the compaction strategy is the same, the process of anticompaction is incredibly painful. We observed our test cluster running

Generalized download link?

2015-11-16 Thread Bryan Cheng
Hey list, Is there a URL available for downloading Cassandra that abstracts away the mirror selection (eg. just 302's to a mirror URL?) We've got a few self-configuring Cassandras (for example, the Docker container our devs use), and using the same mirror for the containers or for any bulk

Re: Repair Hangs while requesting Merkle Trees

2015-11-16 Thread Bryan Cheng
Hi Anuj, Did you mean streaming_socket_timeout_in_ms? If not, then you definitely want that set. Even the best network connections will break occasionally, and in Cassandra < 2.1.10 (I believe) this would leave those connections hanging indefinitely on one end. How far away are your two DC's

Re: Too many open files Cassandra 2.1.11.872

2015-11-06 Thread Bryan Cheng
Is your compaction progressing as expected? If not, this may cause an excessive number of tiny db files. Had a node refuse to start recently because of this, had to temporarily remove limits on that process. On Fri, Nov 6, 2015 at 10:09 AM, Jason Lewis wrote: > I'm

Re: Insertion Delay Cassandra 2.1.9

2015-11-06 Thread Bryan Cheng
Your experience, then, is expected (although 20m delay seems excessive, and is a sign you may be overloading your cluster, which may be expected with an unthrottled bulk load like that). When you insert with consistency ONE on RF > 1, that means your query returns after one node confirms the
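
The effect of consistency ONE on RF > 1 can be illustrated with the usual overlap rule: a read is only guaranteed to touch a replica that acknowledged the write when write acks plus read replicas exceed the replication factor. This is a simplified sketch; it ignores hinted handoff, read repair, and timing, and the function names are illustrative.

```python
def quorum(rf: int) -> int:
    """Number of replicas that make up a quorum for replication factor rf."""
    return rf // 2 + 1


def read_overlaps_write(write_acks: int, read_replicas: int, rf: int) -> bool:
    """True when every read set must include at least one replica that acked the write."""
    return write_acks + read_replicas > rf


rf = 3
print("ONE/ONE:", read_overlaps_write(1, 1, rf))
print("QUORUM/QUORUM:", read_overlaps_write(quorum(rf), quorum(rf), rf))
```

With RF=3, writing and reading at ONE gives no overlap guarantee, so a read can land on a replica that has not yet received the insert; QUORUM on both sides does overlap.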

What are the repercussions of a restart during anticompaction?

2015-11-05 Thread Bryan Cheng
Hey list, Tried to find an answer to this elsewhere, but turned up nothing. We ran our first incremental repair after a large dc migration two days ago; the cluster had been running full repairs prior to this during the migration. Our nodes are currently going through anticompaction, as

Re: Two node cassandra cluster doubts

2015-11-04 Thread Bryan Cheng
I believe what's going on here is this step: Select Count (*) From MYTABLE;---> 15 rows Shut down Node B. Start Up Node B. Select Count (*) From MYTABLE;---> 15 rows To understand why this is an issue, consider the way that consistency is attempted within Cassandra. With RF=2, (You should

Re: Doubt regarding consistency-level in Cassandra-2.1.10

2015-11-03 Thread Bryan Cheng
What Eric means is that SERIAL consistency is a special type of consistency that is only invoked for a subset of operations: those that use CAS/lightweight transactions, for example "IF NOT EXISTS" queries. The differences between CAS operations and standard operations are significant and there

Re: Maximum node decommission // bootstrap at once.

2015-10-06 Thread Bryan Cheng
Honestly, we've had more luck bootstrapping in our old DC (defining topology properties as the new DC) and using rsync to migrate the data files to new machines in the new datacenter. We had 10gig within the datacenter but significantly less than this cross-DC, which led to a lot of broken

Re: Maximum node decommission // bootstrap at once.

2015-10-06 Thread Bryan Cheng
Robert, I might be misinterpreting you but I *think* your link is talking about bootstrapping a new node by bulk loading replica data from your existing cluster? I was referring to using Cassandra's bootstrap to get the node to join and run (as a member of DC2 but with physical residence in DC1),

Re: broadcast address on EC2 without Elastic IPs.

2015-10-01 Thread Bryan Cheng
Hey Renato, As far as I can tell, the reason you're getting private IP addresses back is that the node you're connecting to is relaying back the way that _it_ knows where to find other nodes, which is a function of the gossip state. This is expected behavior. Mixed Private/Public IP spaces

Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Bryan Cheng
Tom, I don't believe so; it seems the symptom would be an indefinite (or very long) hang. To clarify, is this issue restricted to LOCAL_QUORUM? Can you issue a LOCAL_ONE SELECT and retrieve the expected data back? On Tue, Sep 8, 2015 at 12:02 PM, Tom van den Berge < tom.vandenbe...@gmail.com>

Re: How to prevent queries being routed to new DC?

2015-09-03 Thread Bryan Cheng
Hey Tom, What's your replication strategy look like? When your new nodes join the ring, can you verify that they show up under a new DC and not as part of the old? --Bryan On Thu, Sep 3, 2015 at 11:27 AM, Tom van den Berge < tom.vandenbe...@gmail.com> wrote: > I want to start using vnodes in

Re: How to prevent queries being routed to new DC?

2015-09-03 Thread Bryan Cheng
> With the first approach I described, the new nodes join the cluster, and > show up correctly under the new DC, so all seems to be fine. > With the second approach (join_ring=false), they don't show up at all, > which is also what I expected. > > > On Thu, Sep 3, 2015 at 8:44

Re: How to prevent queries being routed to new DC?

2015-09-03 Thread Bryan Cheng
> > It does not generate any errors. A query for a specific row simply does > not return the row if it is sent to a node in the new DC. This makes sense, > because the node is still empty. > > On Thu, Sep 3, 2015 at 9:03 PM, Bryan Cheng <br...@blockcypher.com> wrote: > >> This

Rebuild new DC nodes against new DC?

2015-08-31 Thread Bryan Cheng
Hi list, We're bringing up a second DC, and following the procedure outlined here: http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html We have three nodes in the new DC that are members of the cluster and indicate that they are running normally. We have

Re: Incremental, Sequential repair?

2015-08-25 Thread Bryan Cheng
: On Tue, Aug 25, 2015 at 2:44 PM, Bryan Cheng br...@blockcypher.com wrote: [2015-08-25 21:36:43,433] It is not possible to mix sequential repair and incremental repairs. Is this a limitation around a specific configuration? Or is it generally true that incremental and sequential repairs

Incremental, Sequential repair?

2015-08-25 Thread Bryan Cheng
Hey all, Got a question about incremental repairs; a quick Google search turned up nothing conclusive. In the docs, in a few places, sequential, incremental repairs are mentioned. From http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html (indirectly): You can

Re: Change from single region EC2 to multi-region

2015-08-11 Thread Bryan Cheng
broadcast_address to public ip should be the correct configuration. Assuming your firewall rules are all kosher, you may need to clear gossip state? http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html -- Forwarded message -- From: Asher Newcomer

Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
Hi there, Within our Cassandra cluster, we're observing, on occasion, one or two nodes at a time becoming partially unresponsive. We're running 2.1.7 across the entire cluster. nodetool still reports the node as being healthy, and it does respond to some local queries; however, the CPU is

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
22, 2015 at 2:55 PM, Bryan Cheng br...@blockcypher.com wrote: nodetool still reports the node as being healthy, and it does respond to some local queries; however, the CPU is pegged at 100%. One common thread (heh) each time this happens is that there always seems to be one or more compaction

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
300ms collection time when it runs. On Wed, Jul 22, 2015 at 3:22 PM, Aiman Parvaiz ai...@flipagram.com wrote: Hi Bryan How's GC behaving on these boxes? On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng br...@blockcypher.com wrote: Hi there, Within our Cassandra cluster, we're observing

Re: Cassandra compaction appears to stall, node becomes partially unresponsive

2015-07-22 Thread Bryan Cheng
of the cluster. On Wed, Jul 22, 2015 at 3:35 PM, Bryan Cheng br...@blockcypher.com wrote: Hi Aiman, We previously had issues with GC, but since upgrading to 2.1.7 things seem a lot healthier. We collect GC statistics through collectd via the garbage collector mbean, ParNew GC's report sub 500ms