Re: Data Loss irreparably so
Due to tombstones, we have set GC_GRACE_SECONDS to 6 hours. And for a huge table of 4 TB, repair is a hard thing for us. -- Original -- From: "kurt"; Date: Thu, Aug 3, 2017 12:08 PM; To: "User"; Subject: Re: Data Loss irreparably so You should run repairs every GC_GRACE_SECONDS. If a node is overloaded/goes down, you should run repairs. LOCAL_QUORUM will somewhat maintain consistency within a DC, but certainly doesn't mean you can get away without running repairs. You need to run repairs even if you are using QUORUM or ONE.
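For illustration, a minimal cron-driven sketch of kurt's advice, with hypothetical keyspace/table names; with GC_GRACE_SECONDS at only 6 hours, every replica must be repaired well inside that window, which is exactly the regime where a scheduler such as Reaper (mentioned below) is easier to operate than cron:

    # crontab on each node, staggered per node, so the whole ring is
    # covered inside gc_grace_seconds (hypothetical names: ks.big_table)
    0 */4 * * *  nodetool repair -pr ks big_table >> /var/log/cassandra/repair.log 2>&1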
Re: Data Loss irreparably so
Hi, We are also experiencing the same issue. We have 3 DCs (DC1 RF=3, DC2 RF=3, DC3 RF=1). If we use LOCAL_QUORUM, we are not meant to lose any data, right? If we use LOCAL_ONE, we may lose data and need to run repair regularly? Could anyone advise? Thanks -- Original -- From: "Jon Haddad"; Date: Fri, Jul 28, 2017 1:37 AM; To: "user"; Subject: Re: Data Loss irreparably so We (The Last Pickle) maintain an open source tool to help manage repairs across your clusters called Reaper. It's a lot easier to set up and manage than trying to manage it through cron. http://thelastpickle.com/reaper.html On Jul 27, 2017, at 12:38 AM, Daniel Hölbling-Inzko wrote: In that vein, Cassandra supports auto compaction and incremental repair. Does this mean I have to set up cron jobs on each node to do a nodetool repair, or is this taken care of by Cassandra anyway? How often should I run nodetool repair? Greetings Daniel Jeff Jirsa wrote on Thu, 27 July 2017 at 07:48: On 2017-07-25 15:49 (-0700), Roger Warner wrote: > This is a quick informational question. I know that Cassandra can detect > failures of nodes and repair them given replication and multiple DC. > > My question is can Cassandra tell if data was lost after a failure and > node(s) "fixed" and resumed operation? > Sorta concerned by the way you're asking this - Cassandra doesn't "fix" failed nodes. It can route requests around a down node, but the "fixing" is entirely manual. If you have a node go down temporarily, and it comes back up (with its disk intact), you can see it "repair" data with a combination of active (anti-entropy) repair via nodetool repair, or by watching 'nodetool netstats' and seeing the read repair counters increase over time (which will happen naturally as data is requested and mismatches are detected in the data, based on your consistency level).
Cassandra data loss in some DC
Hi there, We have a three-DC cluster (two DCs with RF=3, one remote DC with RF=1). We currently find that in DC1/DC2, select count(*) from t returns 1250, while in DC3 select count(*) from t returns 750. It looks like some data is missing in DC3 (the remote DC). There are no nodes down or anything else exceptional; we only upgraded this DC from 2.1.13 to 2.1.18, but that shouldn't cause data loss. Could anyone please advise? Thanks, Peng
Re: Re: tolerate how many nodes down in the cluster
Thanks all for your reply. We will begin using racks in our C* cluster. Thanks. -- Original -- From: "kurt greaves" <k...@instaclustr.com>; Date: Tue, Jul 25, 2017 6:27; To: "User" <user@cassandra.apache.org>; "anujw_2...@yahoo.co.in" <anujw_2...@yahoo.co.in>; Cc: "Peng Xiao" <2535...@qq.com>; Subject: Re: Re: tolerate how many nodes down in the cluster I've never really understood why Datastax recommends against racks. In those docs they make it out to be much more difficult than it actually is to configure and manage racks. The important thing to keep in mind when using racks is that your # of racks should be equal to your RF. If you have keyspaces with different RF, then it's best to match the RF of your most important keyspace, but in this scenario you lose some of the benefits of using racks. As Anuj has described, if you use RF # of racks, you can lose up to an entire rack without losing availability. Note that this entirely depends on the situation: when you take a node down, the other nodes in the cluster need the capacity to handle the extra load that node is no longer handling. Your cluster will require the other nodes to store hints for that node (equivalent to the amount of writes made to that node) and also handle its portion of reads, so you can only take out as many nodes from a rack as the capacity of your cluster allows. I also strongly disagree that using racks makes operations tougher. If anything, it makes them considerably easier (especially when using vnodes). The only difficulty is the initial setup of racks, but for all the possible benefits it's certainly worth it. As well as the fact that you can lose up to an entire rack (great for AWS AZs) without affecting availability, using racks also makes operations on large clusters much smoother. For example, when upgrading a cluster, you can now do it a rack at a time, or some portion of a rack at a time. Same for OS upgrades or any other operation that could happen in your environment. This is important if you have lots of nodes. Also it makes coordinating repairs easier, as you now only need to repair a single rack to ensure you've repaired all the data. Basically any operation/problem where you need to consider the distribution of data, racks are going to help you.
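For reference, a minimal sketch of the racks-equal-to-RF setup kurt describes, assuming GossipingPropertyFileSnitch and hypothetical DC/rack/keyspace names:

    # conf/cassandra-rackdc.properties on each node
    # (RAC1 / RAC2 / RAC3 across the three racks; rack count == RF)
    dc=DC1
    rack=RAC1

    -- and the keyspace replication, via cqlsh
    ALTER KEYSPACE ks WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': 3};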
Re: Re: tolerate how many nodes down in the cluster
Thanks for the reminder, we will set up a new DC as suggested. -- Original -- From: "kurt greaves"; Date: Wed, Jul 26, 2017 10:30 AM; To: "User"; Cc: "anujw_2...@yahoo.co.in"; Subject: Re: Re: tolerate how many nodes down in the cluster Keep in mind that you shouldn't just enable multiple racks on an existing cluster (this will lead to massive inconsistencies). The best method is to migrate to a new DC as Brooke mentioned.
Re: tolerate how many nodes down in the cluster
Hi Bhuvan, The following link advises against using racks, and the reasoning looks sound: http://www.datastax.com/dev/blog/multi-datacenter-replication Defining one rack for the entire cluster is the simplest and most common implementation. Multiple racks should be avoided for the following reasons: • Most users tend to ignore or forget rack requirements that state racks should be in an alternating order to allow the data to get distributed safely and appropriately. • Many users are not using the rack information effectively by using a setup with as many racks as they have nodes, or similar non-beneficial scenarios. • When using racks correctly, each rack should typically have the same number of nodes. • In a scenario that requires a cluster expansion while using racks, the expansion procedure can be tedious, since it typically involves several node moves and has to ensure that racks will distribute data correctly and evenly. At times when clusters need immediate expansion, racks should be the last thing to worry about. -- Original -- From: "Bhuvan Rawal" <bhu1ra...@gmail.com>; Date: Mon, Jul 24, 2017 7:17 PM; To: "user" <user@cassandra.apache.org>; Subject: Re: tolerate how many nodes down in the cluster Hi Peng, This really depends on how you have configured your topology. Say you have segregated your DC into 3 racks with 10 servers each: with an RF of 3 you can safely assume your data will be available if one rack goes down. But if different servers across the racks fail, you are not guaranteed data integrity with RF of 3; in that case you can lose at most 2 servers and remain available. The best idea would be to plan failover modes appropriately and let Cassandra know about them. Regards, Bhuvan On Mon, Jul 24, 2017 at 3:28 PM, Peng Xiao <2535...@qq.com> wrote: Hi, Suppose we have a 30 node cluster in one DC with RF=3: how many nodes can be down? Can we tolerate 10 nodes down? It seems we cannot prevent all 3 replicas of some data from landing on those 10 nodes, so we can only tolerate 1 node down even with 30 nodes? Could anyone please advise? Thanks
Re: Re: Re: tolerate how many nodes down in the cluster
Thanks all for your thorough explanation. -- Original -- From: "Anuj Wadehra" <anujw_2...@yahoo.co.in.INVALID>; Date: Fri, Jul 28, 2017 0:49; To: "User cassandra.apache.org" <user@cassandra.apache.org>; "Peng Xiao" <2535...@qq.com>; Subject: Re: Re: Re: tolerate how many nodes down in the cluster Hi Peng, Racks can be logical (as defined with the RAC attribute in Cassandra configuration files) or physical (racks in server rooms). In my view, for leveraging racks in your case, it's important to understand the implications of the following decisions: 1. Number of distinct logical RACs defined in Cassandra: If you want to leverage RACs optimally for operational efficiencies (as Brooke explained), you need to make sure that logical RACs are ALWAYS equal to RF, irrespective of whether physical racks are equal to or greater than RF. Keeping logical RACs = RF ensures that the nodes allocated to a logical rack hold exactly 1 replica of the entire 100% data set. So, if you have RF=3 and you use QUORUM for read/write, you can bring down ALL nodes allocated to a logical rack for maintenance and still achieve 100% availability. This makes operations faster and cuts down the risk involved. For example, imagine a Cassandra restart of the entire cluster. If one node takes 3 minutes, a rolling restart of 30 nodes would take 90 minutes. But if you use 3 logical RACs with RF=3 and assign 10 nodes to each logical RAC, you can restart the 10 nodes within a RAC simultaneously (of course in off-peak hours, so that the remaining 20 nodes can take the load). Starting Cassandra on the RACs one by one will take just 9 minutes rather than 90. If there are any issues during restart/maintenance, you can take all the nodes of a logical RAC down, fix them, and bring them back without affecting availability. 2. Number of physical racks: As per historical data, there are instances when more than one node in a physical rack fails together. When you are using VMs, there are three levels instead of two: VMs on a single physical machine are likely to fail together too, due to hardware failure. Physical racks > physical machines > VMs. Ensure that all VMs on a physical machine map to a single logical RAC. If you want to tolerate failure of physical racks in the server room, you also need to ensure that all physical servers on a physical rack map to just one logical RAC. This way, you can afford failure of ALL VMs on ALL physical machines mapped to a single logical RAC and still be 100% available. For example: RF=3, 6 physical racks, 2 physical servers per physical rack, and 3 VMs per physical server. The setup would be: Physical Rack1 = [Physical1 (3 VM) + Physical2 (3 VM)] = LogicalRAC1; Physical Rack2 = [Physical3 (3 VM) + Physical4 (3 VM)] = LogicalRAC1; Physical Rack3 = [Physical5 (3 VM) + Physical6 (3 VM)] = LogicalRAC2; Physical Rack4 = [Physical7 (3 VM) + Physical8 (3 VM)] = LogicalRAC2; Physical Rack5 = [Physical9 (3 VM) + Physical10 (3 VM)] = LogicalRAC3; Physical Rack6 = [Physical11 (3 VM) + Physical12 (3 VM)] = LogicalRAC3. The problem with this approach is scaling. What if you want to add a single physical server? If you do that and allocate it to one existing logical RAC, your cluster won't be balanced properly, because the logical RAC to which the server is added will have additional capacity for the same data as the other two logical RACs. To keep your cluster balanced, you need to add at least 3 physical servers in 3 different physical racks and assign each physical server to a different logical RAC. This is a waste of resources and hard to digest.
If you have fewer physical machines than logical RACs, every physical machine may hold more than 1 replica. If an entire physical machine fails, you will NOT have 100% availability, as more than 1 replica may be unavailable. Similarly, if you have fewer physical racks than logical RACs, every physical rack may hold more than 1 replica, and if an entire physical rack fails, you will NOT have 100% availability for the same reason. Coming back to your example: RF=3 per DC (total RF=6), CL=QUORUM, 2 DCs, 6 physical machines, 8 VMs per physical machine. My recommendation: 1. In each DC, assign 3 physical machines to 3 logical RACs in the Cassandra configuration. The 2 DCs can have the same RAC names, as RACs are uniquely identified by their DC names; so these are 6 different logical RACs (a multiple of RF), i.e. 1 physical machine per logical RAC. 2. Add 6 physical machines (3 per DC) to scale the cluster, and assign every machine to a different logical RAC within its DC. This way, even if you have an Active-Passive DC setup, you can tolerate failure of any physical machine or physical rack in the Active DC and still ensure 100% availability. You would also achieve the operational benefits explained earlier.
Re: Re: tolerate how many nodes down in the cluster
As Brooke suggests, the # of racks should be a multiple of RF: https://www.youtube.com/watch?v=QrP7G1eeQTI If we have 6 machines with RF=3, should we set up 6 RACs or 3 RACs? Which is better? Could you please advise further? Many thanks -- Original -- From: "我自己的邮箱" <2535...@qq.com>; Date: Wed, Jul 26, 2017 7:31 PM; To: "user"; Cc: "anujw_2...@yahoo.co.in"; Subject: Re: Re: tolerate how many nodes down in the cluster One more question: why should the # of racks be equal to RF? For example, we have 4 machines, each virtualized into 8 VMs. Can we set 4 RACs with RF=3, i.e. one machine per RAC? Thanks -- Original -- From: "我自己的邮箱" <2535...@qq.com>; Date: Wed, Jul 26, 2017 10:32 AM; To: "user"; Cc: "anujw_2...@yahoo.co.in"; Subject: Re: Re: tolerate how many nodes down in the cluster Thanks for the reminder, we will set up a new DC as suggested. -- Original -- From: "kurt greaves"; Date: Wed, Jul 26, 2017 10:30 AM; To: "User"; Cc: "anujw_2...@yahoo.co.in"; Subject: Re: Re: tolerate how many nodes down in the cluster Keep in mind that you shouldn't just enable multiple racks on an existing cluster (this will lead to massive inconsistencies). The best method is to migrate to a new DC as Brooke mentioned.
Re: Re: tolerate how many nodes down in the cluster
One more question: why should the # of racks be equal to RF? For example, we have 4 machines, each virtualized into 8 VMs. Can we set 4 RACs with RF=3, i.e. one machine per RAC? Thanks -- Original -- From: "我自己的邮箱" <2535...@qq.com>; Date: Wed, Jul 26, 2017 10:32 AM; To: "user"; Cc: "anujw_2...@yahoo.co.in"; Subject: Re: Re: tolerate how many nodes down in the cluster Thanks for the reminder, we will set up a new DC as suggested. -- Original -- From: "kurt greaves"; Date: Wed, Jul 26, 2017 10:30 AM; To: "User"; Cc: "anujw_2...@yahoo.co.in"; Subject: Re: Re: tolerate how many nodes down in the cluster Keep in mind that you shouldn't just enable multiple racks on an existing cluster (this will lead to massive inconsistencies). The best method is to migrate to a new DC as Brooke mentioned.
Re: Timeout while setting keyspace
https://datastax-oss.atlassian.net/browse/JAVA-1002 This one says it's a driver issue; we will give it a try. -- Original -- From: "" <2535...@qq.com>; Date: Wed, Jul 26, 2017 04:12 PM; To: "user"; Subject: Timeout while setting keyspace Dear All, We are experiencing a strange issue. We currently have a cluster on Cassandra 2.1.13; when the applications start, they print the following warnings, and it takes a long time for the applications to start. Could you please advise?
2017-07-26 15:49:20.676 WARN 11706 --- [-] [cluster1-nio-worker-2] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/172.16.42.138:9042-3, inFlight=5, closed=false]. This should not happen but is not critical (it will be retried)
2017-07-26 15:49:20.677 WARN 11706 --- [-] [cluster1-nio-worker-3] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not happen but is not critical (it will be retried)
2017-07-26 15:49:20.676 WARN 11706 --- [-] [cluster1-nio-worker-0] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not happen but is not critical (it will be retried)
2017-07-26 15:49:20.676 WARN 11706 --- [-] [cluster1-nio-worker-1] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not happen but is not critical (it will be retried)
2017-07-26 15:49:32.777 WARN 11706 --- [-] [main] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/172.16.42.113:9042-3, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
Thanks
Timeout while setting keyspace
Dear All, We are experiencing a strange issue. We currently have a cluster on Cassandra 2.1.13; when the applications start, they print the following warnings, and it takes a long time for the applications to start. Could you please advise?
2017-07-26 15:49:20.676 WARN 11706 --- [-] [cluster1-nio-worker-2] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/172.16.42.138:9042-3, inFlight=5, closed=false]. This should not happen but is not critical (it will be retried)
2017-07-26 15:49:20.677 WARN 11706 --- [-] [cluster1-nio-worker-3] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not happen but is not critical (it will be retried)
2017-07-26 15:49:20.676 WARN 11706 --- [-] [cluster1-nio-worker-0] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not happen but is not critical (it will be retried)
2017-07-26 15:49:20.676 WARN 11706 --- [-] [cluster1-nio-worker-1] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not happen but is not critical (it will be retried)
2017-07-26 15:49:32.777 WARN 11706 --- [-] [main] com.datastax.driver.core.Connection : Timeout while setting keyspace on Connection[/172.16.42.113:9042-3, inFlight=1, closed=false]. This should not happen but is not critical (it will be retried)
Thanks
Re: Re: tolerate how many nodes down in the cluster
Kurt/All, why should the # of racks be equal to RF? For example, we have 2 DCs, each with 6 machines and RF=3, and each machine is virtualized into 8 VMs. Can we set 6 RACs with RF=3 (one machine per RAC, to contain hardware errors), or should we set only 3 RACs with 2 machines per RAC? Which is better? Thanks -- Original -- From: "Anuj Wadehra" <anujw_2...@yahoo.co.in.INVALID>; Date: Thu, Jul 27, 2017 1:41; To: "Brooke Thorley" <bro...@instaclustr.com>; "user@cassandra.apache.org" <user@cassandra.apache.org>; Cc: "Peng Xiao" <2535...@qq.com>; Subject: Re: Re: tolerate how many nodes down in the cluster Hi Brooke, Very nice presentation: https://www.youtube.com/watch?v=QrP7G1eeQTI !! Good to know that you are able to leverage racks for gaining operational efficiencies. I think vnodes have made life easier. I still see some concerns with racks: 1. Usually scaling needs are driven by business requirements, and customers want value for every penny they spend. Adding 3 or 5 servers (because you have RF=3 or 5) instead of 1 server costs them dearly, and it's difficult to justify the additional cost, as fault tolerance can only be improved, not guaranteed, with racks. 2. You need to maintain mappings of logical racks (=RF) and physical racks (multiples of RF) for large clusters. 3. Using racks tightly couples your hardware decisions (rack size, rack count) and virtualization decisions (VM size, VM count per physical node) to the application RF. Thanks, Anuj On Tuesday, 25 July 2017 3:56 AM, Brooke Thorley <bro...@instaclustr.com> wrote: Hello Peng. I think spending the time to set up your nodes into racks is worth it for the benefits that it brings. With RF3 and NTS you can tolerate the loss of a whole rack of nodes without losing QUORUM, as each rack will contain a full set of data. It makes ongoing cluster maintenance easier, as you can perform upgrades, repairs and restarts on a whole rack of nodes at once. Setting up racks or adding nodes is not difficult, particularly if you are using vnodes: you would simply add nodes in multiples of the # of racks to keep the racks balanced. This is how we run all our managed clusters and it works very well. You may be interested to watch my Cassandra Summit presentation from last year, in which I discussed this very topic: https://www.youtube.com/watch?v=QrP7G1eeQTI (from 4:00). If you were to consider changing your rack topology, I would recommend that you do this by DC migration rather than "in place". Kind Regards, Brooke Thorley, VP Technical Operations & Customer Services, supp...@instaclustr.com | support.instaclustr.com On 25 July 2017 at 03:06, Anuj Wadehra <anujw_2...@yahoo.co.in.invalid> wrote: Hi Peng, Three things are important when you are evaluating fault tolerance and availability for your cluster: 1. RF 2. CL 3. Topology - how data is replicated across racks. If you assume that N nodes from ANY rack may fail at the same time, then you can afford failure of RF-CL nodes and still be 100% available. E.g. if you are reading at QUORUM and RF=3, you can only afford one (3-2) node failure. Thus, even with a 30 node cluster, a 10 node failure cannot guarantee 100% availability.
RF impacts availability more than the total number of nodes in the cluster. If you assume that N nodes failing together will ALWAYS be from the same rack, you can spread your servers across RF physical racks and use NetworkTopologyStrategy. When allocating replicas for any data, Cassandra will then ensure that the 3 replicas are placed in 3 different racks. E.g. you can have 10 nodes in each of 3 racks, and then even a 10 node failure within the SAME rack still leaves you 100% available, as two replicas remain for 100% of the data and CL=QUORUM can be met. I have not tested this, but that's how the rack concept is expected to work. I agree that using racks generally makes operations tougher. Thanks, Anuj On Mon, 24 Jul 2017 at 20:10, Peng Xiao <2535...@qq.com> wrote: Hi Bhuvan, The following link advises against using racks, and it looks reasonable: http://www.datastax.com/dev/blog/multi-datacenter-replication Defining one rack for the entire cluster is the simplest and most common implementation. Multiple racks should be avoided for the following reasons: • Most users tend to ignore or forget rack requirements
tolerate how many nodes down in the cluster
Hi, Suppose we have a 30 node cluster in one DC with RF=3: how many nodes can be down? Can we tolerate 10 nodes down? It seems we cannot prevent all 3 replicas of some data from landing on those 10 nodes, so we can only tolerate 1 node down even with 30 nodes? Could anyone please advise? Thanks
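For reference, the arithmetic behind the replies that follow, assuming reads and writes at QUORUM:

    RF = 3
    QUORUM = floor(RF/2) + 1 = 2
    tolerable replica failures per token range = RF - QUORUM = 1
    # Without rack-aware placement, 2 arbitrary node failures can hit both
    # remaining replicas of some range, so only 1 arbitrary failure is
    # guaranteed safe, regardless of cluster size.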
How do you monitor your Cassandra cluster?
Dear All, we are currently using Cassandra 2.1.13, and it has grown to 5 TB across 32 nodes in one DC. For monitoring, OpsCenter does not send alarms and is not free in higher versions, so we have had to use a simple JMX+Zabbix template. We now plan to use Jolokia+JMX2Graphite to draw the metrics charts. Could you please advise? Thanks, Henry
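For illustration, a minimal sketch of the Jolokia route, assuming the stock Jolokia JVM agent and a standard Cassandra metrics MBean; the jar path and port are hypothetical:

    # cassandra-env.sh: expose JMX over HTTP via the Jolokia agent
    JVM_OPTS="$JVM_OPTS -javaagent:/opt/jolokia/jolokia-jvm-agent.jar=port=8778,host=0.0.0.0"

    # sample poll: client read latency metric, for Zabbix/Graphite scraping
    curl 'http://localhost:8778/jolokia/read/org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency'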
gossip down failure detected
Hi, We are experiencing the following issue: the rt sometimes spikes to 15 s. After adjusting the batch size it looks better, but we still see the following. Could anyone advise?
INFO [GossipTasks:1] 2017-07-07 08:56:33,410 Gossiper.java:1009 - InetAddress /172.16.xx.39 is now DOWN
On 172.16.xx.39 we can see the following log:
WARN [SharedPool-Worker-18] 2017-07-07 08:56:12,049 BatchStatement.java:255 - Batch of prepared statements for [ecommercedata.ecommerce_baitiao_record_by_order_no] is of size 9470, exceeding specified threshold of 5120 by 4350.
WARN [GossipTasks:1] 2017-07-07 08:56:44,052 FailureDetector.java:258 - Not marking nodes down due to local pause of 31835321522 > 5000000000
INFO [ScheduledTasks:1] 2017-07-07 08:56:44,055 MessagingService.java:929 - READ messages were dropped in last 5000 ms: 48 for internal timeout and 0 for cross node timeout
INFO [ScheduledTasks:1] 2017-07-07 08:56:44,055 MessagingService.java:929 - RANGE_SLICE messages were dropped in last 5000 ms: 12 for internal timeout and 0 for cross node timeout
INFO [ScheduledTasks:1] 2017-07-07 08:56:44,055 StatusLogger.java:51 - Pool Name    Active    Pending    Completed    Blocked    All Time Blocked
INFO [ScheduledTasks:1] 2017-07-07 08:56:44,056 StatusLogger.java:66 - MutationStage    7    1380    618231255    0    0
Thanks
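The batch warning above corresponds to a cassandra.yaml knob; the value shown is the 2.1 default, included for reference only:

    # cassandra.yaml (2.1): batches above this size are logged as warnings
    batch_size_warn_threshold_in_kb: 5    # 5 KB == the "5120" bytes in the WARN line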
Re: MUTATION messages were dropped in last 5000 ms for cross node timeout
Hi, Does a dropped message mean data loss? Thanks -- Original -- From: Akhil Mehra; Date: Aug 4, 2017 16:00; To: user; Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node timeout Glad I could be of help :) Hopefully the partition resize goes smoothly. Regards, Akhil On 4/08/2017, at 5:41 AM, ZAIDI, ASAD A wrote: Hi Akhil, Thank you for your reply. I kept testing different timeout numbers over the last week and eventually settled on setting the *_request_timeout_in_ms parameters to 1.5 minutes of coordinator wait time. That is the number at which I do not see any dropped mutations. I also asked developers to tweak the data model, where we saw a bunch of tables with really large partitions, some around ~6.6 GB per partition key; we're now working to reduce the partition sizes of those tables. I am hoping the corrected data model will help reduce coordinator wait time (and get back to the default numbers) again. Thanks again, Asad From: Akhil Mehra [mailto:akhilme...@gmail.com] Sent: Friday, July 21, 2017 4:24 PM To: user@cassandra.apache.org Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node timeout Hi Asad, The 5000 ms is not configurable (https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/net/MessagingService.java#L423). It is just the interval at which the number of dropped messages is reported; thus dropped messages are reported every 5000 ms. If you are looking to tweak the number of ms after which a message is considered dropped, you need to use write_request_timeout_in_ms. The write_request_timeout_in_ms setting (http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html) can be used to increase the mutation timeout; by default it is set to 2000 ms. I hope that helps. Regards, Akhil On 22/07/2017, at 2:46 AM, ZAIDI, ASAD A wrote: Hi Akhil, Thank you for your reply. Previously, I did "tune" various timeouts - basically increased them a bit - but none of the parameters listed in the link matches that "were dropped in last 5000 ms". I was wondering where that [5000 ms] number comes from when, like I mentioned before, no timeout parameter setting matches that number! Load is intermittently high, but the CPU queue length never goes beyond medium depth. I wonder if there is some internal limit that I'm still not aware of. Thanks, Asad From: Akhil Mehra [mailto:akhilme...@gmail.com] Sent: Thursday, July 20, 2017 3:47 PM To: user@cassandra.apache.org Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node timeout Hi Asad, http://cassandra.apache.org/doc/latest/faq/index.html#why-message-dropped As mentioned in the link above, this is a load-shedding mechanism used by Cassandra. Is your cluster under heavy load? Regards, Akhil On 21/07/2017, at 3:27 AM, ZAIDI, ASAD A wrote: Hello folks, I'm using apache-cassandra 2.2.8 and I see many messages like the one below in my system.log file. In cassandra.yaml, cross_node_timeout: true is set, and an NTP server is running to correct clock drift on the 16-node cluster. I do not see pending or blocked HintedHandoff in tpstats output, though there are a bunch of dropped MUTATIONs observed: INFO [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 2152 for cross node timeout I'm seeking help here; please let me know what I need to check in order to address these cross node timeouts. Thank you, Asad
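For a quick check between those 5000 ms report intervals, the thread-pool stats expose cumulative per-type drop counts; a minimal illustration:

    # the "Message type  Dropped" section at the bottom shows cumulative
    # MUTATION / READ drops since startup
    nodetool tpstats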
optimal value for native_transport_max_threads
Dear All, any suggestion for an optimal value for native_transport_max_threads? As per https://issues.apache.org/jira/browse/CASSANDRA-11363, max_queued_native_transport_requests=4096; how about native_transport_max_threads? Thanks, Peng Xiao
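For reference, where these two knobs live: native_transport_max_threads is a cassandra.yaml setting (128 is the commented-out default shipped with 2.1/2.2), while CASSANDRA-11363 exposed the queue bound as a JVM system property; treat the exact spellings as assumptions to verify against your version:

    # cassandra.yaml: cap on concurrent native transport (CQL) request threads
    native_transport_max_threads: 128

    # cassandra-env.sh: the queue bound added by CASSANDRA-11363
    JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=4096"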
Re: Row Cache hit issue
And we are using C* 2.1.18. -- Original -- From: "" <2535...@qq.com>; Date: Wed, Sep 20, 2017 11:27 AM; To: "user" <user@cassandra.apache.org>; Subject: Row Cache hit issue Dear All, The default is row_cache_save_period=0; does that mean the row cache does not work in this situation? But we can still see row cache hits: Row Cache : entries 202787, size 100 MB, capacity 100 MB, 3095293 hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds Could anyone please explain this? Thanks, Peng Xiao
Row Cache hit issue
Dear All, The default is row_cache_save_period=0; does that mean the row cache does not work in this situation? But we can still see row cache hits: Row Cache : entries 202787, size 100 MB, capacity 100 MB, 3095293 hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds Could anyone please explain this? Thanks, Peng Xiao
Re: RE: Row Cache hit issue
Thanks All. -- Original -- From: "Steinmaurer, Thomas" <thomas.steinmau...@dynatrace.com>; Date: Wed, Sep 20, 2017 1:38; To: "user@cassandra.apache.org" <user@cassandra.apache.org>; Subject: RE: Row Cache hit issue Hi, additionally, with saved (key) caches, we once had some sort of corruption (for whatever reason). So, if you see something like the following upon Cassandra startup:
INFO [main] 2017-01-04 15:38:58,772 AutoSavingCache.java (line 114) reading saved cache /var/opt/xxx/cassandra/saved_caches/ks-cf-KeyCache-b.db
ERROR [main] 2017-01-04 15:38:58,891 CassandraDaemon.java (line 571) Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.<init>(ArrayList.java:152)
at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:132)
at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:276)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:435)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:406)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:322)
at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:268)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:364)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)
resulting in Cassandra going OOM, with a "reading saved cache" log entry shortly before the OOM, you may have hit that sort of corruption. The workaround is to physically delete the saved cache file; Cassandra will then start up just fine. Regards, Thomas From: Dikang Gu [mailto:dikan...@gmail.com] Sent: Wednesday, 20 September 2017 06:06 To: cassandra <user@cassandra.apache.org> Subject: Re: Row Cache hit issue Hi Peng, C* periodically saves caches to disk to solve the cold start problem. If row_cache_save_period=0, it means C* does not save the cache to disk. But the cache itself is still working if it's enabled in the table schema; it will just be empty after a restart. --Dikang On Tue, Sep 19, 2017 at 8:27 PM, Peng Xiao <2535...@qq.com> wrote: And we are using C* 2.1.18. -- Original -- From: "" <2535...@qq.com>; Date: Wed, Sep 20, 2017 11:27 AM; To: "user" <user@cassandra.apache.org>; Subject: Row Cache hit issue Dear All, The default is row_cache_save_period=0; does that mean the row cache does not work in this situation? But we can still see row cache hits: Row Cache : entries 202787, size 100 MB, capacity 100 MB, 3095293 hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds Could anyone please explain this? Thanks, Peng Xiao -- Dikang
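As Dikang notes, the cache itself is enabled per table, independent of the save period; a minimal illustration with hypothetical names:

    -- cqlsh (2.1+ syntax): turn on row caching for a table
    ALTER TABLE ks.t WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};

    # cassandra.yaml: capacity of the row cache; 0 disables it entirely
    row_cache_size_in_mb: 100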
Pending-range-calculator during bootstrapping
Dear All, when we are bootstrapping a new node, we experience high CPU load which affects the rt, and we noticed that the cost is mainly in the PendingRangeCalculator; this did not happen before. We are using C* 2.1.13 in one DC and 2.1.18 in another DC. Could anyone please advise on this? Thanks, Peng Xiao
add new nodes in two DCs at the same time
Hi, as Datastax suggests, we should only bootstrap one new node at a time. But can we add new nodes in two DCs at the same time? Thanks, Peng Xiao
Pending-range-calculator during bootstrapping
Dear All, when we are bootstrapping a new node, we experience high CPU load and this affects the rt, and we noticed that the cost is mainly in the PendingRangeCalculator; this did not happen before. We are using C* 2.1.13. Could anyone please advise on this? Thanks, Peng Xiao
network down between DCs
Hi there, We have two DCs in a Cassandra cluster. If the network is down for less than 3 hours (the default hint window), with my understanding it will recover automatically, right? Do we need to run repair manually? Thanks, Peng Xiao
Re: RE: network down between DCs
Thanks Thomas for the reminder, we will watch the system log. -- Original -- From: "Steinmaurer, Thomas" <thomas.steinmau...@dynatrace.com>; Date: Thu, Sep 21, 2017 5:17 PM; To: "user@cassandra.apache.org" <user@cassandra.apache.org>; Subject: RE: network down between DCs Hi, within the default hint window of 3 hours, the hinted handoff mechanism should take care of that, but we have seen it fail from time to time (depending on the load) in 2.1, with tombstone-related issues causing failing requests on the system hints table. So watch out for any sign of hinted handoff trouble in the Cassandra log. Hint storage has been rewritten in 3.0+ to use flat files, so tombstone-related troubles in that area should be gone. Thomas From: Hannu Kröger [mailto:hkro...@gmail.com] Sent: Thursday, 21 September 2017 10:32 To: Peng Xiao <2535...@qq.com>; user@cassandra.apache.org Subject: Re: network down between DCs Hi, That’s correct. You need to run repairs only after a node/DC/connection is down for more than max_hint_window_in_ms. Cheers, Hannu On 21 September 2017 at 11:30:44, Peng Xiao (2535...@qq.com) wrote: Hi there, We have two DCs in a Cassandra cluster. If the network is down for less than 3 hours (the default hint window), with my understanding it will recover automatically, right? Do we need to run repair manually? Thanks, Peng Xiao
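On 2.1, hints are stored in a system table, so the backlog after such an outage can be eyeballed; a rough check, assuming defaults (the count itself can time out if the backlog is huge):

    -- cqlsh, on nodes that coordinated writes during the outage
    SELECT count(*) FROM system.hints;

    # cassandra.yaml: how long hints are collected for an unreachable node
    max_hint_window_in_ms: 10800000   # 3 hours, the default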
split one keyspace from one cluster to another
Dear All, We'd like to migrate one keyspace from one cluster to another; the keyspace is about 100 GB. If we use sstableloader, we have to stop the application during the migration. Any good ideas? Thanks, Peng Xiao
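For reference, a sketch of the snapshot + sstableloader route discussed here, with hypothetical paths and hosts; the keyspace schema must already exist on the target cluster, and sstableloader infers keyspace/table from the last two path components, so the snapshot files need to be staged accordingly:

    # source node: take a snapshot of the keyspace
    nodetool snapshot -t migrate ks

    # stage the snapshot into a ks/<table> layout and stream it to the target
    mkdir -p /tmp/load/ks/mytable
    cp /var/lib/cassandra/data/ks/mytable-*/snapshots/migrate/* /tmp/load/ks/mytable/
    sstableloader -d 10.0.1.1,10.0.1.2 /tmp/load/ks/mytable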
cassandra hardware requirements (SATA/SSD)
Hi there, we are struggling with hardware selection. We all know that SSD is good, and Datastax suggests we use SSD; but as Cassandra is a CPU-bound db, we are considering SATA disks, and we noticed that our normal IO throughput is 7 MB/s. Could anyone give some advice? Thanks, Peng Xiao
space left for compaction
Dear All, As for STCS, Datastax suggests we keep half of the disk free for compaction, but this is not a strict rule. Could anyone advise how much space we should leave free on one node? Thanks, Peng Xiao
limit the sstable file size
Dear All, Can we limit the sstable file size? As we have a huge cluster, the sstable files are too large for ETL to extract. Could you please advise? Thanks, Peng Xiao
Re: limit the sstable file size
Thanks Jeff for the quick reply. -- Original -- From: "Jeff Jirsa" <jji...@gmail.com>; Date: Sep 30, 2017 11:45; To: "cassandra" <user@cassandra.apache.org>; Subject: Re: limit the sstable file size There's no way to limit file size in STCS. If you use LCS, it will default to 160MB (except in cases where you have a very large partition - in those cases, the sstable will scale with your partition size, but you really shouldn't have partitions larger than 160MB) On Fri, Sep 29, 2017 at 8:41 PM, Peng Xiao <2535...@qq.com> wrote: Dear All, Can we limit the sstable file size? As we have a huge cluster, the sstable files are too large for ETL to extract. Could you please advise? Thanks, Peng Xiao
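A minimal illustration of the LCS route Jeff describes, with hypothetical names; 160 MB is the default target size, and smaller values yield more, smaller files for ETL at the cost of more compaction work:

    -- cqlsh: switch a table to LCS with an explicit target sstable size
    ALTER TABLE ks.t WITH compaction =
      {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};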
Re: data loss in different DC
Thanks All -- Original -- From: "Jeff Jirsa" <jji...@gmail.com>; Date: Sep 28, 2017 9:16; To: "user" <user@cassandra.apache.org>; Subject: Re: data loss in different DC Your quorum writes are only guaranteed to land on half+1 nodes - there's no guarantee which nodes those will be. For strong consistency with multiple DCs, you can either: - write at QUORUM and read at QUORUM from any dc, or - write at EACH_QUORUM and read at LOCAL_QUORUM from any dc, or - write at LOCAL_QUORUM and read at LOCAL_QUORUM from the same DC only -- Jeff Jirsa > On Sep 28, 2017, at 2:41 AM, Peng Xiao <2535...@qq.com> wrote: > Dear All, > We have a cluster with DC1: RF=3 and DC2: RF=1 (DC2 only for ETL), but we found that sometimes we can query records in DC1 while not being able to find the same record in DC2 with LOCAL_QUORUM. How does this happen? Could anyone please advise? Looks like we can only run repair to fix it. > Thanks, > Peng Xiao
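For illustration, Jeff's second option as it looks from cqlsh (CONSISTENCY is a per-session cqlsh setting; drivers expose the same levels per statement); the table and values are hypothetical:

    -- writer's session, any DC
    CONSISTENCY EACH_QUORUM;
    INSERT INTO ks.t (key, data) VALUES ('row1', 'v1');

    -- reader's session, any DC
    CONSISTENCY LOCAL_QUORUM;
    SELECT data FROM ks.t WHERE key = 'row1';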
data loss in different DC
Dear All, We have a cluster with DC1: RF=3 and DC2: RF=1 (DC2 only for ETL), but we found that sometimes we can query records in DC1 while not being able to find the same record in DC2 with LOCAL_QUORUM. How does this happen? Looks like data loss in DC2. Could anyone please advise? Looks like we can only run repair to fix it. Thanks, Peng Xiao
Re: data loss in different DC
Even with CL=QUORUM, there is no guarantee of reading the same data in DC2, right? Then multiple DCs seem to make no sense? -- Original -- From: "DuyHai Doan" <doanduy...@gmail.com>; Date: Sep 28, 2017 5:45; To: "user" <user@cassandra.apache.org>; Subject: Re: data loss in different DC If you're writing into DC1 with CL = LOCAL_xxx, there is no guarantee you can read the same data in DC2. Only repair will help you On Thu, Sep 28, 2017 at 11:41 AM, Peng Xiao <2535...@qq.com> wrote: Dear All, We have a cluster with DC1: RF=3 and DC2: RF=1 (DC2 only for ETL), but we found that sometimes we can query records in DC1 while not being able to find the same record in DC2 with LOCAL_QUORUM. How does this happen? Could anyone please advise? Looks like we can only run repair to fix it. Thanks, Peng Xiao
data loss in different DC
Dear All, We have a cluster with DC1: RF=3 and DC2: RF=1 (DC2 only for ETL), but we found that sometimes we can query records in DC1 while not being able to find the same record in DC2 with LOCAL_QUORUM. How does this happen? Could anyone please advise? Looks like we can only run repair to fix it. Thanks, Peng Xiao
Re: data loss in different DC
Very sorry for the duplicate mail. -- Original -- From: "" <2535...@qq.com>; Date: Thu, Sep 28, 2017 07:41 PM; To: "user" <user@cassandra.apache.org>; Subject: data loss in different DC Dear All, We have a cluster with DC1: RF=3 and DC2: RF=1 (DC2 only for ETL), but we found that sometimes we can query records in DC1 while not being able to find the same record in DC2 with LOCAL_QUORUM. How does this happen? Looks like data loss in DC2. Could anyone please advise? Looks like we can only run repair to fix it. Thanks, Peng Xiao
nodetool cleanup in parallel
Hi, nodetool cleanup only removes keys which no longer belong to the node, so theoretically we can run nodetool cleanup in parallel, right? The documentation suggests running it one node at a time, but that is too slow. Thanks, Peng Xiao
Re: nodetool cleanup in parallel
Thanks Kurt. -- Original -- From: "kurt" <k...@instaclustr.com>; Date: Sep 27, 2017 11:57; To: "User" <user@cassandra.apache.org>; Subject: Re: nodetool cleanup in parallel Correct. You can run it in parallel across many nodes if you have capacity. We generally see about a 10% CPU increase from cleanups, which isn't a big deal if you have the capacity to handle it plus the IO. On that note, on later versions you can specify -j to run multiple cleanup compactions at the same time on a single node, and also increase compaction throughput to speed the process up. On 27 Sep. 2017 13:20, "Peng Xiao" <2535...@qq.com> wrote: Hi, nodetool cleanup only removes keys which no longer belong to the node, so theoretically we can run nodetool cleanup in parallel, right? The documentation suggests running it one node at a time, but that is too slow. Thanks, Peng Xiao
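A sketch of the knobs kurt mentions; -j is available on 2.2+ nodetool, and the throughput value is illustrative:

    # run 2 cleanup compactions concurrently on this node (2.2+)
    nodetool cleanup -j 2 ks

    # temporarily raise the compaction throughput cap (MB/s; default 16)
    nodetool setcompactionthroughput 64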
split one DC from a cluster
Hi, We want to split one DC out of a cluster and make that DC a new cluster (rename the DC to a new cluster name). Could you please advise? Thanks, Peng Xiao
can repair and bootstrap run simultaneously
Hi there, Can we add a new node (bootstrap) in one DC while running repair in another DC of the cluster, or even while running repair in the same DC? Thanks, Peng Xiao
best practice for repair
Hi there, we need to repair a huge CF; just want to clarify: 1. nodetool repair -pr keyspace cf 2. nodetool repair -st <start_token> -et <end_token> -dc <dc> Which is better? Or any other advice? Thanks, Peng Xiao
Re: best practice for repair
Sub-range repair is much like primary-range repair, except that each sub-range repair operation focuses on an even smaller subset of the data. Repair is a tough process; any advice? Thanks -- Original -- From: "" <2535...@qq.com>; Date: Mon, Nov 13, 2017 06:51 PM; To: "user" <user@cassandra.apache.org>; Subject: best practice for repair Hi there, we need to repair a huge CF; just want to clarify: 1. nodetool repair -pr keyspace cf 2. nodetool repair -st <start_token> -et <end_token> -dc <dc> Which is better? Or any other advice? Thanks, Peng Xiao
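A minimal sub-range sketch; the tokens below are placeholder Murmur3 values - in practice they come from the node's own ring ranges, or a scheduler such as Reaper drives the slicing for you:

    # repair one narrow slice of the ring for one table
    nodetool repair -st -9223372036854775808 -et -9100000000000000000 ks cf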
consistency against rebuild a new DC
Hi there, We know that we need to run repair regularly for data consistency. Suppose we have DC1 & DC2: if we add a new DC3 and rebuild it from DC1, can we assume DC3 is consistent with DC1, at least at the moment the rebuild completes successfully? Thanks, Peng Xiao
rebuild in the new DC always failed
Hi there, We need to rebuild a new DC, but the stream always fails with the following errors. We are using C* 2.1.18. Could anyone please advise? error: null -- StackTrace -- java.io.EOFException at java.io.DataInputStream.readByte(DataInputStream.java:267) at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222) at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161) at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source) at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown Source) at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020) at javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298) at com.sun.proxy.$Proxy8.rebuild(Unknown Source) at org.apache.cassandra.tools.NodeProbe.rebuild(NodeProbe.java:1057) at org.apache.cassandra.tools.NodeTool$Rebuild.execute(NodeTool.java:1825) at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:292) at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206) On the source side, the error is: ERROR [GossipStage:217] 2017-12-17 00:28:02,030 CassandraDaemon.java:231 - Exception in thread Thread[GossipStage:217,5,main] java.lang.NullPointerException: null Thanks
Re: Re: rebuild in the new DC always failed
Jeff, it has already started sending files, and if I only rebuild one node it's OK; with more than one node it fails. Thanks -- Original -- From: Jeff Jirsa <jji...@gmail.com>; Date: Dec 17, 2017 09:32; To: user <user@cassandra.apache.org>; Subject: Re: rebuild in the new DC always failed That NPE in gossip is probably a missing endpoint state, which is probably fixed by the gossip patch (#13700) in 2.1.19. I'm not sure it's related to the rebuild failing - I'm not actually sure if that's a server failure or JMX timing out. Do you see logs on the servers indicating that it was ever invoked? Did it calculate a streaming plan? Did it start sending files? -- Jeff Jirsa On Dec 16, 2017, at 4:56 PM, Peng Xiao <2535...@qq.com> wrote: Hi Jeff, This is the only information we found from system.log <2e06f...@9f7caf6b.d1c0355a.jpg> Thanks -- Original -- From: "Jeff Jirsa" <jji...@gmail.com>; Date: Dec 17, 2017 8:13; To: "user" <user@cassandra.apache.org>; Subject: Re: rebuild in the new DC always failed What's the rest of the stack beneath the null pointer exception? -- Jeff Jirsa > On Dec 16, 2017, at 4:11 PM, Peng Xiao <2535...@qq.com> wrote: > Hi there, > We need to rebuild a new DC, but the stream always fails with the following errors. We are using C* 2.1.18. Could anyone please advise? > [stack trace as in the original mail] > On the source side, the error is: > ERROR [GossipStage:217] 2017-12-17 00:28:02,030 CassandraDaemon.java:231 - Exception in thread Thread[GossipStage:217,5,main] > java.lang.NullPointerException: null > Thanks
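Following Jeff's point about streaming timeouts, the 2.1-era knob plus the retry, shown for illustration; the DC name is hypothetical and the timeout value is just an example:

    # cassandra.yaml on all nodes: make hung streams fail fast instead of wedging
    streaming_socket_timeout_in_ms: 86400000   # e.g. 24 hours

    # then rerun the rebuild on the new-DC node, sourcing from the existing DC
    nodetool rebuild DC1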
nodetool rebuild data size
Hi there, if we have a Cassandra DC1 with 60 TB of data at RF=3, and we rebuild a new DC2 (RF=3), how much data will stream to DC2: 20 TB or 60 TB? Thanks, Peng Xiao
Re: rebuild in the new DC always failed
Hi Jeff, This is the only information we found from system.log Thanks -- Original -- From: "Jeff Jirsa" <jji...@gmail.com>; Date: Dec 17, 2017 8:13; To: "user" <user@cassandra.apache.org>; Subject: Re: rebuild in the new DC always failed What's the rest of the stack beneath the null pointer exception? -- Jeff Jirsa > On Dec 16, 2017, at 4:11 PM, Peng Xiao <2535...@qq.com> wrote: > Hi there, > We need to rebuild a new DC, but the stream always fails with the following errors. We are using C* 2.1.18. Could anyone please advise? > [stack trace as in the original mail] > On the source side, the error is: > ERROR [GossipStage:217] 2017-12-17 00:28:02,030 CassandraDaemon.java:231 - Exception in thread Thread[GossipStage:217,5,main] > java.lang.NullPointerException: null > Thanks
decommissioned node still in gossip
Dear All, We have decommissioned a DC, but from system.log it is still gossiping: INFO [GossipStage:1] 2017-11-01 17:21:36,310 Gossiper.java:1008 - InetAddress /x.x.x.x is now DOWN Could you please advise? Thanks, Peng Xiao
Re: decommissioned node still in gossip
Thanks Kurt. -- Original -- From: "kurt" <k...@instaclustr.com>; Date: Nov 1, 2017 7:22; To: "User" <user@cassandra.apache.org>; Subject: Re: decommissioned node still in gossip It will likely hang around in gossip for 3-15 days but then should disappear. As long as it's not showing up in the cluster it should be OK. On 1 Nov. 2017 20:25, "Peng Xiao" <2535...@qq.com> wrote: Dear All, We have decommissioned a DC, but from system.log it is still gossiping: INFO [GossipStage:1] 2017-11-01 17:21:36,310 Gossiper.java:1008 - InetAddress /x.x.x.x is now DOWN Could you please advise? Thanks, Peng Xiao
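To confirm an endpoint is merely lingering in gossip, and to expunge it if it never ages out; note that assassinate is nodetool-native only on 2.2+ (on 2.1 the equivalent is a JMX call on the Gossiper MBean):

    # what gossip still remembers about the removed endpoint
    nodetool gossipinfo | grep -A3 '/x.x.x.x'

    # 2.2+: forcibly remove the endpoint's gossip state
    nodetool assassinate x.x.x.x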
cassandra gc issue
All, We noticed that the response time sometimes jumps very high. The following is from the Cassandra GC log:
[Eden: 760.0M(760.0M)->0.0B(11.2G) Survivors: 264.0M->96.0M Heap: 7657.7M(20.0G)->6893.3M(20.0G)]
Heap after GC invocations=43481 (full 0):
garbage-first heap total 20971520K, used 7058765K [0x0002c000, 0x0002c0805000, 0x0007c000)
region size 8192K, 12 young (98304K), 12 survivors (98304K)
Metaspace used 37426K, capacity 37810K, committed 38144K, reserved 1083392K
class space used 3930K, capacity 4025K, committed 4096K, reserved 1048576K
Could anyone please advise? Thanks, Peng Xiao
Re: run Cassandra on physical machine
Thanks All. -- Original -- From: "Jeff Jirsa" <jji...@gmail.com>; Date: Dec 8, 2017 3:19; To: "cassandra" <user@cassandra.apache.org>; Subject: Re: run Cassandra on physical machine Which is to say, right now you can't run them on different ports, but you can run them on different IPs on the same machine (and different IPs don't need different physical NICs; you can bind multiple IPs to a given physical NIC). On Thu, Dec 7, 2017 at 10:54 AM, Dikang Gu <dikan...@gmail.com> wrote: @Peng, how many network interfaces do you have on your machine? If you just have one NIC, you probably need to wait for the storage port patch: https://issues.apache.org/jira/browse/CASSANDRA-7544 . On Thu, Dec 7, 2017 at 7:01 AM, Oliver Ruebenacker <cur...@gmail.com> wrote: Hello, Yes, you can. Best, Oliver On Thu, Dec 7, 2017 at 7:12 AM, Peng Xiao <2535...@qq.com> wrote: Dear All, Can we run Cassandra on a physical machine directly? We all know that VMs can reduce performance. For instance, we have a machine with 56 cores and 8 SSD disks. Can we run 8 Cassandra instances on the same machine within one rack, on different ports? Could anyone please advise? Thanks, Peng Xiao -- Oliver Ruebenacker, Senior Software Engineer, Diabetes Portal, Broad Institute -- Dikang
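Illustrating Jeff's point about binding multiple IPs to one NIC; the addresses and device name are hypothetical, and each Cassandra instance then gets its own data directories and addresses:

    # add extra addresses to a single physical NIC, one per instance
    ip addr add 10.0.0.11/24 dev eth0
    ip addr add 10.0.0.12/24 dev eth0

    # instance 1's cassandra.yaml
    listen_address: 10.0.0.11
    rpc_address: 10.0.0.11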
Re: update a record which does not exist
After testing: it does insert a new record. -- Original -- From: "" <2535...@qq.com>; Date: Mon, Dec 4, 2017 11:13 AM; To: "user" <user@cassandra.apache.org>; Subject: update a record which does not exist Dear All, If we update a record which does not actually exist in Cassandra, will it generate a new record or do nothing? UPDATE columnfamily SET data = 'test data' WHERE key = 'row1'; As in CQL, UPDATE and INSERT are semantically the same. Could anyone please advise? Thanks, Peng Xiao
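A quick demonstration of the upsert semantics confirmed above, reusing the names from the original example, plus the LWT form for when the row should NOT be created:

    UPDATE columnfamily SET data = 'test data' WHERE key = 'row1';
    SELECT * FROM columnfamily WHERE key = 'row1';  -- the row now exists (upsert)

    -- only applies if the row already exists (uses Paxos, so it is slower)
    UPDATE columnfamily SET data = 'test data' WHERE key = 'row1' IF EXISTS;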
update a record which does not exist
Dear All, If we update a record which does not actually exist in Cassandra, will it generate a new record or do nothing? UPDATE columnfamily SET data = 'test data' WHERE key = 'row1'; As in CQL, UPDATE and INSERT are semantically the same. Could anyone please advise? Thanks, Peng Xiao
run Cassandra on physical machine
Dear All, Can we run Cassandra on a physical machine directly? We all know that VMs can reduce performance. For instance, we have a machine with 56 cores and 8 SSD disks. Can we run 8 Cassandra instances on the same machine within one rack, on different ports? Could anyone please advise? Thanks, Peng Xiao
Re: rebuild stream issue
Then how can we resume the rebuild? We are using C* 3.11.0. Can we just delete the data files and rerun the rebuild? It looks like there will be some errors. -- Original -- From: "Jeff Jirsa" <jji...@gmail.com>; Date: Dec 11, 2017 1:07; To: "user" <user@cassandra.apache.org>; Subject: Re: rebuild stream issue The streams fail; the rebuild times out if you've set a timeout, or you'll need to restart the nodes if you didn't set a streaming timeout. -- Jeff Jirsa > On Dec 10, 2017, at 9:05 PM, Peng Xiao <2535...@qq.com> wrote: > Dear All, > We are rebuilding a new DC; if one of the source nodes is restarted, what will happen to the rebuild? > Thanks, > Peng Xiao
rebuild stream issue
Dear All, We are rebuilding a new DC; if one of the source nodes is restarted, what will happen to the rebuild? Thanks, Peng Xiao
Re: Re: rebuild stream issue
Thanks Jeff, we will have a try. -- Original -- From: "Jeff Jirsa" <jji...@gmail.com>; Date: Dec 11, 2017 1:20; To: "user" <user@cassandra.apache.org>; Subject: Re: Re: rebuild stream issue Just restart the rebuild - it'll stream some duplicate data, but that will compact away when it's done. You can also use subrange repair instead of rebuild if you're short on disk space. -- Jeff Jirsa On Dec 10, 2017, at 9:14 PM, Peng Xiao <2535...@qq.com> wrote: Then how can we resume the rebuild? We are using C* 3.11.0. Can we just delete the data files and rerun the rebuild? It looks like there will be some errors. -- Original -- From: "Jeff Jirsa" <jji...@gmail.com>; Date: Dec 11, 2017 1:07; To: "user" <user@cassandra.apache.org>; Subject: Re: rebuild stream issue The streams fail; the rebuild times out if you've set a timeout, or you'll need to restart the nodes if you didn't set a streaming timeout. -- Jeff Jirsa > On Dec 10, 2017, at 9:05 PM, Peng Xiao <2535...@qq.com> wrote: > Dear All, > We are rebuilding a new DC; if one of the source nodes is restarted, what will happen to the rebuild? > Thanks, > Peng Xiao
rt jump during new node bootstrap
Dear All, We are using C* 2.1.18; when we bootstrap a new node, the rt jumps as the new node starts up, then goes back to normal. Could anyone please advise? Thanks, Peng Xiao
Re: Tuning bootstrap new node
Can we stop compaction while the new node is bootstrapping and enable it after the node has joined? Thanks -- Original -- From: "" <2535...@qq.com>; Date: Tue, Oct 31, 2017 07:18 PM; To: "user" <user@cassandra.apache.org>; Subject: Tuning bootstrap new node Dear All, Can we do some tuning to make bootstrapping a new node quicker? We have a three-DC cluster (RF=3 in two DCs, RF=1 in the third; 48 nodes in the DC with RF=3). As the cluster becomes larger and larger, we need more than 24 hours to bootstrap a new node. Could you please advise how to tune this? Many Thanks, Peng Xiao
Re: Tuning bootstrap new node
We noticed that streaming takes about 10 hours; all the rest is compaction. We will improve concurrent_compactors first. Thanks all for your replies. -- Original -- From: "Jon Haddad" <j...@jonhaddad.com>; Date: Nov 1, 2017 4:06; To: "user" <user@cassandra.apache.org>; Subject: Re: Tuning bootstrap new node Of all the settings you could change, why one that's related to memtables? Streaming doesn't go through the write path; memtables aren't involved unless you're using materialized views or CDC. On Oct 31, 2017, at 11:44 AM, Anubhav Kale <anubhav.k...@microsoft.com.INVALID> wrote: You can change the YAML setting memtable_cleanup_threshold to 0.7 (from the default of 0.3). This will push SSTables to disk less often and will reduce the compaction time. While this won't change the streaming time, it will reduce the overall time for your node to become healthy. From: Harikrishnan Pillai [mailto:hpil...@walmartlabs.com] Sent: Tuesday, October 31, 2017 11:28 AM To: user@cassandra.apache.org Subject: Re: Re: Tuning bootstrap new node There is no magic in speeding up node addition other than increasing stream throughput and compaction throughput. It has been noticed that with heavy compactions, latency may go up if the node also starts serving data. If you really don't want this node to serve traffic until all compactions settle down, you can disable gossip and the binary protocol using nodetool. This will allow compactions to continue, but requires a repair later to fix the stale data. Regards, Hari From: Nitan Kainth <nitankai...@gmail.com> Sent: Tuesday, October 31, 2017 5:47 AM To: user@cassandra.apache.org Subject: EXT: Re: Tuning bootstrap new node Do not stop compaction; you will end up with thousands of sstables. You can increase stream throughput from the default 200 to a higher value if your network can handle it. Sent from my iPhone On Oct 31, 2017, at 6:35 AM, Peng Xiao <2535...@qq.com> wrote: Can we stop compaction while the new node is bootstrapping and enable it after the node has joined? Thanks -- Original -- From: "" <2535...@qq.com>; Date: Tue, Oct 31, 2017 07:18 PM; To: "user" <user@cassandra.apache.org>; Subject: Tuning bootstrap new node Dear All, Can we do some tuning to make bootstrapping a new node quicker? We have a three-DC cluster (RF=3 in two DCs, RF=1 in the third; 48 nodes in the DC with RF=3). As the cluster becomes larger and larger, we need more than 24 hours to bootstrap a new node. Could you please advise how to tune this? Many Thanks, Peng Xiao
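For reference, the runtime knobs Hari and Nitan refer to; the values are illustrative, and these settings revert on restart unless also placed in cassandra.yaml:

    # raise the streaming cap (Mb/s; 2.1 default is 200)
    nodetool setstreamthroughput 400

    # raise the compaction cap (MB/s; default 16) so the new node digests backlog faster
    nodetool setcompactionthroughput 64

    # optionally keep the node out of client traffic until compactions settle
    nodetool disablebinary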
Re: split one DC from a cluster
Thanks Kurt, we may still use snapshot and sstableloader to split this schema off to another cluster. -- Original -- From: "kurt";; Date: Thu, Oct 19, 2017 6:11 PM To: "User" ; Subject: Re: split one DC from a cluster The easiest way is to separate them via firewall/network partition so the DCs can't talk to each other, ensure each DC sees the other DC as DOWN, then remove the other DC from replication, then remove all the nodes in the opposite DC using removenode.
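A minimal sketch of Kurt's approach, assuming a keyspace named "ks" that should stay only in DC1 (the names and RF are placeholders):

# After the network partition, on a node in the surviving DC,
# drop the other DC from the keyspace's replication:
cqlsh -e "ALTER KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3}"
# Then remove each node of the opposite DC by host ID (from nodetool status):
nodetool removenode <host-id>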
how to identify the root cause of cassandra hang
Hi, We have a cluster with 48 nodes configured with RACK; sometimes it hangs for up to 2 minutes, and the response time jumps from 300ms to 15s. Could anyone please advise how to identify the root cause? The following is from the system log: INFO [Service Thread] 2017-10-26 21:45:46,796 GCInspector.java:258 - G1 Young Generation GC in 222ms. G1 Eden Space: 939524096 -> 0; G1 Old Gen: 6652738584 -> 6662878232; G1 Survivor Space: 134217728 -> 109051904; INFO [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:51 - Pool Name Active Pending Completed Blocked All Time Blocked INFO [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - MutationStage 0 3 3612475121 0 0 INFO [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - RequestResponseStage 0 0 6333593550 0 0 INFO [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - ReadRepairStage 0 0 2773154 0 0 INFO [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - CounterMutationStage 0 0 0 0 0 INFO [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - ReadStage 0 4 417419357 0 0 Thanks.
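A few first-pass diagnostics, as a hedged sketch (the log path is an assumption; yours may differ):

# Pending/blocked/dropped tasks per thread pool at the moment of the stall:
nodetool tpstats
# Read/write latency percentiles for a suspect table:
nodetool cfhistograms <keyspace> <table>
# Correlate stalls with long GC pauses:
grep -i GCInspector /var/log/cassandra/system.log | tail -20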
Re: run cleanup and rebuild simultaneously
Thanks Jeff -- Original -- From: Jeff Jirsa <jji...@gmail.com> Date: Sat, Dec 23, 2017 09:28 To: user <user@cassandra.apache.org> Subject: Re: run cleanup and rebuild simultaneously Should be fine, though it will increase disk usage in dc1 for a while - a reference to the cleaned-up sstables will be held by the rebuild streams, causing you to temporarily increase disk usage until the rebuild finishes streaming -- Jeff Jirsa On Dec 22, 2017, at 4:30 PM, Peng Xiao <2535...@qq.com> wrote: Hi there, can we run nodetool cleanup in DC1 and run rebuild in DC2 against DC1 simultaneously, on C* 2.1.18? Thanks, Peng Xiao
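For reference, a sketch of the two operations Jeff is describing (DC names as in the question):

# On each DC1 node, drop ranges the node no longer owns:
nodetool cleanup
# Meanwhile, on each DC2 node, stream its data from DC1:
nodetool rebuild -- DC1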
Gossip stage pending tasks cause application rt jump
Hi, We noticed the following warning in system.log, and it makes C* very slow. WARN [GossipTasks:1] 2017-12-23 16:52:17,818 Gossiper.java:748 - Gossip stage has 4 pending tasks; skipping status check (no nodes will be marked down) Could anyone please advise? Thanks, Peng Xiao
how to check C* partition size
Hi guys, Could anyone please help with this simple question: how do we check C* partition size and related information? It looks like nodetool ring only shows the token distribution. Thanks
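For what it's worth, a sketch of the usual commands (named cfstats/cfhistograms on 2.1; newer versions call them tablestats/tablehistograms):

# Min/mean/max compacted partition size per table:
nodetool cfstats <keyspace>.<table>
# Partition size and cell count percentiles:
nodetool cfhistograms <keyspace> <table>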
Re: secondary index creation causes C* oom
Thanks Kurt. -- Original -- From: "kurt";; Date: Thu, Jan 11, 2018 11:46 To: "User" ; Subject: Re: secondary index creation causes C* oom 1. not sure if secondary index creation is the same as index rebuild Fairly sure they are the same. 2. we noticed that the memtable flush looks to be still working, not the same as CASSANDRA-12796 mentioned, but the compactionExecutor pending count is increasing. Do you by chance have concurrent_compactors set to only 2? It seems your index builds are blocking other compactions from taking place. It seems that maybe postflush is backed up because it's blocked on writes generated from the rebuild? Maybe, anyway. 3. I'm wondering if the block only blocks the specific table which is creating the secondary index? If all your flush writers/post flushers are blocked, I assume no other flushes will be able to take place, regardless of table. It seems CASSANDRA-12796 is related, but I'm not sure why it didn't get fixed in 2.1.
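A quick way to check the settings Kurt asks about, as a sketch:

# Watch pending compactions (index builds show up here too):
nodetool compactionstats
# On 2.1, concurrent_compactors is read from cassandra.yaml at startup, e.g.:
# concurrent_compactors: 4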
C* keyspace layout
Hi there, We plan to put keyspace1 in DC1 and DC2, and keyspace2 in DC3 and DC4, all still in the same cluster, to avoid interruption. Is there any potential risk with this architecture? Thanks, Peng Xiao
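A sketch of what that layout looks like in CQL (keyspace and DC names as in the question; the RF values are assumptions):

cqlsh -e "CREATE KEYSPACE keyspace1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3}"
cqlsh -e "CREATE KEYSPACE keyspace2 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC3': 3, 'DC4': 3}"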
secondary index creation causes C* oom
Dear All, We hit some C* node OOMs during secondary index creation with C* 2.1.18. As per https://issues.apache.org/jira/browse/CASSANDRA-12796, the flush writer will be blocked by index rebuild, but we still have some confusions: 1. not sure if secondary index creation is the same as index rebuild 2. we noticed that the memtable flush looks to be still working, not the same as CASSANDRA-12796 mentioned, but the compactionExecutor pending count is increasing. 3. I'm wondering if the block only blocks the specific table which is creating the secondary index? Could anyone please explain? Thanks
will the copy command clear all the old data?
Dear All, I'm trying to import a csv file into a table with the COPY command. The question is: will the COPY command clear all the old data in this table? We only want to append the csv file to this table. Thanks
Re: will the copy command clear all the old data?
Thanks Nandan for the confirmation. I also did the test. -- Original -- From: "@Nandan@"<nandanpriyadarshi...@gmail.com>; Date: Tue, Feb 13, 2018 12:52 To: "user"<user@cassandra.apache.org>; Subject: Re: will the copy command clear all the old data? Hi Peng, the COPY command will append (upsert) all data to your existing Cassandra table. Just for testing, I executed a COPY of 50 rows into a small CQL table before, and it worked fine. One point to make sure of: check your primary key before playing with the COPY command. Thanks On Tue, Feb 13, 2018 at 12:49 PM, Peng Xiao <2535...@qq.com> wrote: Dear All, I'm trying to import a csv file into a table with the COPY command. The question is: will the COPY command clear all the old data in this table? We only want to append the csv file to this table. Thanks
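A minimal sketch of the append behaviour (keyspace, table, columns, and file name are placeholders):

# Rows whose primary key already exists are overwritten (upsert);
# all other existing rows are left untouched:
cqlsh -e "COPY myks.mytable (id, name, value) FROM 'data.csv' WITH HEADER = true"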
run cleanup and rebuild simultaneously
Hi there, Can we run nodetool cleanup in DC1 and run rebuild in DC2 against DC1 simultaneously, on C* 2.1.18? Thanks, Peng Xiao
Re: auto_bootstrap for seed node
We followed https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html, but it does not mention changing auto_bootstrap on seed nodes after the rebuild. Thanks, Peng Xiao -- Original -- From: "Ali Hubail"<ali.hub...@petrolink.com>; Date: Wed, Mar 28, 2018 10:48 AM To: "user"<user@cassandra.apache.org>; Subject: Re: auto_bootstrap for seed node You might want to follow the DataStax docs on this one. For adding a DC to an existing cluster: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsAddDCToCluster.html For adding a new node to an existing cluster: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsAddNodeToCluster.html Briefly speaking: adding one node to an existing cluster --> use auto_bootstrap; adding a DC to an existing cluster --> rebuild. You need to check the version of C* that you're running and make sure you pick the right doc version for it. Most of my colleagues miss very important steps while adding/removing nodes/clusters, but if they stick to the docs, they always get it done right. Hope this helps Ali Hubail "Peng Xiao" <2535...@qq.com> 03/27/2018 09:39 PM Please respond to user@cassandra.apache.org To "user" <user@cassandra.apache.org>, cc Subject auto_bootstrap for seed node Dear All, For adding a new DC, we need to set auto_bootstrap: false and then run the rebuild; finally we need to change auto_bootstrap back to true. But for seed nodes, it seems we still need to keep auto_bootstrap false? Could anyone please confirm? Thanks, Peng Xiao
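A sketch of the add-DC sequence being discussed (the DC name is a placeholder; seed nodes never bootstrap regardless of the flag, which is why the docs leave it unchanged for them):

# cassandra.yaml on every new-DC node before its first start:
# auto_bootstrap: false
# Once the keyspaces replicate to the new DC, on each new node:
nodetool rebuild -- DC1
# Afterwards, set auto_bootstrap back to true on non-seed nodes so future
# nodes built from the same config bootstrap normally.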
auto_bootstrap for seed node
Dear All, For adding a new DC, we need to set auto_bootstrap: false and then run the rebuild; finally we need to change auto_bootstrap back to true. But for seed nodes, it seems we still need to keep auto_bootstrap false? Could anyone please confirm? Thanks, Peng Xiao
replace dead node vs remove node
Dear All, when one node fails with hardware errors, it will be in DN status in the cluster. Then, if we are not able to handle the error within three hours (the max hints window), we will lose data, right? We have to run repair to keep consistency. And as per https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html, we can replace this dead node. Is that the same as bootstrapping a new node? Does that mean we don't need to remove the node and rejoin? Could anyone please advise? Thanks, Peng Xiao
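For reference, a sketch of the replace flow from the linked blog post (the IP address is a placeholder):

# In cassandra-env.sh on the replacement node, before its first start:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"
# The new node streams the dead node's ranges, much like a bootstrap;
# no removenode/rejoin cycle is needed.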
disable compaction in bootstrap process
Dear All, We noticed that when bootstrapping a new node, the source nodes are also quite busy doing compactions, which impacts the response time severely. Is it reasonable to disable compaction on all the source nodes? Thanks, Peng Xiao
Re: replace dead node vs remove node
Hi Anthony, there is a problem with replacing a dead node as per the blog: if the replacement process takes longer than max_hint_window_in_ms, we must run repair to make the replaced node consistent again, since it missed ongoing writes during bootstrapping. But for a large cluster, repair is a painful process. Thanks, Peng Xiao -- Original -- From: "Anthony Grasso"<anthony.gra...@gmail.com>; Date: Thu, Mar 22, 2018 7:13 To: "user"<user@cassandra.apache.org>; Subject: Re: replace dead node vs remove node Hi Peng, Depending on the hardware failure you can do one of two things: 1. If the disks are intact and uncorrupted, you could just use the disks with the current data on them in the new node. Even if the IP address changes for the new node, that is fine. In that case all you need to do is run repair on the new node. The repair will fix any writes the node missed while it was down. This process is similar to the scenario in this blog post: http://thelastpickle.com/blog/2018/02/21/replace-node-without-bootstrapping.html 2. If the disks are inaccessible or corrupted, then use the method described in the blog post you linked to. The operation is similar to bootstrapping a new node. There is no need to perform any other remove or join operation on the failed or new nodes. As per the blog post, you definitely want to run repair on the new node as soon as it joins the cluster. In this case, the data on the failed node is effectively lost and replaced with data from other nodes in the cluster. Hope this helps. Regards, Anthony On Thu, 22 Mar 2018 at 20:52, Peng Xiao <2535...@qq.com> wrote: Dear All, when one node fails with hardware errors, it will be in DN status in the cluster. Then, if we are not able to handle the error within three hours (the max hints window), we will lose data, right? We have to run repair to keep consistency. And as per https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html, we can replace this dead node. Is that the same as bootstrapping a new node? Does that mean we don't need to remove the node and rejoin? Could anyone please advise? Thanks, Peng Xiao
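If a full repair of the whole cluster is too heavy, a hedged alternative is to repair only the replaced node's primary ranges:

# On the newly joined replacement node:
nodetool repair -pr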
Re: disable compaction in bootstrap process
Sorry Alain, maybe there is some misunderstanding here: I meant to disable compaction during the bootstrapping process, then enable it after the bootstrapping. -- Original -- From: ""<2535...@qq.com>; Date: Fri, Mar 23, 2018 10:54 To: "user"<user@cassandra.apache.org>; Subject: Re: disable compaction in bootstrap process Thanks Alain. We are using C* 2.1.18 on 7-core/30G/1.5T SSD nodes; as the cluster is growing too fast, bootstrap/rebuild/remove node is painful for us. Thanks, Peng Xiao -- Original -- From: "Alain RODRIGUEZ"<arodr...@gmail.com>; Date: Thu, Mar 22, 2018 7:31 To: "user cassandra.apache.org"<user@cassandra.apache.org>; Subject: Re: disable compaction in bootstrap process Hello, Is it reasonable to disable compaction on all the source nodes? I would say no, as a short answer. You can - I did it for some operations in the past. Technically there is no problem; you can do that. It will most likely improve the response time of queries immediately, as it seems that in your cluster compactions are impacting the transactions. That being said, the impact in the middle/long term will be substantially worse. Compactions allow fragments of rows to be merged so that reads can be more efficient, ideally hitting the disk just once (or at least a reasonably low number of times). Also, when enabling compactions back you might have trouble again, as compaction will have to catch up. Imho, disabling compaction should not be an action to take unless your understanding of compaction is good enough and you are in a very specific case that requires it. In any case, I would recommend you stay away from using this solution as a quick workaround. It could lead to really bad situations, not to mention the tombstones that would stack up. Plus, doing this on all the nodes at once is really calling for trouble, as all the nodes' performances might degrade at roughly the same pace. I would suggest troubleshooting why compactions are actually impacting the read/write performance. We can probably help with this here, as I believe all Cassandra users have had to deal with this at some point (at least people running with 'limited' hardware compared to their needs). Here are some questions that I believe might be useful for us to help you, or for you to troubleshoot: - Is Cassandra the limiting factor, or are resources reaching a limit? - Is the cluster CPU or disk bound? - What are the number of concurrent compactors and the compaction speed in use? - What hardware are you relying on? - What version are you using? - Is compaction keeping up? What compaction strategy are you using? - 'nodetool tpstats' might also give information on pending and dropped tasks. It might be useful. C*heers, --- Alain Rodriguez - @arodream - al...@thelastpickle.com France / Spain The Last Pickle - Apache Cassandra Consulting http://www.thelastpickle.com 2018-03-22 9:09 GMT+00:00 Peng Xiao <2535...@qq.com>: Dear All, We noticed that when bootstrapping a new node, the source nodes are also quite busy doing compactions, which impacts the response time severely. Is it reasonable to disable compaction on all the source nodes? Thanks, Peng Xiao
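For completeness, the operation being discussed, as a sketch (the keyspace name is a placeholder):

# Pause automatic compaction on a source node during the bootstrap:
nodetool disableautocompaction myks
# Re-enable it once the new node has joined:
nodetool enableautocompaction myks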
Re: disable compaction in bootstrap process
Thanks Alain. We are using C* 2.1.18 on 7-core/30G/1.5T SSD nodes; as the cluster is growing too fast, bootstrap/rebuild/remove node is painful for us. Thanks, Peng Xiao -- Original -- From: "Alain RODRIGUEZ"<arodr...@gmail.com>; Date: Thu, Mar 22, 2018 7:31 To: "user cassandra.apache.org"<user@cassandra.apache.org>; Subject: Re: disable compaction in bootstrap process Hello, Is it reasonable to disable compaction on all the source nodes? I would say no, as a short answer. You can - I did it for some operations in the past. Technically there is no problem; you can do that. It will most likely improve the response time of queries immediately, as it seems that in your cluster compactions are impacting the transactions. That being said, the impact in the middle/long term will be substantially worse. Compactions allow fragments of rows to be merged so that reads can be more efficient, ideally hitting the disk just once (or at least a reasonably low number of times). Also, when enabling compactions back you might have trouble again, as compaction will have to catch up. Imho, disabling compaction should not be an action to take unless your understanding of compaction is good enough and you are in a very specific case that requires it. In any case, I would recommend you stay away from using this solution as a quick workaround. It could lead to really bad situations, not to mention the tombstones that would stack up. Plus, doing this on all the nodes at once is really calling for trouble, as all the nodes' performances might degrade at roughly the same pace. I would suggest troubleshooting why compactions are actually impacting the read/write performance. We can probably help with this here, as I believe all Cassandra users have had to deal with this at some point (at least people running with 'limited' hardware compared to their needs). Here are some questions that I believe might be useful for us to help you, or for you to troubleshoot: - Is Cassandra the limiting factor, or are resources reaching a limit? - Is the cluster CPU or disk bound? - What are the number of concurrent compactors and the compaction speed in use? - What hardware are you relying on? - What version are you using? - Is compaction keeping up? What compaction strategy are you using? - 'nodetool tpstats' might also give information on pending and dropped tasks. It might be useful. C*heers, --- Alain Rodriguez - @arodream - al...@thelastpickle.com France / Spain The Last Pickle - Apache Cassandra Consulting http://www.thelastpickle.com 2018-03-22 9:09 GMT+00:00 Peng Xiao <2535...@qq.com>: Dear All, We noticed that when bootstrapping a new node, the source nodes are also quite busy doing compactions, which impacts the response time severely. Is it reasonable to disable compaction on all the source nodes? Thanks, Peng Xiao
Re: disable compaction in bootstrap process
Many thanks Alain for the thorough explanation; we will not disable compaction for now. Thanks, Peng Xiao -- Original -- From: "arodrime"<arodr...@gmail.com>; Date: Fri, Mar 23, 2018 8:57 PM To: "Peng Xiao"<2535...@qq.com>; Cc: "user"<user@cassandra.apache.org>; Subject: Re: disable compaction in bootstrap process I mean to disable Compaction in the bootstrapping process, then enable it after the bootstrapping. That's how I understood it :-). Bootstrap can take a relatively long time and could affect all the nodes when using vnodes. Disabling compactions for hours is risky, even more so if the cluster is somewhat under pressure already. My point is, it might work for you, but it might also bring a whole lot of other issues, starting with increased latencies. Plus, all the compaction work on hold will have to be performed at some point later. You asked if it is 'reasonable'; I would say no, unless you know for sure the cluster will handle it properly. Here is what I think would be a reasonable approach: before going for solutions, and especially this solution, it is important to understand the limitations and to find the bottleneck or the root cause of the troubles. In a healthy cluster, a node can handle streaming the data, compacting, and answering client requests. Once what is wrong is clear, it will be way easier to think about possible solutions and pick the best one. For now, we are only making guesses. Taking quick actions after making wrong guesses, without fully understanding the consequences, is where I have seen the most damage being done to Cassandra clusters. I did that too; I don't recommend it :-). we are painful in bootstrap/rebuild/remove node. As you express that the cluster is having troubles with streaming operations ('bootstrap/rebuild/remove node'), you can try reducing the streaming throughput. There is no rush in adding the new nodes as long as the other nodes stay healthy meanwhile. Reducing the speed will reduce the pressure on the disk (mostly). This change should not harm in any case, just make things slower. This can be a reasonable try (keeping in mind it is a workaround and there is probably an underlying issue if you are using defaults). 'nodetool getstreamthroughput' 'nodetool setstreamthroughput x' The default x is 200 (I believe). If using vnodes, do not be afraid to lower this quite a lot, as all the nodes are probably involved in the streaming process. But again, until we know more about the metrics or the context, we are mostly guessing. With the following information, we could probably help more efficiently: - What are the values of 'concurrent_compactors' and 'compaction_throughput_in_mb' in use? (cassandra.yaml) - Is the cluster CPU or disk bound? (system tools htop / charts, etc. What is the CPU load, % of CPU used, any io_wait?) - Is compaction keeping up? ('nodetool compactionstats -H') - What compaction strategy are you using? (Table definition - i.e. 'cqlsh -e "DESCRIBE TABLE keyspace.table;" | grep -i compaction') - 'nodetool tpstats' might also give information on pending and dropped tasks. - 'nodetool cfstats' could help as well. In any case, good luck ;-) C*heers, --- Alain Rodriguez - @arodream - al...@thelastpickle.com France / Spain The Last Pickle - Apache Cassandra Consulting http://www.thelastpickle.com 2018-03-23 2:59 GMT+00:00 Peng Xiao <2535...@qq.com>: Sorry Alain, maybe there is some misunderstanding here: I meant to disable compaction during the bootstrapping process, then enable it after the bootstrapping.
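A sketch of Alain's suggestion (the value 50 is purely illustrative):

# Check the current inter-node streaming cap (Mb/s, default 200):
nodetool getstreamthroughput
# Lower it so bootstrap streaming puts less pressure on the source disks:
nodetool setstreamthroughput 50
# Restore the default once the new node has joined:
nodetool setstreamthroughput 200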