Re: Data Loss irreparably so

2017-08-02 Thread Peng Xiao
Due to tombstones, we have set GC_GRACE_SECONDS to 6 hours. And for a huge 
table of 4 TB, repair is a hard thing for us.




-- Original Message --
From: "kurt";;
Sent: Thursday, August 3, 2017, 12:08 PM
To: "User"; 

Subject: Re: Data Loss irreparably so



You should run repairs every GC_GRACE_SECONDS. If a node is overloaded/goes 
down, you should run repairs. LOCAL_QUORUM will somewhat maintain consistency 
within a DC, but certainly doesn't mean you can get away without running 
repairs. You need to run repairs even if you are using QUORUM or ONE.​
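
For illustration, a minimal sketch of scheduling regular repairs from cron (the schedule, keyspace name and paths are placeholders; a tool such as Reaper, mentioned later in this thread, is usually a better fit than raw cron):

# /etc/cron.d/cassandra-repair -- hypothetical example: run a primary-range
# repair on this node well within gc_grace_seconds (assumed to be 6 hours here)
0 */4 * * * cassandra /usr/bin/nodetool repair -pr my_keyspace >> /var/log/cassandra/repair.log 2>&1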

Re: Data Loss irreparably so

2017-08-02 Thread Peng Xiao
Hi,
We are also experiencing the same issue. We have 3 DCs (DC1 RF=3, DC2 RF=3, DC3 RF=1). If we use LOCAL_QUORUM, we are not meant to lose any data, right?
If we use LOCAL_ONE, we may lose data? Then we need to run repair regularly?
Could anyone advise?


Thanks






-- Original Message --
From: "Jon Haddad";;
Sent: Friday, July 28, 2017, 1:37 AM
To: "user"; 

Subject: Re: Data Loss irreparably so



We (The Last Pickle) maintain an open source tool to help manage repairs across 
your clusters called Reaper.  It’s a lot easier to set up and manage than 
trying to manage it through cron.


http://thelastpickle.com/reaper.html

On Jul 27, 2017, at 12:38 AM, Daniel Hölbling-Inzko 
 wrote:

In that vein, Cassandra supports auto compaction and incremental repair. 
Does this mean I have to set up cron jobs on each node to do a nodetool repair, 
or is this taken care of by Cassandra anyway?
How often should I run nodetool repair?

Greetings Daniel
Jeff Jirsa  wrote on Thu, 27 Jul 2017 at 07:48:


 
 On 2017-07-25 15:49 (-0700), Roger Warner  wrote:
 > This is a quick informational question. I know that Cassandra can detect 
 > failures of nodes and repair them given replication and multiple DC.
 >
 > My question is can Cassandra tell if data was lost after a failure and 
 > node(s) “fixed” and resumed operation?
 >
 
 Sorta concerned by the way you're asking this - Cassandra doesn't "fix" failed 
nodes. It can route requests around a down node, but the "fixing" is entirely 
manual.
 
 If you have a node go down temporarily, and it comes back up (with its disk 
intact), you can see it "repair" data with a combination of active 
(anti-entropy) repair via nodetool repair, or by watching 'nodetool netstats' 
and seeing the read repair counters increase over time (which will happen 
naturally as data is requested and mismatches are detected in the data, based 
on your consistency level).
 
 
 

Cassandra data loss in some DC

2017-08-02 Thread Peng Xiao
Hi there,


We have a three-DC cluster (two DCs with RF=3, one remote DC with RF=1). We 
currently find that in DC1/DC2, select count(*) from t returns 1250, while in DC3 
select count(*) from t returns 750.
It looks like some data is missing in DC3 (the remote DC). There were no nodes down or anything 
exceptional.
We only upgraded this DC from 2.1.13 to 2.1.18, but that should not cause data 
loss.
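
For illustration, a minimal sketch of how such counts might be compared per DC from cqlsh (host names and keyspace/table are placeholders; note that count(*) over a large table can itself time out, so this is only a rough check):

cqlsh dc1-node1 -e "CONSISTENCY LOCAL_QUORUM; SELECT count(*) FROM my_keyspace.t;"
cqlsh dc3-node1 -e "CONSISTENCY LOCAL_ONE;    SELECT count(*) FROM my_keyspace.t;"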


Could anyone please advise?


Thanks,
Peng

Re: Re: tolerate how many nodes down in the cluster

2017-07-25 Thread Peng Xiao
Thanks all for your replies. We will begin using RACs in our C* cluster.


Thanks.




-- Original Message --
From: "kurt greaves";<k...@instaclustr.com>;
Sent: July 25, 2017 6:27
To: "User"<user@cassandra.apache.org>; 
"anujw_2...@yahoo.co.in"<anujw_2...@yahoo.co.in>; 
Cc: "Peng Xiao"<2535...@qq.com>; 
Subject: Re: Re: tolerate how many nodes down in the cluster



I've never really understood why Datastax recommends against racks. In those 
docs they make it out to be much more difficult than it actually is to 
configure and manage racks.

The important thing to keep in mind when using racks is that your # of racks 
should be equal to your RF. If you have keyspaces with different RF, then it's 
best to have the same # as the RF of your most important keyspace, but in this 
scenario you lose some of the benefits of using racks.


As Anuj has described, if you use RF # of racks, you can lose up to an entire 
rack without losing availability. Note that this entirely depends on the 
situation. When you take a node down, the other nodes in the cluster require 
capacity to be able to handle the extra load that node is no longer handling. 
What this means is that your cluster will require the other nodes to store 
hints for that node (equivalent to the amount of writes made to that node), and 
also handle its portion of READs. You can only take out as many nodes from a 
rack as the capacity of your cluster allows.


I also strongly disagree that using racks makes operations tougher. If 
anything, it makes them considerably easier (especially when using vnodes). The 
only difficulty is the initial setup of racks, but for all the possible 
benefits it's certainly worth it. As well as the fact that you can lose up to 
an entire rack (great for AWS AZs) without affecting availability, using racks 
also makes operations on large clusters much smoother. For example, when 
upgrading a cluster, you can now do it a rack at a time, or some portion of a 
rack at a time. Same for OS upgrades or any other operation that could happen 
in your environment. This is important if you have lots of nodes.  Also it 
makes coordinating repairs easier, as you now only need to repair a single rack 
to ensure you've repaired all the data. Basically any operation/problem where 
you need to consider the distribution of data, racks are going to help you.
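
For illustration, a minimal sketch of how racks are typically declared per node with GossipingPropertyFileSnitch (DC and rack names are placeholders; each node gets its own file):

# /etc/cassandra/conf/cassandra-rackdc.properties (one file per node)
dc=DC1
rack=RAC1    # use RAC1/RAC2/RAC3 across the nodes so that the number of racks equals RF=3

With NetworkTopologyStrategy and RF equal to the number of racks, each rack then holds one full copy of the data, which is what makes the rack-at-a-time operations above possible.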

Re: Re: tolerate how many nodes down in the cluster

2017-07-25 Thread Peng Xiao
Thanks for the reminder, we will set up a new DC as suggested.




-- Original Message --
From: "kurt greaves";;
Sent: Wednesday, July 26, 2017, 10:30 AM
To: "User"; 
Cc: "anujw_2...@yahoo.co.in"; 
Subject: Re: Re: tolerate how many nodes down in the cluster



Keep in mind that you shouldn't just enable multiple racks on an existing 
cluster (this will lead to massive inconsistencies). The best method is to 
migrate to a new DC as Brooke mentioned.​

Re: tolerate how many nodes down in the cluster

2017-07-24 Thread Peng Xiao
Hi Bhuvan,
From the following link, it doesn't suggest using RACs, and it looks reasonable.
http://www.datastax.com/dev/blog/multi-datacenter-replication



Defining one rack for the entire cluster is the simplest and most common 
implementation. Multiple racks should be avoided for the following reasons:
•   Most users tend to ignore or forget rack requirements that 
state racks should be in an alternating order to allow the data to get 
distributed safely and appropriately.
•   Many users are not using the rack information effectively by 
using a setup with as many racks as they have nodes, or similar non-beneficial 
scenarios.
•   When using racks correctly, each rack should typically have the 
same number of nodes.
•   In a scenario that requires a cluster expansion while using 
racks, the expansion procedure can be tedious since it typically involves 
several node moves and has to ensure that racks will be 
distributing data correctly and evenly. At times when clusters need immediate 
expansion, racks should be the last things to worry about.












-- Original Message --
From: "Bhuvan Rawal";<bhu1ra...@gmail.com>;
Sent: Monday, July 24, 2017, 7:17 PM
To: "user"<user@cassandra.apache.org>; 

Subject: Re: tolerate how many nodes down in the cluster



Hi Peng ,

This really depends on how you have configured your topology. Say you have 
segregated your DC into 3 racks with 10 servers each. With an RF of 3 you can 
safely assume your data to be available if one rack goes down. 


But if different servers amongst the racks fail, then I guess you are not 
guaranteeing data integrity with an RF of 3; in that case you can lose at most 2 
servers and stay available. The best idea would be to plan failover modes 
appropriately and let Cassandra know of the same.


Regards,
Bhuvan


On Mon, Jul 24, 2017 at 3:28 PM, Peng Xiao <2535...@qq.com> wrote:
Hi,


Suppose we have a 30-node cluster in one DC with RF=3,
how many nodes can be down? Can we tolerate 10 nodes down?
It seems that we are not able to avoid having all 3 replicas of some data within those 
10 nodes,
so we can only tolerate 1 node down even though we have 30 nodes?
Could anyone please advise?


Thanks

Re: Re: Re: tolerate how many nodes down in the cluster

2017-07-27 Thread Peng Xiao
Thanks all for your thorough explanation.




-- Original Message --
From: "Anuj Wadehra";<anujw_2...@yahoo.co.in.INVALID>;
Sent: July 28, 2017 0:49
To: "User cassandra.apache.org"<user@cassandra.apache.org>; "Peng 
Xiao"<2535...@qq.com>; 

Subject: Re: Re: Re: tolerate how many nodes down in the cluster



Hi Peng, 

Racks can be logical (as defined with RAC attribute in Cassandra configuration 
files) or physical (racks in server rooms).  



In my view, for leveraging racks in your case, it's important to understand the 
implication of the following decisions:


1. Number of distinct logical RACs defined in Cassandra: If you want to leverage 
RACs optimally for operational efficiencies (like Brooke explained), you need 
to make sure that logical RACs are ALWAYS equal to RF, irrespective of whether 
physical racks are equal to or greater than RF.  


Keeping logical RACs = RF ensures that the nodes allocated to a logical rack have 
exactly 1 replica of the entire 100% data set. So, if you have RF=3 and you 
use QUORUM for read/write, you can bring down ALL nodes allocated to a logical 
rack for maintenance activity and still achieve 100% availability. This makes 
operations faster and cuts down the risk involved. For example, imagine taking 
a Cassandra restart of the entire cluster. If one node takes 3 minutes, a rolling 
restart of 30 nodes would take 90 minutes. But, if you use 3 logical RACs with 
RF=3 and assign 10 nodes to each logical RAC, you can restart 10 nodes within a 
RAC simultaneously (of course in off-peak hours so that the remaining 20 nodes can 
take the load). Starting Cassandra on all RACs one by one will just take 9 
minutes rather than 90 minutes. If there are any issues during 
restart/maintenance, you can take all the nodes on a logical RAC down, fix them 
and bring them back without affecting availability.






2. Number of physical racks: As per historical data, there are instances when 
more than one node in a physical rack fails together. When you are using VMs, 
there are three levels instead of two. VMs on a single physical machine are 
likely to fail together too due to hardware failure.


Physical Racks > Physical Machines > VMs


Ensure that all VMs on a physical machine map to single logical RAC. If you 
want to afford failure of physical racks in the server room, you also need to 
ensure that all physical servers on a physical rack must map to just one 
logical RAC. This way, you can afford failure of ALL VMs on ALL physical 
machines mapped to a single logical RAC and still be 100% available.

For Example: RF=3 , 6 physical racks, 2 physical servers per physical rack and 
3 VMs per physical server.
Setup would be-

Physical Rack1 = [Physical1 (3 VM) + Physical2 (3 VM) ]= LogicalRAC1
Physical Rack2 = [Physical3 (3 VM) + Physical4 (3 VM) ]= LogicalRAC1


Physical Rack3 = [Physical5 (3 VM) + Physical6 (3 VM) ]= LogicalRAC2
Physical Rack4 = [Physical7 (3 VM) + Physical8 (3 VM) ]= LogicalRAC2


Physical Rack5 = [Physical9 (3 VM) + Physical10 (3 VM) ]= LogicalRAC3
Physical Rack6 = [Physical11 (3 VM) + Physical12 (3 VM) ]= LogicalRAC3


The problem with this approach is scaling. What if you want to add a single 
physical server? If you do that and allocate it to one existing logical RAC, 
your cluster won't be balanced properly because the logical RAC to which the 
server is added will have additional capacity for the same data as the other two 
logical RACs. To keep your cluster balanced, you need to add at least 3 physical 
servers in 3 different physical racks and assign each physical server to a 
different logical RAC. This is a waste of resources and hard to digest.



If you have physical machines < logical RACs, every physical machine may have 
more than 1 replica. If entire physical machine fails, you will NOT have 100% 
availability as more than 1 replica may be unavailable. Similarly, if you have 
physical racks < logical RACs, every physical rack may have more than 1 
replica. If entire physical rack fails, you will NOT have 100% availability as 
more than 1 replica may be unavailable. 




Coming back to your example: RF=3 per DC (total RF=6), CL=QUORUM, 2 DCs, 6 
physical machines, 8 VMs per physical machine:


My Recommendation :
1. In each DC, assign the 3 physical machines in the DC to 3 logical RACs in the 
Cassandra configuration. The 2 DCs can have the same RAC names as RACs are uniquely 
identified with their DC names. So, these are 6 different logical RACs 
(a multiple of RF) (i.e. 1 physical machine per logical RAC).
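
For illustration, a minimal CQL sketch of the keyspace definition that such a 2-DC, RF=3-per-DC layout implies (keyspace and DC names are placeholders):

CREATE KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};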





2. Add 6 physical machines (3 physical machines per DC) to scale the cluster 
and assign every machine to different logical RAC within the DC.


This way, even if you have an Active-Passive DC setup, you can afford failure of 
any physical machine or physical rack in the Active DC and still ensure 100% 
availability. You would also achieve the operational benefits explained above.

Re: Re: tolerate how many nodes down in the cluster

2017-07-26 Thread Peng Xiao
As Brooke suggests, the number of RACs should be a multiple of RF.
https://www.youtube.com/watch?v=QrP7G1eeQTI


If we have 6 machines with RF=3, should we set up 6 RACs or 3 RACs? Which will 
be better?
Could you please further advise?


Many thanks




-- Original Message --
From: "My own mailbox";<2535...@qq.com>;
Sent: Wednesday, July 26, 2017, 7:31 PM
To: "user"; 
Cc: "anujw_2...@yahoo.co.in"; 
Subject: Re: Re: tolerate how many nodes down in the cluster



One more question: why should the # of racks be equal to RF? 

For example, we have 4 machines, each virtualized to 8 VMs. Can we set up 4 RACs 
with RF=3? I mean one machine, one RAC.


Thanks


-- Original Message --
From: "My own mailbox";<2535...@qq.com>;
Sent: Wednesday, July 26, 2017, 10:32 AM
To: "user"; 
Cc: "anujw_2...@yahoo.co.in"; 
Subject: Re: Re: tolerate how many nodes down in the cluster



Thanks for the reminder, we will set up a new DC as suggested.




-- Original Message --
From: "kurt greaves";;
Sent: Wednesday, July 26, 2017, 10:30 AM
To: "User"; 
Cc: "anujw_2...@yahoo.co.in"; 
Subject: Re: Re: tolerate how many nodes down in the cluster



Keep in mind that you shouldn't just enable multiple racks on an existing 
cluster (this will lead to massive inconsistencies). The best method is to 
migrate to a new DC as Brooke mentioned.​

Re: Re: tolerate how many nodes down in the cluster

2017-07-26 Thread Peng Xiao
One more question: why should the # of racks be equal to RF? 

For example, we have 4 machines, each virtualized to 8 VMs. Can we set up 4 RACs 
with RF=3? I mean one machine, one RAC.


Thanks


-- Original Message --
From: "My own mailbox";<2535...@qq.com>;
Sent: Wednesday, July 26, 2017, 10:32 AM
To: "user"; 
Cc: "anujw_2...@yahoo.co.in"; 
Subject: Re: Re: tolerate how many nodes down in the cluster



Thanks for the reminder, we will set up a new DC as suggested.




-- Original Message --
From: "kurt greaves";;
Sent: Wednesday, July 26, 2017, 10:30 AM
To: "User"; 
Cc: "anujw_2...@yahoo.co.in"; 
Subject: Re: Re: tolerate how many nodes down in the cluster



Keep in mind that you shouldn't just enable multiple racks on an existing 
cluster (this will lead to massive inconsistencies). The best method is to 
migrate to a new DC as Brooke mentioned.​

Re: Timeout while setting keyspace

2017-07-26 Thread Peng Xiao
https://datastax-oss.atlassian.net/browse/JAVA-1002

This one says it's a driver issue; we will give it a try.




-- Original --
From:  "";<2535...@qq.com>;
Date:  Wed, Jul 26, 2017 04:12 PM
To:  "user"; 

Subject:  Timeout while setting keyspace



Dear All,


We are experiencing a strange issue. Currently we have a cluster with Cassandra 
2.1.13.
When the applications start, they print the following warnings, and it takes a 
long time for the applications to start.
Could you please advise?


2017-07-26 15:49:20.676  WARN 11706 --- [-] [cluster1-nio-worker-2] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/172.16.42.138:9042-3, inFlight=5, closed=false]. This should not 
happen but is not critical (it will be retried)
2017-07-26 15:49:20.677  WARN 11706 --- [-] [cluster1-nio-worker-3] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not 
happen but is not critical (it will be retried)
2017-07-26 15:49:20.676  WARN 11706 --- [-] [cluster1-nio-worker-0] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not 
happen but is not critical (it will be retried)
2017-07-26 15:49:20.676  WARN 11706 --- [-] [cluster1-nio-worker-1] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not 
happen but is not critical (it will be retried)
2017-07-26 15:49:32.777  WARN 11706 --- [-] [main] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/172.16.42.113:9042-3, inFlight=1, closed=false]. This should not 
happen but is not critical (it will be retried)





Thanks

Timeout while setting keyspace

2017-07-26 Thread Peng Xiao
Dear All,


We are experiencing a strange issue. Currently we have a cluster with Cassandra 
2.1.13.
When the applications start, they print the following warnings, and it takes a 
long time for the applications to start.
Could you please advise?


2017-07-26 15:49:20.676  WARN 11706 --- [-] [cluster1-nio-worker-2] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/172.16.42.138:9042-3, inFlight=5, closed=false]. This should not 
happen but is not critical (it will be retried)
2017-07-26 15:49:20.677  WARN 11706 --- [-] [cluster1-nio-worker-3] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not 
happen but is not critical (it will be retried)
2017-07-26 15:49:20.676  WARN 11706 --- [-] [cluster1-nio-worker-0] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not 
happen but is not critical (it will be retried)
2017-07-26 15:49:20.676  WARN 11706 --- [-] [cluster1-nio-worker-1] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/xxx.16.42.xxx:9042-3, inFlight=5, closed=false]. This should not 
happen but is not critical (it will be retried)
2017-07-26 15:49:32.777  WARN 11706 --- [-] [main] 
com.datastax.driver.core.Connection  : Timeout while setting keyspace on 
Connection[/172.16.42.113:9042-3, inFlight=1, closed=false]. This should not 
happen but is not critical (it will be retried)





Thanks

Re: Re: tolerate how many nodes down in the cluster

2017-07-26 Thread Peng Xiao
Kurt/All,


Why should the # of racks be equal to RF?

For example, we have 2 DCs, each with 6 machines and RF=3, and each machine is 
virtualized into 8 VMs.
Can we set up 6 RACs with RF=3 (I mean one machine, one RAC, to avoid hardware errors), 
or only set up 3 RACs, 1 RAC with 2 machines? Which is better?


Thanks








-- Original Message --
From: "Anuj Wadehra";<anujw_2...@yahoo.co.in.INVALID>;
Sent: July 27, 2017 1:41
To: "Brooke Thorley"<bro...@instaclustr.com>; 
"user@cassandra.apache.org"<user@cassandra.apache.org>; 
Cc: "Peng Xiao"<2535...@qq.com>; 
Subject: Re: Re: tolerate how many nodes down in the cluster



 Hi Brooke,


 Very nice presentation: https://www.youtube.com/watch?v=QrP7G1eeQTI !! 
 Good to know that you are able to leverage Racks for gaining operational 
efficiencies. I think vnodes have made life easier. 
 

I still see some concerns with Racks:

 
 1. Usually scaling needs are driven by business requirements. Customers want 
value for every penny they spend. Adding 3 or 5 servers (because you have RF=3 
or 5) instead of 1 server costs them dearly. It's difficult to justify the 
additional cost as fault tolerance can only be improved but not guaranteed with 
racks.

 
2. You need to maintain mappings of Logical Racks (=RF) and physical racks 
(multiple of RFs) for large clusters. 
 
3.  Using racks tightly couples your hardware (rack size, rack count) / 
virtualization decisions (VM Size, VM count per physical node) with application 
RF.
 
Thanks
 Anuj
 
 


On Tuesday, 25 July 2017 3:56 AM, Brooke Thorley <bro...@instaclustr.com> 
wrote:

  

 Hello Peng. 

I think spending the time to set up your nodes into racks is worth it for the 
benefits that it brings. With RF3 and NTS you can tolerate the loss of a whole 
rack of nodes without losing QUORUM as each rack will contain a full set of 
data.  It makes ongoing cluster maintenance easier, as you can perform 
upgrades, repairs and restarts on a whole rack of nodes at once.  Setting up 
racks or adding nodes is not difficult particularly if you are using vnodes.  
You would simply add nodes in multiples of  to keep the racks 
balanced.  This is how we run all our managed clusters and it works very well.


You may be interested to watch my Cassandra Summit presentation from last year 
in which I discussed this very topic: 
https://www.youtube.com/watch?v=QrP7G1eeQTI (from 4:00)



If you were to consider changing your rack topology, I would recommend that you 
do this by DC migration rather than "in place". 



Kind Regards,
Brooke Thorley
VP Technical Operations & Customer Services
supp...@instaclustr.com | support.instaclustr.com















 
On 25 July 2017 at 03:06, Anuj Wadehra <anujw_2...@yahoo.co.in.invalid> wrote:
Hi Peng, 

Three things are important when you are evaluating fault tolerance and 
availability for your cluster:


1. RF
2. CL
3. Topology -  how data is replicated in racks. 


If you assume that N  nodes from ANY rack may fail at the same time,  then you 
can afford failure of RF-CL nodes and still be 100% available.  E. g.  If you 
are reading at quorum and RF=3, you can only afford one (3-2) node failure. 
Thus, even if you have a 30 node cluster,  10 node failure can not provide you 
100% availability. RF impacts availability rather than total number of nodes in 
a cluster. 


If you assume that N nodes failing together will ALWAYS be from the same rack, 
you can spread your servers across RF physical racks and use 
NetworkTopologyStrategy. While allocating replicas for any data, Cassandra will 
ensure that the 3 replicas are placed in 3 different racks. E.g. you can have 10 
nodes in each of 3 racks, and then even a 10-node failure within the SAME rack shall ensure 
that you have 100% availability, as two replicas are there for 100% of the data and 
CL=QUORUM can be met. I have not tested this but that's how the rack concept is 
expected to work. I agree, using racks generally makes operations tougher.




Thanks
Anuj




 
   On Mon, 24 Jul 2017 at 20:10, Peng Xiao
<2535...@qq.com> wrote:
 
  Hi Bhuvan,
From the following link, it doesn't suggest using RACs, and it looks reasonable.
http://www.datastax.com/dev/blog/multi-datacenter-replication



Defining one rack for the entire cluster is the simplest and most common 
implementation. Multiple racks should be avoided for the following reasons:
•   Most users tend to ignore or forget rack requirements 

tolerate how many nodes down in the cluster

2017-07-24 Thread Peng Xiao
Hi,


Suppose we have a 30-node cluster in one DC with RF=3,
how many nodes can be down? Can we tolerate 10 nodes down?
It seems that we are not able to avoid having all 3 replicas of some data within those 
10 nodes,
so we can only tolerate 1 node down even though we have 30 nodes?
Could anyone please advise?


Thanks

How do you monitor your Cassandra Cluster?

2017-06-28 Thread Peng Xiao
Dear All,


We are currently using Cassandra 2.1.13, and it has grown to 5 TB in size with 32 
nodes in one DC.
For monitoring, OpsCenter does not send alarms and is not free in higher versions, so 
we have to use a simple JMX+Zabbix template. And we plan to use 
Jolokia+JMX2Graphite to draw the metrics charts now.
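
For illustration, a minimal sketch of pulling one Cassandra metric through a Jolokia HTTP agent (host, port and the choice of MBean are placeholders; the Jolokia agent must be attached to the Cassandra JVM first):

# read the coordinator read latency metric via Jolokia's REST endpoint
curl "http://cassandra-node1:8778/jolokia/read/org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency"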


Could you please advise?


Thanks,
Henry

gossip down failure detected

2017-07-06 Thread Peng Xiao
Hi,
We are experiencing the following issue: the response time sometimes spikes to 15 s, and 
after adjusting the batch size
it looks better, but we still have the following issue. Could anyone advise?


INFO  [GossipTasks:1] 2017-07-07 08:56:33,410 Gossiper.java:1009 - InetAddress 
/172.16.xx.39 is now DOWN


on 172.16.xx.39,we can see the following log:


WARN  [SharedPool-Worker-18] 2017-07-07 08:56:12,049 BatchStatement.java:255 - 
Batch of prepared statements for [ecommercedata.ecommerce_baitiao_reco
rd_by_order_no] is of size 9470, exceeding specified threshold of 5120 by 4350.
WARN  [GossipTasks:1] 2017-07-07 08:56:44,052 FailureDetector.java:258 - Not 
marking nodes down due to local pause of 31835321522 > 50
INFO  [ScheduledTasks:1] 2017-07-07 08:56:44,055 MessagingService.java:929 - 
READ messages were dropped in last 5000 ms: 48 for internal timeout and
0 for cross node timeout
INFO  [ScheduledTasks:1] 2017-07-07 08:56:44,055 MessagingService.java:929 - 
RANGE_SLICE messages were dropped in last 5000 ms: 12 for internal timeo
ut and 0 for cross node timeout
INFO  [ScheduledTasks:1] 2017-07-07 08:56:44,055 StatusLogger.java:51 - Pool 
NameActive   Pending  Completed   Blocked  All T
ime Blocked
INFO  [ScheduledTasks:1] 2017-07-07 08:56:44,056 StatusLogger.java:66 - 
MutationStage 7  1380  618231255 0
  0



Thanks

Re: MUTATION messages were dropped in last 5000 ms for cross nodetimeout

2017-08-04 Thread Peng Xiao
Hi,


Does message drop mean data loss?

Thanks


-- Original --
From: Akhil Mehra 
Date: Aug 4, 2017 16:00
To: user 
Subject: Re: MUTATION messages were dropped in last 5000 ms  for cross 
nodetimeout



Glad I could be of help :)

Hopefully the partition size resize goes smoothly.


Regards,
Akhil

On 4/08/2017, at 5:41 AM, ZAIDI, ASAD A  wrote:

Hi Akhil,
 
Thank you for your reply.
 
I kept testing different timeout numbers over the last week and eventually settled 
on setting the *_request_timeout_in_ms parameters to 1.5 minutes for the coordinator 
wait time. That is the number where I do not see any dropped mutations. 
 
Also asked developers to tweak the data model where we saw a bunch of tables with 
really large partition sizes, some ranging around ~6.6 GB per partition key. We're 
now working to reduce the partition size of those tables. I am 
hoping the corrected data model will help reduce the coordinator wait time (and get back to 
the default number!) again.
 
Thank again/Asad
 
From: Akhil Mehra [mailto:akhilme...@gmail.com] 
Sent: Friday, July 21, 2017 4:24 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout


 
Hi Asad,
 

The 5000 ms is not configurable 
(https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/net/MessagingService.java#L423).
 This is just the time after which the number of dropped messages is reported. 
Thus dropped messages are reported every 5000 ms. 

 

If you are looking to tweak the number of ms after which a message is 
considered dropped then you need to use the write_request_timeout_in_ms.  The 
write_request_timeout_in_ms 
(http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html)
 can be used to increase the mutation timeout. By default it is set to 2000ms.
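
For illustration, a minimal cassandra.yaml sketch of the timeout discussed above (the value is the stated default, not tuning advice):

# cassandra.yaml
write_request_timeout_in_ms: 2000   # how long the coordinator waits for replica acks before the mutation counts as dropped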

 

I hope that helps.

 

Regards,

Akhil

 

 

On 22/07/2017, at 2:46 AM, ZAIDI, ASAD A  wrote:

 
Hi Akhil,

 

Thank you for your reply. Previously, I did "tune" various timeouts - 
basically increased them a bit - but none of the parameters listed in the link 
matches that "were dropped in last 5000 ms".

I was wondering where that [5000ms] number is coming from when, like I 
mentioned before, none of the timeout parameter settings matches that number.

 

Load is intermittently high, but again the CPU queue length never goes beyond medium 
depth. I wonder if there is some internal limit that I'm still not aware of.

 

Thanks/Asad

 

 

From: Akhil Mehra [mailto:akhilme...@gmail.com] 
Sent: Thursday, July 20, 2017 3:47 PM
To: user@cassandra.apache.org
Subject: Re: MUTATION messages were dropped in last 5000 ms for cross node 
timeout



 

Hi Asad,


 


http://cassandra.apache.org/doc/latest/faq/index.html#why-message-dropped

 


As mentioned in the link above this is a load shedding mechanism used by 
Cassandra.


 


Is you cluster under heavy load?


 


Regards,


Akhil


 

 

On 21/07/2017, at 3:27 AM, ZAIDI, ASAD A  wrote:


 

Hello Folks -


 


I'm using apache-cassandra 2.2.8.


 


I see many messages like the one below in my system.log file. In the cassandra.yaml file 
[cross_node_timeout: true] is set, and an NTP server is also running, correcting 
clock drift on the 16-node cluster. I do not see pending or blocked HintedHandoff 
in tpstats output, though there are a bunch of dropped MUTATIONs observed.


 





INFO  [ScheduledTasks:1] 2017-07-20 08:02:52,511 MessagingService.java:946 - 
MUTATION messages were dropped in last 5000 ms: 822 for internal timeout and 
2152 for cross node timeout





 


I'm seeking help here; please let me know what I need to check in order 
to address these cross node timeouts.


 


Thank you,


Asad

optimal value for native_transport_max_threads

2017-08-08 Thread Peng Xiao
Dear All,


Any suggestion for an optimal value for native_transport_max_threads?
As per 
https://issues.apache.org/jira/browse/CASSANDRA-11363, 
max_queued_native_transport_requests=4096; how about native_transport_max_threads?
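
For illustration, a sketch of where these two knobs usually live (the numbers are placeholders showing the shape of the config, not tuning advice; in the 2.x/3.x versions referenced by that JIRA the queue limit is a JVM system property rather than a yaml option, which is an assumption worth verifying for your version):

# cassandra.yaml
native_transport_max_threads: 128   # max threads serving CQL native-protocol requests

# cassandra-env.sh (assumed property name from CASSANDRA-11363)
JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=4096"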


Thanks,
Peng Xiao

Re: Row Cache hit issue

2017-09-19 Thread Peng Xiao
And we are using C* 2.1.18.




-- Original --
From:  "";<2535...@qq.com>;
Date:  Wed, Sep 20, 2017 11:27 AM
To:  "user"<user@cassandra.apache.org>;

Subject:  Row Cache hit issue



Dear All,


The default is row_cache_save_period=0; it looks like the row cache does not work in this 
situation?
But we can still see row cache hits.


Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds


Could anyone please explain this?


Thanks,
Peng Xiao

Row Cache hit issue

2017-09-19 Thread Peng Xiao
Dear All,


The default is row_cache_save_period=0; it looks like the row cache does not work in this 
situation?
But we can still see row cache hits.


Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds


Could anyone please explain this?


Thanks,
Peng Xiao

Re: RE: Row Cache hit issue

2017-09-19 Thread Peng Xiao
Thanks All.




-- Original Message --
From: "Steinmaurer, Thomas";<thomas.steinmau...@dynatrace.com>;
Sent: September 20, 2017 1:38
To: "user@cassandra.apache.org"<user@cassandra.apache.org>;

Subject: RE: Row Cache hit issue



  
Hi,

additionally, with saved (key) caches, we had some sort of corruption (I think, 
for whatever reason) once. So, if you see something like that upon Cassandra 
startup:

INFO [main] 2017-01-04 15:38:58,772 AutoSavingCache.java (line 114) reading 
saved cache /var/opt/xxx/cassandra/saved_caches/ks-cf-KeyCache-b.db
ERROR [main] 2017-01-04 15:38:58,891 CassandraDaemon.java (line 571) Exception 
encountered during startup
java.lang.OutOfMemoryError: Java heap space
at java.util.ArrayList.(ArrayList.java:152)
at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:132)
at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
at org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:276)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:435)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:406)
at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:322)
at org.apache.cassandra.db.Keyspace.(Keyspace.java:268)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:110)
at org.apache.cassandra.db.Keyspace.open(Keyspace.java:88)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:364)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:554)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:643)

resulting in Cassandra going OOM, with a "reading saved cache" log entry 
close before the OOM, you may have hit some sort of corruption. The workaround is 
to physically delete the saved cache file and Cassandra will start up just fine.

Regards,
Thomas
 
 
 
 
 
From: Dikang Gu [mailto:dikan...@gmail.com] 
 Sent: Wednesday, 20 September 2017 06:06
 To: cassandra <user@cassandra.apache.org>
 Subject: Re: Row Cache hit issue
 
 
  
Hi Peng,

C* periodically saves its cache to disk, to solve the cold start problem. If 
row_cache_save_period=0, it means C* does not save the cache to disk. But the cache 
is still working if it's enabled in the table schema; the cache will just be empty 
after a restart.

--Dikang.

On Tue, Sep 19, 2017 at 8:27 PM, Peng Xiao <2535...@qq.com> wrote:
And we are using C* 2.1.18.

-- Original --
From:  "";<2535...@qq.com>;
Date:  Wed, Sep 20, 2017 11:27 AM
To:  "user"<user@cassandra.apache.org>;
Subject:  Row Cache hit issue

Dear All,

The default is row_cache_save_period=0; it looks like the row cache does not work in this 
situation?
But we can still see row cache hits.

Row Cache  : entries 202787, size 100 MB, capacity 100 MB, 3095293 
hits, 6796801 requests, 0.455 recent hit rate, 0 save period in seconds

Could anyone please explain this?

Thanks,
Peng Xiao

-- 
Dikang
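
Following up on the explanation above, a minimal sketch of where the row cache is actually enabled and sized (table name and values are placeholders; row_cache_save_period only controls persisting the cache to disk):

-- per table, in CQL:
ALTER TABLE my_keyspace.my_table
  WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};

# cluster-wide, in cassandra.yaml:
row_cache_size_in_mb: 100
row_cache_save_period: 0   # 0 = never persist the cache; it is rebuilt after restart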
  
 
 
 
 
 
 

Pending-range-calculator during bootstrapping

2017-09-22 Thread Peng Xiao
Dear All,

When we are bootstrapping a new node, we are experiencing high CPU load which 
affects the response time, and we noticed that it is mainly spent in the 
pending-range-calculator; this did not happen before.
We are using C* 2.1.13 in one DC and 2.1.18 in another DC.
Could anyone please advise on this?


Thanks,
Peng Xiao

add new nodes in two DCs at the same time

2017-09-22 Thread Peng Xiao
Hi,


As DataStax suggests, we should only bootstrap one new node at a time.
But can we add new nodes in two DCs at the same time?


Thanks,
Peng Xiao

Pending-range-calculator during bootstrapping

2017-09-21 Thread Peng Xiao
Dear All,

When we are bootstrapping a new node, we are experiencing high CPU load and this 
affects the response time, and we noticed that it is mainly spent in the 
pending-range-calculator; this did not happen before.
We are using C* 2.1.13.
Could anyone please advise on this?


Thanks,
Peng Xiao

network down between DCs

2017-09-21 Thread Peng Xiao
Hi there,


We have two DCs for a Cassandra cluster. If the network is down for less than 3 
hours (the default hint window), my understanding is that it will recover 
automatically, right? Do we need to run repair manually?


Thanks,
Peng Xiao

Re: RE: network down between DCs

2017-09-21 Thread Peng Xiao
Thanks Thomas for the reminder, we will watch the system log.




-- Original Message --
From: "Steinmaurer, Thomas";<thomas.steinmau...@dynatrace.com>;
Sent: Thursday, September 21, 2017, 5:17 PM
To: "user@cassandra.apache.org"<user@cassandra.apache.org>;

Subject: RE: network down between DCs



  
Hi,
 
 
 
within the default hint window of 3 hours, the hinted handoff mechanism should 
take care of that, but we have seen that failing from time to time (depending 
on the load) in 2.1, with some sort of tombstone related issues causing failing 
requests on the system hints table. So, watch out for any sign of hinted handoff 
troubles in the Cassandra log.
 
 
 
Hint storage has been re-written in 3.0+ to flat files, thus tombstone related 
troubles in that area should be gone.
 
 
 
Thomas
 
 
   
From: Hannu Kröger [mailto:hkro...@gmail.com] 
 Sent: Thursday, 21 September 2017 10:32
 To: Peng Xiao <2535...@qq.com>; user@cassandra.apache.org
 Subject: Re: network down between DCs
 
 
 
 
  
Hi,
 
  
 
 
  
That’s correct.
 
  
 
 
  
You need to run repairs only after a node/DC/connection is down for more than 
max_hint_window_in_ms.
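
For illustration, the relevant cassandra.yaml setting (the value shown is the usual default of 3 hours):

# cassandra.yaml
max_hint_window_in_ms: 10800000   # 3 hours; hints stop being stored for a node down longer than this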
 
  
 
 
  
Cheers,
 
  
Hannu
 
  
 
 
 
 
 
 
 
On 21 September 2017 at 11:30:44, Peng Xiao (2535...@qq.com) wrote:
 
Hi there,
 
  
 
 
  
We have two DCs for a Cassandra Cluster,if the network is down less than 3 
hours(default hint window),with my understanding,it will recover 
automatically,right?Do we need  to run repair manually?
 
  
 
 
  
Thanks,
 
  
Peng Xiao
 
 
 
  

split one keyspace from one cluster to another

2017-10-16 Thread Peng Xiao
Dear All,


We'd like to migrate one keyspace from one cluster to another; the keyspace is 
about 100 GB.
If we use sstableloader, we have to stop the application during the 
migration. Any good ideas?
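
For illustration, a minimal sstableloader sketch (keyspace/table names, paths, snapshot tag and the target node are placeholders; the data directory passed to sstableloader must end in <keyspace>/<table>):

nodetool snapshot -t migrate my_keyspace
# copy the snapshot SSTables into a directory laid out as <keyspace>/<table>, then:
sstableloader -d target-node1 /tmp/migrate/my_keyspace/my_table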


Thanks,
Peng Xiao

cassandra hardware requirements (SATA/SSD)

2017-09-29 Thread Peng Xiao
Hi there,
We are struggling with hardware selection. We all know that SSD is good, and 
DataStax suggests we use SSD, but as Cassandra is a CPU-bound DB we are 
considering using SATA disks; we noticed that the normal IO throughput is 7 MB/s.


Could anyone give some advice?


Thanks,
Peng Xiao

space left for compaction

2017-09-30 Thread Peng Xiao
Dear All,


As for STCS, DataStax suggests keeping half of the disk space free for 
compaction. This is not strict; could anyone advise how much space we should leave 
free on one node?


Thanks,
Peng Xiao

limit the sstable file size

2017-09-29 Thread Peng Xiao
Dear All,


Can we limit the SSTable file size? As we have a huge cluster, the SSTable files 
are too large for ETL to extract. Could you please advise?


Thanks,
Peng Xiao

Re: limit the sstable file size

2017-09-29 Thread Peng Xiao
Thanks Jeff for the quick reply.




-- Original Message --
From: "Jeff Jirsa";<jji...@gmail.com>;
Sent: September 30, 2017 11:45
To: "cassandra"<user@cassandra.apache.org>;

Subject: Re: limit the sstable file size



There's no way to limit file size in STCS.  If you use LCS, it will default to 
160MB (except in cases where you have a very large partition - in those cases, 
the sstable will scale with your partition size, but you really shouldn't have 
partitions larger than 160MB)
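
For illustration, a minimal CQL sketch of switching a table to LCS with an explicit SSTable target size (keyspace/table names are placeholders; 160 MB is the default mentioned above):

ALTER TABLE my_keyspace.my_table
  WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};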



On Fri, Sep 29, 2017 at 8:41 PM, Peng Xiao <2535...@qq.com> wrote:
Dear All,


Can we limit the sstable file size?as we have a huge cluster,the sstable file 
is too large for ETL to extract,Could you please advise?


Thanks,
Peng Xiao

Re: data loss in different DC

2017-09-28 Thread Peng Xiao
Thanks All


-- Original Message --
From: "Jeff Jirsa";<jji...@gmail.com>;
Sent: September 28, 2017 9:16
To: "user"<user@cassandra.apache.org>;

Subject: Re: data loss in different DC



Your quorum writes are only guaranteed to be on half+1 nodes - there's no 
guarantee which nodes those will be. For strong consistency with multiple DCs, 
you can either: 

- write at quorum and read at quorum from any dc, or
- write at each_quorum and read at local_quorum from any dc, or
- write at local_quorum and read local_quorum from the same DC only
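
For illustration, a minimal cqlsh sketch of the second option above (keyspace/table/columns are placeholders; drivers expose the same consistency levels per statement):

-- on the writing side:
CONSISTENCY EACH_QUORUM;
INSERT INTO my_keyspace.t (id, val) VALUES (1, 'x');

-- on the reading side (any DC):
CONSISTENCY LOCAL_QUORUM;
SELECT val FROM my_keyspace.t WHERE id = 1;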




-- 
Jeff Jirsa


> On Sep 28, 2017, at 2:41 AM, Peng Xiao <2535...@qq.com> wrote:
> 
> Dear All,
> 
> We have a cluster with one DC1:RF=3,another DC DC2:RF=1 only for ETL,but we 
> found that sometimes we can query records in DC1,while not able not find the 
> same record in DC2 with local_quorum.How it happens?
> Could anyone please advise?
> looks we can only run repair to fix it.
> 
> Thanks,
> Peng Xiao


data loss in different DC

2017-09-28 Thread Peng Xiao
Dear All,


We have a cluster with one DC1: RF=3, and another DC, DC2: RF=1, DC2 only for ETL. But we 
found that sometimes we can query records in DC1 while not being able to find the 
same record in DC2 with LOCAL_QUORUM. How does this happen? It looks like data loss in DC2.
Could anyone please advise?
looks we can only run repair to fix it.


Thanks,
Peng Xiao

Re: data loss in different DC

2017-09-28 Thread Peng Xiao
Even with CL=QUORUM, there is no guarantee of being able to read the same data in 
DC2, right?
Then multiple DCs look like they make no sense?




-- Original Message --
From: "DuyHai Doan";<doanduy...@gmail.com>;
Sent: September 28, 2017 5:45
To: "user"<user@cassandra.apache.org>;

Subject: Re: data loss in different DC



If you're writing into DC1 with CL = LOCAL_xxx, there is no guarantee to be 
sure to read the same data in DC2. Only repair will help you

On Thu, Sep 28, 2017 at 11:41 AM, Peng Xiao <2535...@qq.com> wrote:
Dear All,


We have a cluster with one DC1:RF=3,another DC DC2:RF=1 only for ETL,but we 
found that sometimes we can query records in DC1,while not able not find the 
same record in DC2 with local_quorum.How it happens?
Could anyone please advise?
looks we can only run repair to fix it.


Thanks,
Peng Xiao

data loss in different DC

2017-09-28 Thread Peng Xiao
Dear All,


We have a cluster with one DC1: RF=3, and another DC, DC2: RF=1, only for ETL. But we 
found that sometimes we can query records in DC1 while not being able to find the 
same record in DC2 with LOCAL_QUORUM. How does this happen?
Could anyone please advise?
looks we can only run repair to fix it.


Thanks,
Peng Xiao

Re: data loss in different DC

2017-09-28 Thread Peng Xiao
very sorry for the duplicate mail.




-- Original --
From:  "";<2535...@qq.com>;
Date:  Thu, Sep 28, 2017 07:41 PM
To:  "user"<user@cassandra.apache.org>;

Subject:  data loss in different DC



Dear All,


We have a cluster with one DC1:RF=3,another DC DC2:RF=1,DC2 only for ETL,but we 
found that sometimes we can query records in DC1,while not able not find the 
same record in DC2 with local_quorum.How it happens?looks data loss in DC2.
Could anyone please advise?
looks we can only run repair to fix it.


Thanks,
Peng Xiao

nodetool cleanup in parallel

2017-09-26 Thread Peng Xiao
hi,


nodetool cleanup will only remove those keys which no longer belong to the 
node, so theoretically we can run nodetool cleanup in parallel, right? The 
documentation suggests running it one node at a time, but that's too slow.


Thanks,
Peng Xiao

Re: nodetool cleanup in parallel

2017-09-26 Thread Peng Xiao
Thanks Kurt.




-- Original Message --
From: "kurt";<k...@instaclustr.com>;
Sent: September 27, 2017 11:57
To: "User"<user@cassandra.apache.org>;

Subject: Re: nodetool cleanup in parallel



Correct. You can run it in parallel across many nodes if you have capacity. We 
generally see about a 10% CPU increase from cleanups, which isn't a big deal if 
you have the capacity to handle it plus the IO.

On that note, on later versions you can specify -j  to run multiple 
cleanup compactions at the same time on a single node, and also increase 
compaction throughput to speed the process up.
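
For illustration, a sketch of the knobs mentioned above on a single node (the numbers are placeholders; -j requires a Cassandra version that supports it):

nodetool cleanup -j 2 my_keyspace        # run 2 cleanup compactions concurrently on this node
nodetool setcompactionthroughput 64      # temporarily raise the compaction throughput cap in MB/s (0 disables the cap)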


On 27 Sep. 2017 13:20, "Peng Xiao" <2535...@qq.com> wrote:
hi,


nodetool cleanup will only remove those keys which no longer belong to those 
nodes,than theoretically we can run nodetool cleanup in parallel,right?the 
document suggests us to run this one by one,but it's too slow.


Thanks,
Peng Xiao

split one DC from a cluster

2017-10-19 Thread Peng Xiao
Hi,
We want to split one DC from a cluster and make this DC a new cluster(rename 
this DC to a new cluster name).
Could you please advise?


Thanks,
Peng Xiao

can repair and bootstrap run simultaneously

2017-10-24 Thread Peng Xiao
Hi there,


Can we add a new node (bootstrap) and run repair on another DC in the cluster 
or even run repair in the same DC?


Thanks,
Peng Xiao

best practice for repair

2017-11-13 Thread Peng Xiao
Hi there,

We need to repair a huge CF; just want to clarify:
1. nodetool repair -pr keyspace cf 
2. nodetool repair -st -et -dc 
Which will be better? Or any other advice?
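
For illustration, hedged examples of the two forms above with placeholder values (the token range and DC name are only examples, and which options can be combined varies by version; sub-range repairs are usually driven by a script or a tool such as Reaper):

nodetool repair -pr my_keyspace my_cf
nodetool repair -st 0 -et 4611686018427387904 my_keyspace my_cf   # repair only this token sub-range
nodetool repair -dc DC1 my_keyspace my_cf                         # restrict the repair to one datacenter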


Thanks,
Peng Xiao

Re: best practice for repair

2017-11-13 Thread Peng Xiao
Sub-range repair is much like primary range repair, except that each sub-range 
repair operation focuses on an even smaller subset of data.


Repair is a tough process. Any advice?


Thanks


-- Original --
From:  "";<2535...@qq.com>;
Date:  Mon, Nov 13, 2017 06:51 PM
To:  "user"<user@cassandra.apache.org>;

Subject:  best practice for repair



Hi there,

we need to repair a huge CF,just want to clarify
1.nodetool repair -pr keyspace cf 
2.nodetool repair -st -et -dc 
which will be better? or any other advice?


Thanks,
Peng Xiao

consistency against rebuild a new DC

2017-11-27 Thread Peng Xiao
Hi there,


We know that we need to run repair regularly to keep data consistent. Suppose 
we have DC1 & DC2;
if we add a new DC3 and rebuild it from DC1, can we assume that DC3 is consistent 
with DC1, at least at the time when DC3 is rebuilt successfully?
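
For illustration, the rebuild command in question, run on each node of the new DC (the DC name is a placeholder):

nodetool rebuild DC1   # stream this node's ranges from the existing datacenter DC1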


Thanks,
Peng Xiao,

rebuild in the new DC always failed

2017-12-16 Thread Peng Xiao
Hi there,


We need to rebuild a new DC, but the stream always fails with the following 
errors.
We are using C* 2.1.18. Could anyone please advise?


error: null
-- StackTrace --
java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:267)
at 
sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown 
Source)
at 
javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
at 
javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
at com.sun.proxy.$Proxy8.rebuild(Unknown Source)
at org.apache.cassandra.tools.NodeProbe.rebuild(NodeProbe.java:1057)
at 
org.apache.cassandra.tools.NodeTool$Rebuild.execute(NodeTool.java:1825)
at 
org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:292)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)



On the source side, the error is:


ERROR [GossipStage:217] 2017-12-17 00:28:02,030 CassandraDaemon.java:231 - 
Exception in thread Thread[GossipStage:217,5,main]
java.lang.NullPointerException: null



Thanks

Re: Re: rebuild in the new DC always failed

2017-12-16 Thread Peng Xiao
Jeff,
It has already started sending files, and if I only rebuild one node it's OK; with more 
than one node it will fail.


thanks


-- Original Message --
From: Jeff Jirsa <jji...@gmail.com>
Sent: December 17, 2017 09:32
To: user <user@cassandra.apache.org>
Subject: Re: rebuild in the new DC always failed



That NPE in gossip is probably a missing endpoint state, which is probably fixed 
by the gossip patch (#13700) in 2.1.19. 

I'm not sure it's related to the rebuild failing - I'm not actually sure if 
that's a server failure or JMX timing out. Do you see logs on the servers 
indicating that it was ever invoked? Did it calculate a streaming plan? Did it 
start sending files?  

-- Jeff Jirsa




On Dec 16, 2017, at 4:56 PM, Peng Xiao <2535...@qq.com> wrote:


Hi Jeff,
This is the only informaiton we found from system.log
<2e06f...@9f7caf6b.d1c0355a.jpg>


Thanks
-- Original Message --
From: "Jeff Jirsa";<jji...@gmail.com>;
Sent: December 17, 2017 8:13
To: "user"<user@cassandra.apache.org>;

Subject: Re: rebuild in the new DC always failed



What's the rest of the stack beneath the null pointer exception? 

-- 
Jeff Jirsa


> On Dec 16, 2017, at 4:11 PM, Peng Xiao <2535...@qq.com> wrote:
> 
> Hi there,
> 
> We need to rebuild a new DC,but the stream is always failed with the 
> following errors.
> we are using C* 2.1.18.Could anyone please advise?
> 
> error: null
> -- StackTrace --
> java.io.EOFException
> at java.io.DataInputStream.readByte(DataInputStream.java:267)
> at 
> sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222)
> at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
> at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
> at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown 
> Source)
> at 
> javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
> at 
> javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
> at com.sun.proxy.$Proxy8.rebuild(Unknown Source)
> at org.apache.cassandra.tools.NodeProbe.rebuild(NodeProbe.java:1057)
> at 
> org.apache.cassandra.tools.NodeTool$Rebuild.execute(NodeTool.java:1825)
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:292)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)
> 
> in the source side,the error is:
> 
> ERROR [GossipStage:217] 2017-12-17 00:28:02,030 CassandraDaemon.java:231 - 
> Exception in thread Thread[GossipStage:217,5,main]
> java.lang.NullPointerException: null
> 
> Thanks


nodetool rebuild data size

2017-12-13 Thread Peng Xiao
Hi there,


If we have a Cassandra DC1 with a data size of 60 TB and RF=3, and then we rebuild a new 
DC2 (RF=3), how much data will be streamed to DC2? 20 TB or 60 TB?


Thanks,
Peng Xiao

Re: rebuild in the new DC always failed

2017-12-16 Thread Peng Xiao
Hi Jeff,
This is the only information we found in system.log



Thanks
-- Original Message --
From: "Jeff Jirsa";<jji...@gmail.com>;
Sent: December 17, 2017 8:13
To: "user"<user@cassandra.apache.org>;

Subject: Re: rebuild in the new DC always failed



What's the rest of the stack beneath the null pointer exception? 

-- 
Jeff Jirsa


> On Dec 16, 2017, at 4:11 PM, Peng Xiao <2535...@qq.com> wrote:
> 
> Hi there,
> 
> We need to rebuild a new DC,but the stream is always failed with the 
> following errors.
> we are using C* 2.1.18.Could anyone please advise?
> 
> error: null
> -- StackTrace --
> java.io.EOFException
> at java.io.DataInputStream.readByte(DataInputStream.java:267)
> at 
> sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222)
> at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
> at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
> at javax.management.remote.rmi.RMIConnectionImpl_Stub.invoke(Unknown 
> Source)
> at 
> javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.invoke(RMIConnector.java:1020)
> at 
> javax.management.MBeanServerInvocationHandler.invoke(MBeanServerInvocationHandler.java:298)
> at com.sun.proxy.$Proxy8.rebuild(Unknown Source)
> at org.apache.cassandra.tools.NodeProbe.rebuild(NodeProbe.java:1057)
> at 
> org.apache.cassandra.tools.NodeTool$Rebuild.execute(NodeTool.java:1825)
> at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:292)
> at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:206)
> 
> in the source side,the error is:
> 
> ERROR [GossipStage:217] 2017-12-17 00:28:02,030 CassandraDaemon.java:231 - 
> Exception in thread Thread[GossipStage:217,5,main]
> java.lang.NullPointerException: null
> 
> Thanks


decommissioned node still in gossip

2017-11-01 Thread Peng Xiao
Dear All,


We have decommissioned a DC, but from system.log it's still gossiping:
INFO  [GossipStage:1] 2017-11-01 17:21:36,310 Gossiper.java:1008 - InetAddress 
/x.x.x.x is now DOWN


Could you please advise?


Thanks,
Peng Xiao

Re: decommissioned node still in gossip

2017-11-01 Thread Peng Xiao
Thanks Kurt.




-- Original Message --
From: "kurt";<k...@instaclustr.com>;
Sent: November 1, 2017 7:22
To: "User"<user@cassandra.apache.org>;

Subject: Re: decommissioned node still in gossip



It will likely hang around in gossip for 3-15 days but then should disappear. 
As long as it's not showing up in the cluster it should be OK.

On 1 Nov. 2017 20:25, "Peng Xiao" <2535...@qq.com> wrote:
Dear All,


We have decommisioned a DC,but from system.log,it'still gossiping
INFO  [GossipStage:1] 2017-11-01 17:21:36,310 Gossiper.java:1008 - InetAddress 
/x.x.x.x is now DOWN


Could you please advise?


Thanks,
Peng Xiao

cassandra gc issue

2017-11-02 Thread Peng Xiao
All,


We noticed that the response time sometimes jumps very high. The following is 
from the Cassandra GC log.


   [Eden: 760.0M(760.0M)->0.0B(11.2G) Survivors: 264.0M->96.0M Heap: 
7657.7M(20.0G)->6893.3M(20.0G)]
Heap after GC invocations=43481 (full 0):
 garbage-first heap   total 20971520K, used 7058765K [0x0002c000, 
0x0002c0805000, 0x0007c000)
  region size 8192K, 12 young (98304K), 12 survivors (98304K)
 Metaspace   used 37426K, capacity 37810K, committed 38144K, reserved 
1083392K
  class spaceused 3930K, capacity 4025K, committed 4096K, reserved 1048576K



Could anyone please advise?


Thanks,
Peng Xiao

Re: run Cassandra on physical machine

2017-12-07 Thread Peng Xiao
Thanks All.




-- Original Message --
From: "Jeff Jirsa";<jji...@gmail.com>;
Sent: December 8, 2017 3:19
To: "cassandra"<user@cassandra.apache.org>;

Subject: Re: run Cassandra on physical machine



Which is to say, right now you can't run them on different ports, but you can 
run them on different IPs on the same machine (and different IPs don't need 
different physical NICs; you can bind multiple IPs to a given physical NIC).
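
For illustration, a sketch of binding a second IP to one NIC and pointing a second Cassandra instance at it (interface name, addresses and paths are placeholders; each instance also needs its own data directories and JMX port):

ip addr add 10.0.0.12/24 dev eth0   # second IP on the same physical NIC

# second instance's cassandra.yaml:
listen_address: 10.0.0.12
rpc_address: 10.0.0.12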


On Thu, Dec 7, 2017 at 10:54 AM, Dikang Gu <dikan...@gmail.com> wrote:
@Peng, how many network interfaces do you have on your machine? If you just 
have one NIC, you probably need to wait for this storage port patch: 
https://issues.apache.org/jira/browse/CASSANDRA-7544 .

On Thu, Dec 7, 2017 at 7:01 AM, Oliver Ruebenacker <cur...@gmail.com> wrote:


 Hello,


  Yes, you can.


 Best, Oliver


On Thu, Dec 7, 2017 at 7:12 AM, Peng Xiao <2535...@qq.com> wrote:
Dear All,


Can we run Cassandra on physical machine directly?
we all know that vm can reduce the performance.For instance,we have a machine 
with 56 core,8 ssd disks.
Can we run 8 cassandra instance in the same machine within one rack with 
different port?


Could anyone please advise?


Thanks,
Peng Xiao






-- 
Oliver Ruebenacker

Senior Software Engineer, Diabetes Portal, Broad Institute










 
 






-- 
Dikang

Re: update a record which does not exists

2017-12-03 Thread Peng Xiao
After testing, it does insert a new record.




-- Original --
From:  "";<2535...@qq.com>;
Date:  Mon, Dec 4, 2017 11:13 AM
To:  "user"<user@cassandra.apache.org>;

Subject:  update a record which does not exists



Dear All, if we update a record which actually does not exist in Cassandra, will 
it generate a new record or not?


UPDATE columnfamily SET data = 'test data' WHERE key = 'row1';

As in CQL, UPDATE and INSERT are semantically the same. Could anyone please 
advise?


Thanks,
Peng Xiao

update a record which does not exists

2017-12-03 Thread Peng Xiao
Dear All, if we update a record which actually does not exist in Cassandra, will 
it generate a new record or not?


UPDATE columnfamily SET data = 'test data' WHERE key = 'row1';

As in CQL, UPDATE and INSERT are semantically the same. Could anyone please 
advise?
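
For illustration, a minimal CQL sketch of the upsert behaviour and of the conditional form that avoids creating a new row (table/column names follow the example above and are placeholders):

UPDATE columnfamily SET data = 'test data' WHERE key = 'row1';            -- creates the row if it does not exist (upsert)
UPDATE columnfamily SET data = 'test data' WHERE key = 'row1' IF EXISTS;  -- lightweight transaction: applies only if the row already exists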


Thanks,
Peng Xiao

run Cassandra on physical machine

2017-12-07 Thread Peng Xiao
Dear All,


Can we run Cassandra on a physical machine directly?
We all know that VMs can reduce performance. For instance, we have a machine 
with 56 cores and 8 SSD disks.
Can we run 8 Cassandra instances on the same machine, within one rack, with 
different ports?


Could anyone please advise?


Thanks,
Peng Xiao

Re: rebuild stream issue

2017-12-10 Thread Peng Xiao
Then how can we restart the rebuild? We are using C* 3.11.0.
Can we just delete the data files and rerun the rebuild? It looks like there 
will be some errors.




-- Original Message --
From: "Jeff Jirsa" <jji...@gmail.com>
Sent: December 11, 2017, 1:07
To: "user" <user@cassandra.apache.org>

Subject: Re: rebuild stream issue



The streams fail, the rebuild times out if you've set a timeout. Or you'll 
need to restart the nodes if you didn't set a streaming timeout.

-- 
Jeff Jirsa


> On Dec 10, 2017, at 9:05 PM, Peng Xiao <2535...@qq.com> wrote:
> 
> Dear All,
> 
> We are rebuilding a new DC. If one of the source nodes was restarted, what will 
> happen with the rebuild?
> 
> Thanks,
> Peng Xiao

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

rebuild stream issue

2017-12-10 Thread Peng Xiao
Dear All,

We are rebuilding a new DC. If one of the source nodes was restarted, what will 
happen with the rebuild?


Thanks,
Peng Xiao

Re: Re: rebuild stream issue

2017-12-10 Thread Peng Xiao
Thanks Jeff, we will have a try.




-- Original Message --
From: "Jeff Jirsa" <jji...@gmail.com>
Sent: December 11, 2017, 1:20
To: "user" <user@cassandra.apache.org>

Subject: Re: Re: rebuild stream issue



Just restart the rebuild - it'll stream some duplicate data but it'll compact 
away when it's done.

You can also use subrange repair instead of rebuild if you're short on disk 
space.


-- Jeff Jirsa
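
For illustration, rough command sketches of those two options; the DC name, tokens, and keyspace are placeholders:

nodetool rebuild -- <source_dc_name>                           # rerun the rebuild against the source DC
nodetool repair -st <start_token> -et <end_token> <keyspace>   # or repair a token subrange instead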




On Dec 10, 2017, at 9:14 PM, Peng Xiao <2535...@qq.com> wrote:


Then how can we restart the rebuild? We are using C* 3.11.0.
Can we just delete the data files and rerun the rebuild? It looks like there 
will be some errors.




-- Original Message --
From: "Jeff Jirsa" <jji...@gmail.com>
Sent: December 11, 2017, 1:07
To: "user" <user@cassandra.apache.org>

Subject: Re: rebuild stream issue



The streams fail, the rebuild times out if you've set a timeout. Or you'll 
need to restart the nodes if you didn't set a streaming timeout.

-- 
Jeff Jirsa


> On Dec 10, 2017, at 9:05 PM, Peng Xiao <2535...@qq.com> wrote:
> 
> Dear All,
> 
> We are rebuilding a new DC. If one of the source nodes was restarted, what will 
> happen with the rebuild?
> 
> Thanks,
> Peng Xiao

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org

rt jump during new node bootstrap

2017-12-11 Thread Peng Xiao
Dear All,


We are using C* 2.1.18. When we bootstrap a new node, the response time jumps 
when the new node starts up, then it goes back to normal. Could anyone please advise?





Thanks,
Peng Xiao

Re: Tuning bootstrap new node

2017-10-31 Thread Peng Xiao
Can we stop compaction while the new node is bootstrapping and enable it after 
the new node has joined?


Thanks
-- Original --
From:  "";<2535...@qq.com>;
Date:  Tue, Oct 31, 2017 07:18 PM
To:  "user"<user@cassandra.apache.org>;

Subject:  Tuning bootstrap new node



Dear All,

Can we do some tuning to make bootstrapping a new node quicker? We have a three-DC 
cluster (RF=3 in two DCs, RF=1 in another, 48 nodes in the DC with RF=3). As 
the cluster is becoming larger and larger, we need to spend more than 24 hours 
to bootstrap a new node.
Could you please advise how to tune this?


Many Thanks,
Peng Xiao

Re: Tuning bootstrap new node

2017-10-31 Thread Peng Xiao
We noticed that streaming takes about 10 hours and all the rest is 
compaction; we will increase concurrent_compactors first.


Thanks all for your reply.




-- Original Message --
From: "Jon Haddad" <j...@jonhaddad.com>
Sent: November 1, 2017, 4:06
To: "user" <user@cassandra.apache.org>

Subject: Re: Tuning bootstrap new node



Of all the settings you could change, why one that's related to memtables?  
Streaming doesn't go through the write path, memtables aren't involved unless 
you're using materialized views or CDC.

On Oct 31, 2017, at 11:44 AM, Anubhav Kale <anubhav.k...@microsoft.com.INVALID> 
wrote:

You can change the YAML setting memtable_cleanup_threshold to 0.7 (from the 
default of 0.3). This will push SSTables to disk less often and will reduce the 
compaction time.
 
While this won't change the streaming time, it will reduce the overall time 
for your node to be healthy.
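
A minimal cassandra.yaml sketch of the suggested change (note that the reply quoted above questions whether this setting affects streaming at all):

# cassandra.yaml
memtable_cleanup_threshold: 0.7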
 
From: Harikrishnan Pillai [mailto:hpil...@walmartlabs.com] 
Sent: Tuesday, October 31, 2017 11:28 AM
To: user@cassandra.apache.org
Subject: Re: Re: Tuning bootstrap new node


 

There is no magic in speeding up the node addition other than increasing stream 
throughput and compaction throughput.

It has been noticed that with heavy compactions the latency may go up if the 
node also starts serving data.

If you really don't want this node to serve traffic until all compactions 
settle down, you can disable gossip and the binary protocol using the nodetool 
command. This will allow compactions to continue but requires a repair to fix 
the stale data later.

Regards

Hari
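
For illustration, a sketch of the knobs mentioned above; the throughput values are arbitrary examples, not recommendations:

nodetool setstreamthroughput 400        # Mbps; default is 200
nodetool setcompactionthroughput 64     # MB/s; default is 16
nodetool disablegossip && nodetool disablebinary   # stop serving traffic while compactions catch up
nodetool enablegossip && nodetool enablebinary     # re-enable once compactions settle, then repair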

 


From: Nitan Kainth <nitankai...@gmail.com>
Sent: Tuesday, October 31, 2017 5:47 AM
To: user@cassandra.apache.org
Subject: EXT: Re: Tuning bootstrap new node
 


Do not stop compaction, you will end up with thousands of sstables.
 


You can increase stream throughput from the default of 200 to a higher value if 
your network can handle it.
Sent from my iPhone



On Oct 31, 2017, at 6:35 AM, Peng Xiao <2535...@qq.com> wrote:

Can we stop compaction while the new node is bootstrapping and enable it after 
the new node has joined?

 

Thanks

-- Original --

From:  "";<2535...@qq.com>;

Date:  Tue, Oct 31, 2017 07:18 PM

To:  "user"<user@cassandra.apache.org>;

Subject:  Tuning bootstrap new node


 

Dear All,
 

Can we do some tuning to make bootstrapping a new node quicker? We have a three-DC 
cluster (RF=3 in two DCs, RF=1 in another, 48 nodes in the DC with RF=3). As 
the cluster is becoming larger and larger, we need to spend more than 24 hours 
to bootstrap a new node.

Could you please advise how to tune this?

 

Many Thanks,

Peng Xiao

Re: split one DC from a cluster

2017-10-20 Thread Peng Xiao
Thanks Kurt, we may still use a snapshot and sstableloader to split this 
schema out to another cluster.




-- Original Message --
From: "kurt"
Sent: Thursday, October 19, 2017, 6:11 PM
To: "User"

Subject: Re: split one DC from a cluster



Easiest way is to separate them via firewall/network partition so the DCs 
can't talk to each other, ensure each DC sees the other DC as DOWN, then remove 
the other DC from replication, then remove all the nodes in the opposite DC 
using removenode.
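
For illustration, a rough sketch of those last two steps; the keyspace, DC names, and host ID are placeholders:

# drop DC2 from the keyspace's replication settings (repeat for each keyspace)
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"
# then, from a node in the surviving DC, for every node in the removed DC:
nodetool removenode <host_id_of_removed_node>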

how to identify the root cause of cassandra hang

2017-10-26 Thread Peng Xiao
Hi,


We have a cluster with 48 nodes configured with racks; sometimes it hangs for 
as long as 2 minutes, and the response time jumps from 300ms to 15s.
Could anyone please advise how to identify the root cause?


The following is from the system log


INFO  [Service Thread] 2017-10-26 21:45:46,796 GCInspector.java:258 - G1 Young 
Generation GC in 222ms.  G1 Eden Space: 939524096 -> 0; G1 Old Gen: 6652738584 
-> 6662878232; G1 Survivor Space: 134217728 -> 109051904;
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:51 - Pool Name 
   Active   Pending  Completed   Blocked  All Time Blocked
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - 
MutationStage 0 3 3612475121 0  
   0
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - 
RequestResponseStage  0 0 6333593550 0  
   0
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - 
ReadRepairStage   0 02773154 0  
   0
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - 
CounterMutationStage  0 0  0 0  
   0
INFO  [Service Thread] 2017-10-26 21:45:46,796 StatusLogger.java:66 - ReadStage 
0 4  417419357 0 0



Thanks.

Re: run cleanup and rebuild simultaneously

2017-12-22 Thread Peng Xiao
Thanks Jeff




-- Original --
From: Jeff Jirsa <jji...@gmail.com>
Date: Dec 23, 2017 09:28
To: user <user@cassandra.apache.org>
Subject: Re: run cleanup and rebuild simultaneously



Should be fine, though it will increase disk usage in DC1 for a while - a 
reference to the cleaned-up sstables will be held by the rebuild streams, 
causing you to temporarily increase disk usage until the rebuild finishes streaming.

-- Jeff Jirsa




On Dec 22, 2017, at 4:30 PM, Peng Xiao <2535...@qq.com> wrote:


Hi there, can we run nodetool cleanup in DC1, and run rebuild in DC2 against DC1 
simultaneously?
This is on C* 2.1.18.




Thanks,
Peng Xiao

Gossip stage pending tasks cause application rt jump

2017-12-23 Thread Peng Xiao
Hi,
We noticed the following warnings in system.log, and they make C* very slow.


WARN  [GossipTasks:1] 2017-12-23 16:52:17,818 Gossiper.java:748 - Gossip stage 
has 4 pending tasks; skipping status check (no nodes will be marked down)


Could anyone please advise?


Thanks,
Peng Xiao

how to check C* partition size

2018-01-07 Thread Peng Xiao
Hi guys,


Could anyone please help with this simple question?
How do we check C* partition sizes and related information?
It looks like nodetool ring only shows the token distribution.


Thanks

Re: secondary index creation causes C* oom

2018-01-10 Thread Peng Xiao
Thanks Kurt.




-- Original Message --
From: "kurt"
Sent: January 11, 2018, 11:46
To: "User"

Subject: Re: secondary index creation causes C* oom




1. Not sure if secondary index creation is the same as index rebuild.

Fairly sure they are the same. 
2. We noticed that memtable flushes still seem to be working, unlike what 
CASSANDRA-12796 describes, but CompactionExecutor pending tasks are increasing.

Do you by chance have concurrent_compactors only set to 2? Seems like your 
index builds are blocking other compactions from taking place.
Seems that maybe postflush is backed up because it's blocked on writes 
generated from the rebuild? Maybe, anyway.
3. I'm wondering whether the blocking only affects the specific table on which 
the secondary index is being created?

If all your flush writers/post flushers are blocked I assume no other flushes 
will be able to take place, regardless of table. 


Seems like CASSANDRA-12796 is related but not sure why it didn't get fixed in 
2.1.
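
For illustration, a few ways to check the settings and backlog discussed above; the config path may vary by install:

grep -E 'concurrent_compactors|memtable_flush_writers' /etc/cassandra/cassandra.yaml
nodetool compactionstats -H   # pending compactions and running secondary index builds
nodetool tpstats              # look for blocked MemtableFlushWriter / MemtablePostFlush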

C* keyspace layout

2018-01-11 Thread Peng Xiao
Hi there,


We plan to place keyspace1 in DC1 and DC2, and keyspace2 in DC3 and DC4, all 
still in the same cluster, to avoid interruptions. Is there any potential risk 
with this architecture?


Thanks,
Peng Xiao
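
For illustration, a minimal CQL sketch of that layout; the replication factors here are assumptions:

CREATE KEYSPACE keyspace1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
CREATE KEYSPACE keyspace2 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC3': 3, 'DC4': 3};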

secondary index creation causes C* oom

2018-01-09 Thread Peng Xiao
Dear All,


We have seen some C* nodes OOM during secondary index creation with C* 2.1.18.
As per https://issues.apache.org/jira/browse/CASSANDRA-12796, the flush writer 
will be blocked by index rebuild, but we still have some confusion:
1. Not sure if secondary index creation is the same as index rebuild.
2. We noticed that memtable flushes still seem to be working, unlike what 
CASSANDRA-12796 describes, but CompactionExecutor pending tasks are increasing.
3. I'm wondering whether the blocking only affects the specific table on which 
the secondary index is being created?


Could anyone please explain?





Thanks

does copy command will clear all the old data?

2018-02-12 Thread Peng Xiao
Dear All,


I'm trying to import a CSV file into a table with the COPY command. The question is:
will the COPY command clear all the old data in this table? We only want to 
append the CSV file to this table.


Thanks

Re: does copy command will clear all the old data?

2018-02-12 Thread Peng Xiao
Thanks Nandan for the confirmation. I also did the test.




-- Original Message --
From: "@Nandan@" <nandanpriyadarshi...@gmail.com>
Sent: February 13, 2018, 12:52
To: "user" <user@cassandra.apache.org>

Subject: Re: does copy command will clear all the old data?



Hi Peng, the COPY command will append (upsert) all the data [based on size] to 
your existing Cassandra table. Just for testing, I previously ran the COPY 
command with 50 records into a small CQL table and it worked fine.


One point to make sure of: please check your primary key before playing with 
the COPY command.


Thanks 
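
For illustration, a minimal sketch of an appending import in cqlsh; the keyspace, table, columns, and file path are hypothetical:

COPY my_keyspace.my_table (id, name, value) FROM '/tmp/data.csv' WITH HEADER = true;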




On Tue, Feb 13, 2018 at 12:49 PM, Peng Xiao <2535...@qq.com> wrote:
Dear All,


I'm trying to import a CSV file into a table with the COPY command. The question is:
will the COPY command clear all the old data in this table? We only want to 
append the CSV file to this table.


Thanks

run cleanup and rebuild simultaneously

2017-12-22 Thread Peng Xiao
Hi there, can we run nodetool cleanup in DC1, and run rebuild in DC2 against DC1 
simultaneously?
This is on C* 2.1.18.




Thanks,
Peng Xiao

Re: auto_bootstrap for seed node

2018-03-27 Thread Peng Xiao
We followed this guide: 
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html,
but it does not mention changing auto_bootstrap for seed nodes after the rebuild.


Thanks,
Peng Xiao 




-- Original --
From:  "Ali Hubail"<ali.hub...@petrolink.com>;
Date:  Wed, Mar 28, 2018 10:48 AM
To:  "user"<user@cassandra.apache.org>;

Subject:  Re: auto_bootstrap for seed node



You might want to follow DataStax docs on this one: 
 
For adding a DC to an existing cluster: 
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsAddDCToCluster.html
 
For adding a new node to an existing cluster: 
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsAddNodeToCluster.html
 
 
briefly speaking, 
adding one node to an existing cluster --> use auto_bootstrap 
adding a DC to an existing cluster --> rebuild 
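
For illustration, a rough sketch of the rebuild path when adding a DC; the DC name is a placeholder: 

# in cassandra.yaml on each node of the new DC, before starting it:
auto_bootstrap: false
# once all new-DC nodes are up and the keyspaces replicate to the new DC, on each new node:
nodetool rebuild <existing_dc_name>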
 
You need to check the version of c* that you're running, and make sure you pick 
the right doc version for that. 
 
Most of my colleagues miss very important steps while adding/removing 
nodes/cluster, but if they stick to the docs, they always get it done right. 
 
Hope this helps 

 Ali Hubail
 
 
 
   "Peng Xiao" <2535...@qq.com>  
03/27/2018 09:39 PM 
   Please respond to
 user@cassandra.apache.org

 
 To
 "user" <user@cassandra.apache.org>,  
  cc
  
  Subject
 auto_bootstrap for seed node
 

 

 
 
 
Dear All, 
 
For adding a new DC, we need to set auto_bootstrap: false and then run the 
rebuild, and finally change auto_bootstrap: true. But for seed nodes, it seems 
that we still need to keep auto_bootstrap false? 
Could anyone please confirm? 
 
Thanks, 
Peng Xiao

auto_bootstrap for seed node

2018-03-27 Thread Peng Xiao
Dear All,

For adding a new DC, we need to set auto_bootstrap: false and then run the 
rebuild, and finally change auto_bootstrap: true. But for seed nodes, it seems 
that we still need to keep auto_bootstrap false?
Could anyone please confirm?


Thanks,
Peng Xiao

replace dead node vs remove node

2018-03-22 Thread Peng Xiao
Dear All,


When one node fails with hardware errors, it will be in DN status in the 
cluster. Then, if we are not able to handle this error within three hours (the 
max hints window), we will lose data, right? We have to run repair to keep consistency.
And as per 
https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html, 
we can replace this dead node; is it the same as bootstrapping a new node? Does 
that mean we don't need to remove the node and rejoin?
Could anyone please advise?


Thanks,
Peng Xiao

disable compaction in bootstrap process

2018-03-22 Thread Peng Xiao
Dear All,


We noticed that when bootstrapping a new node, the source nodes are also quite 
busy doing compactions, which impacts the response time severely. Is it 
reasonable to disable compaction on all the source nodes?


Thanks,
Peng Xiao

Re: replace dead node vs remove node

2018-03-22 Thread Peng Xiao
Hi Anthony,


There is a problem with replacing a dead node as per the blog: if the 
replacement process takes longer than max_hint_window_in_ms, we must run repair 
to make the replaced node consistent again, since it missed ongoing writes 
during bootstrapping. But for a large cluster, repair is a painful process.
 
Thanks,
Peng Xiao






-- Original Message --
From: "Anthony Grasso" <anthony.gra...@gmail.com>
Sent: March 22, 2018, 7:13
To: "user" <user@cassandra.apache.org>

Subject: Re: replace dead node vs remove node



Hi Peng,

Depending on the hardware failure you can do one of two things:



1. If the disks are intact and uncorrupted you could just use the disks with 
the current data on them in the new node. Even if the IP address changes for 
the new node that is fine. In that case all you need to do is run repair on the 
new node. The repair will fix any writes the node missed while it was down. 
This process is similar to the scenario in this blog post: 
http://thelastpickle.com/blog/2018/02/21/replace-node-without-bootstrapping.html


2. If the disks are inaccessible or corrupted, then use the method as described 
in the blogpost you linked to. The operation is similar to bootstrapping a new 
node. There is no need to perform any other remove or join operation on the 
failed or new nodes. As per the blog post, you definitely want to run repair on 
the new node as soon as it joins the cluster. In this case here, the data on 
the failed node is effectively lost and replaced with data from other nodes in 
the cluster.
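
For illustration, a rough sketch of the second option; the dead node's IP is a placeholder and the exact file depends on the install:

# in cassandra-env.sh on the replacement node, before its first start:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<ip_of_dead_node>"
# once it has joined and finished streaming:
nodetool repair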


Hope this helps.


Regards,
Anthony


On Thu, 22 Mar 2018 at 20:52, Peng Xiao <2535...@qq.com> wrote:

Dear All,


When one node fails with hardware errors, it will be in DN status in the 
cluster. Then, if we are not able to handle this error within three hours (the 
max hints window), we will lose data, right? We have to run repair to keep consistency.
And as per 
https://blog.alteroot.org/articles/2014-03-12/replace-a-dead-node-in-cassandra.html, 
we can replace this dead node; is it the same as bootstrapping a new node? Does 
that mean we don't need to remove the node and rejoin?
Could anyone please advise?


Thanks,
Peng Xiao

Re: disable compaction in bootstrap process

2018-03-22 Thread Peng Xiao
Sorry Alain, maybe there is some misunderstanding here. I mean to disable 
compaction during the bootstrapping process, then enable it after the bootstrapping.
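
For illustration, a sketch of the toggles that idea refers to (the replies in this thread advise caution about doing this broadly); keyspace and table are placeholders:

nodetool disableautocompaction <keyspace> <table>   # before the bootstrap
nodetool enableautocompaction <keyspace> <table>    # after the new node has joined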




-- Original Message --
From: "" <2535...@qq.com>
Sent: March 23, 2018, 10:54
To: "user" <user@cassandra.apache.org>

Subject: Re: disable compaction in bootstrap process



Thanks Alain. We are using C* 2.1.18 on 7-core / 30G / 1.5T SSD machines. As 
the cluster is growing too fast, bootstrap/rebuild/removenode is painful for us.


Thanks,
Peng Xiao


-- Original Message --
From: "Alain RODRIGUEZ" <arodr...@gmail.com>
Sent: March 22, 2018, 7:31
To: "user cassandra.apache.org" <user@cassandra.apache.org>

Subject: Re: disable compaction in bootstrap process



Hello,
 
Is it reasonable to disable compaction on all the source nodes?
I would say no, as a short answer.

You can, I did it for some operations in the past. Technically no problem you 
can do that. It will most likely improve the response time of the queries 
immediately as it seems that in your cluster compactions are impacting the 
transactions.

That being said, the impact in the middle/long term will be substantially 
worse. Compactions allow fragments of rows to be merged so the reads can be 
more efficient, hitting the disk just once ideally (at least to reach a 
reasonably low number of hits on the disk). Also, when enabling compactions 
back you might have troubles again as compaction will have to catch up.

Imho, disabling compaction should not be an action to take unless your 
understanding of compaction is good enough and you are in a very specific 
case that requires it.
In any case, I would recommend you to stay away from using this solution as a 
quick workaround. It could lead to really wrong situations. Without mentioning 
tombstones that would stack there. Plus, doing this on all the nodes at once is 
really asking for trouble, as all the nodes' performance might degrade at 
roughly the same pace.

I would suggest troubleshooting why compactions are actually impacting the 
read/write performance.


We probably can help with this here as I believe all the Cassandra users had to 
deal with this at some point (at least people running with 'limited' hardware 
compared to the needs).

Here are some questions that I believe might be useful for us to help you or 
even for you to troubleshoot.

- Is Cassandra the limiting factor, or are resources reaching a limit?
- Is the cluster CPU or Disk bounded?
- What are the number of concurrent compactors and compaction speed in use?
- What hardware are you relying on?
- What version are you using?
- Is compaction keeping up? What compactions strategy are you using?
- 'nodetool tpstats' might also give information on pending and dropped tasks. 
It might be useful.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain


The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2018-03-22 9:09 GMT+00:00 Peng Xiao <2535...@qq.com>:
Dear All,


We noticed that when bootstrapping a new node, the source nodes are also quite 
busy doing compactions, which impacts the response time severely. Is it 
reasonable to disable compaction on all the source nodes?


Thanks,
Peng Xiao

Re: disable compaction in bootstrap process

2018-03-22 Thread Peng Xiao
Thanks Alain. We are using C* 2.1.18 on 7-core / 30G / 1.5T SSD machines. As 
the cluster is growing too fast, bootstrap/rebuild/removenode is painful for us.


Thanks,
Peng Xiao


-- Original Message --
From: "Alain RODRIGUEZ" <arodr...@gmail.com>
Sent: March 22, 2018, 7:31
To: "user cassandra.apache.org" <user@cassandra.apache.org>

Subject: Re: disable compaction in bootstrap process



Hello,
 
Is it reasonable to disable compaction on all the source nodes?
I would say no, as a short answer.

You can, I did it for some operations in the past. Technically no problem you 
can do that. It will most likely improve the response time of the queries 
immediately as it seems that in your cluster compactions are impacting the 
transactions.

That being said, the impact in the middle/long term will be substantially 
worse. Compactions allow fragments of rows to be merged so the reads can be 
more efficient, hitting the disk just once ideally (at least to reach a 
reasonably low number of hits on the disk). Also, when enabling compactions 
back you might have troubles again as compaction will have to catch up.

Imho, disabling compaction should not be an action to take unless your 
understanding of compaction is good enough and you are in a very specific 
case that requires it.
In any case, I would recommend you to stay away from using this solution as a 
quick workaround. It could lead to really wrong situations. Without mentioning 
tombstones that would stack there. Plus, doing this on all the nodes at once is 
really asking for trouble, as all the nodes' performance might degrade at 
roughly the same pace.

I would suggest troubleshooting why compactions are actually impacting the 
read/write performance.


We probably can help with this here as I believe all the Cassandra users had to 
deal with this at some point (at least people running with 'limited' hardware 
compared to the needs).

Here are some questions that I believe might be useful for us to help you or 
even for you to troubleshoot.

- Is Cassandra the limiting factor, or are resources reaching a limit?
- Is the cluster CPU or Disk bounded?
- What are the number of concurrent compactors and compaction speed in use?
- What hardware are you relying on?
- What version are you using?
- Is compaction keeping up? What compactions strategy are you using?
- 'nodetool tpstats' might also give information on pending and dropped tasks. 
It might be useful.

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain


The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2018-03-22 9:09 GMT+00:00 Peng Xiao <2535...@qq.com>:
Dear All,


We noticed that when bootstrapping a new node, the source nodes are also quite 
busy doing compactions, which impacts the response time severely. Is it 
reasonable to disable compaction on all the source nodes?


Thanks,
Peng Xiao

Re: disable compaction in bootstrap process

2018-03-23 Thread Peng Xiao
Many thanks Alain for the thorough explanation; we will not disable compaction 
for now.


Thanks,
Peng Xiao


-- Original Message --
From: "arodrime" <arodr...@gmail.com>
Sent: March 23, 2018, 8:57
To: "Peng Xiao" <2535...@qq.com>
Cc: "user" <user@cassandra.apache.org>
Subject: Re: disable compaction in bootstrap process



I mean to disable compaction during the bootstrapping process, then enable it 
after the bootstrapping.

That's how I understood it :-). Bootstrap can take a relatively long time and 
could affect all the nodes when using vnodes. Disabling compactions for hours 
is risky, even more so if the cluster is somewhat under pressure already. My 
point is, it might work for you, but it might also bring a whole lot of other 
issues, starting with increasing latencies. Plus all this compaction work on 
hold will have to be performed, at some point, later.

You asked if it is 'reasonable', I would say no unless you know for sure the 
cluster will handle it properly. Here is what I think would be a reasonable 
approach:

Before going for solutions, and especially this solution, it is important to 
understand the limitations, to find the bottleneck or the root cause of the 
troubles. In a healthy cluster, a node can handle streaming the data, 
compacting and answering client requests. Once what is wrong is clear, it will 
be way easier to think about possible solutions and pick the best one. For now, 
we are only making guesses. Taking quick actions after making wrong guesses and 
without fully understanding the consequences is where I saw the most damage being 
done to Cassandra clusters. I did that too, I don't recommend :-).


bootstrap/rebuild/removenode is painful for us.

As you say the cluster is having trouble with streaming operations 
('bootstrap/rebuild/removenode'), you can try reducing the streaming 
throughput. There is no rush in adding the new nodes as long as the other nodes 
stay healthy meanwhile. Thus reducing the speed will reduce the pressure on 
the disk (mostly). This change should not harm in any case, just make things 
slower. This can be a reasonable try (keeping in mind it is a workaround and 
there is probably an underlying issue if you are using defaults).

'nodetool getstreamthroughput'
'nodetool setstreamthroughput x'

Default x is 200 (I believe). If using vnodes, do not be afraid to lower this 
quite a lot, as all the nodes are probably involved in the streaming process.

But again, until we know more about the metrics or the context, we are mostly 
guessing. With the following information, we could probably help more 
efficiently:


- What are the values of 'concurrent_compactors' and 
'compaction_throughput_in_mb' in use? (cassandra.yaml)
- Is the cluster CPU or Disk bounded? (system tools htop / charts, etc. What is 
the cpu load, % of cpu used, some io_wait ?)
- Is compaction keeping up? ('nodetool compactionstats -H')
- What compactions strategy are you using? (Table definition - ie 'echo 
"DESCRIBE TABLE keyspace.table;" | grep -i compaction')
- 'nodetool tpstats' might also give information on pending and dropped tasks.
- 'nodetool cfstats' could help as well


In any case, good luck ;-)


C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain


The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2018-03-23 2:59 GMT+00:00 Peng Xiao <2535...@qq.com>:
Sorry Alain, maybe there is some misunderstanding here. I mean to disable 
compaction during the bootstrapping process, then enable it after the bootstrapping.




-- Original Message --
From: "" <2535...@qq.com>
Sent: March 23, 2018, 10:54
To: "user" <user@cassandra.apache.org>

Subject: Re: disable compaction in bootstrap process



Thanks Alain. We are using C* 2.1.18 on 7-core / 30G / 1.5T SSD machines. As 
the cluster is growing too fast, bootstrap/rebuild/removenode is painful for us.


Thanks,
Peng Xiao


-- Original Message --
From: "Alain RODRIGUEZ" <arodr...@gmail.com>
Sent: March 22, 2018, 7:31
To: "user cassandra.apache.org" <user@cassandra.apache.org>

Subject: Re: disable compaction in bootstrap process



Hello,
 
Is it reasonable to disable compaction on all the source nodes?
I would say no, as a short answer.

You can, I did it for some operations in the past. Technically no problem you 
can do that. It will most likely improve the response time of the queries 
immediately as it seems that in your cluster compactions are impacting the 
transactions.

That being said, the impact in the middle/long term will be substantially 
worse. Compactions allow fragments of rows to be merged so the reads can be 
more efficient, hitting the disk just once ideally (at least to reach a 
reasonably low number of hits on the disk).