[jira] [Updated] (CASSANDRA-10052) Bringing one node down, makes the whole cluster go down for a second

2015-09-22 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-10052:

Fix Version/s: 2.1.x  (was: 3.0.0 rc2)

> Bringing one node down, makes the whole cluster go down for a second
> 
>
> Key: CASSANDRA-10052
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10052
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sharvanath Pathak
>Assignee: Stefania
>  Labels: client-impacting
> Fix For: 2.1.x
>
>
> When a node goes down, the other nodes learn about it through gossip,
> and I do see the following log from Gossiper.java:
> {code}
> private void markDead(InetAddress addr, EndpointState localState)
> {
>     if (logger.isTraceEnabled())
>         logger.trace("marking as down {}", addr);
>     localState.markDead();
>     liveEndpoints.remove(addr);
>     unreachableEndpoints.put(addr, System.nanoTime());
>     logger.info("InetAddress {} is now DOWN", addr);
>     for (IEndpointStateChangeSubscriber subscriber : subscribers)
>         subscriber.onDead(addr, localState);
>     if (logger.isTraceEnabled())
>         logger.trace("Notified " + subscribers);
> }
> {code}
> It says "InetAddress 192.168.101.1 is now DOWN" in Cassandra's system log.
> Now, on all the other nodes, the client side (Java driver) says "Cannot
> connect to any host, scheduling retry in 1000 milliseconds". They eventually
> do reconnect, but some queries fail during this intermediate period.
> To me it seems that when the server pushes the nodeDown event, it calls
> getRpcAddress(endpoint) and thus sends localhost as the argument in the
> nodeDown event, as in org.apache.cassandra.transport.Server:
> {code}
>     public void onDown(InetAddress endpoint)
>     {
>         server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint),
>                                                                   server.socket.getPort()));
>     }
> {code}
> getRpcAddress returns localhost for any endpoint if cassandra.yaml uses
> localhost as the rpc_address (which, by the way, is the default).
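The failure mode described above can be sketched outside Cassandra: with the default rpc_address: localhost, every peer advertises the same loopback address, so nodeDown events for different peers become indistinguishable to clients. A minimal, self-contained Java sketch, where advertisedRpcAddress is a hypothetical stand-in for the server's getRpcAddress, not actual Cassandra code:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class RpcAddressCollapse {
    // Hypothetical stand-in for the server-side getRpcAddress(endpoint):
    // when every peer keeps the cassandra.yaml default rpc_address: localhost,
    // the address carried by the nodeDown event is loopback regardless of
    // which gossip endpoint actually died.
    static InetAddress advertisedRpcAddress(InetAddress gossipEndpoint) {
        return InetAddress.getLoopbackAddress();
    }

    // Small helper so callers avoid the checked UnknownHostException;
    // literal IPs do not trigger DNS lookups.
    static InetAddress addr(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    public static void main(String[] args) {
        InetAddress peerA = addr("192.168.101.1");
        InetAddress peerB = addr("192.168.101.2");
        // Distinct gossip endpoints collapse to one client-visible address,
        // so a driver cannot tell which node went down and may conclude that
        // the node it is connected to is the one that failed.
        System.out.println(advertisedRpcAddress(peerA).equals(advertisedRpcAddress(peerB))); // true
    }
}
```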



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10052) Bringing one node down, makes the whole cluster go down for a second

2015-09-01 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-10052:
---
     Priority: Major  (was: Critical)
Fix Version/s: 2.1.x, 2.2.x



[jira] [Updated] (CASSANDRA-10052) Bringing one node down, makes the whole cluster go down for a second

2015-09-01 Thread Tyler Hobbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-10052:

Labels: client-impacting  (was: )



[jira] [Updated] (CASSANDRA-10052) Bringing one node down, makes the whole cluster go down for a second

2015-08-12 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-10052:
---
Assignee: Stefania

I see.  Sounds like we should just special-case it and not send anything from
onDown when a peer listening on localhost goes down.
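A minimal sketch of that special case, assuming the check can be made on the resolved RPC address; shouldSendNodeDown is a hypothetical helper mirroring the onDown snippet quoted in this issue, not the committed fix:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NodeDownGuard {
    // Hypothetical guard: suppress the nodeDown push when the peer's
    // advertised RPC address is loopback, since "localhost is down" is
    // meaningless (and misleading) to remote clients.
    static boolean shouldSendNodeDown(InetAddress rpcAddress) {
        return !rpcAddress.isLoopbackAddress();
    }

    // Small helper so callers avoid the checked UnknownHostException;
    // literal IPs do not trigger DNS lookups.
    static InetAddress addr(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldSendNodeDown(InetAddress.getLoopbackAddress())); // false
        System.out.println(shouldSendNodeDown(addr("192.168.101.1")));            // true
    }
}
```

Under this guard the event is simply dropped for loopback peers, so clients connected to other nodes never see a spurious "localhost is down" notification.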



[jira] [Updated] (CASSANDRA-10052) Bringing one node down, makes the whole cluster go down for a second

2015-08-12 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-10052:
---
Reviewer: Olivier Michallat  (was: Sylvain Lebresne)



[jira] [Updated] (CASSANDRA-10052) Bringing one node down, makes the whole cluster go down for a second

2015-08-12 Thread Sharvanath Pathak (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharvanath Pathak updated CASSANDRA-10052:
--
Summary: Bringing one node down, makes the whole cluster go down for a 
second  (was: Bring one node down, makes the whole cluster go down for a second)
