[ https://issues.apache.org/jira/browse/CASSANDRA-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reopened CASSANDRA-3533:
-----------------------------------------


Something's wrong here, because I'm randomly seeing these in the dtests:

{noformat}
 INFO [main] 2013-04-05 04:53:22,574 ThriftServer.java (line 90) Binding thrift service to /127.0.0.2:9160
 INFO [main] 2013-04-05 04:53:22,622 ThriftServer.java (line 102) Using TFramedTransport with a max frame size of 15728640 bytes.
ERROR [GossipStage:1] 2013-04-05 04:53:23,048 CassandraDaemon.java (line 179) Exception in thread Thread[GossipStage:1,5,main]
java.lang.AssertionError
    at org.apache.cassandra.service.EchoVerbHandler.doVerb(EchoVerbHandler.java:17)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
                
> TimeoutException when there is a firewall issue.
> ------------------------------------------------
>
>                 Key: CASSANDRA-3533
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3533
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 2.0
>
>         Attachments: 0001-CASSANDRA-3533.patch, 3533.txt
>
>
> When one node in the cluster is not able to talk to the other DC/RAC due to 
> a firewall or network-related issue (StorageProxy calls fail), and the nodes 
> are NOT marked down because at least one node in the cluster can talk to the 
> other DC/RAC, we get a TimeoutException instead of an 
> UnavailableException.
> The problems with this:
> 1) It is hard to monitor/identify these errors.
> 2) It is hard for the client to differentiate a bad node from a bad 
> query.
> 3) When this issue happens, we have to wait for at least the RPC timeout 
> to know that the query won't succeed.
> Possible solution: when marking a node down, we might want to check whether 
> the node is actually alive by trying to communicate with it, so that we can 
> be sure of its actual state.
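
The idea in the description can be illustrated with a small, language-neutral sketch. It is not Cassandra code and not the attached patch: `LivenessTracker`, `UnavailableError`, `on_gossip_says_alive`, `check_available`, and the `echo` callback are all hypothetical names, and the sketch simply assumes a node only trusts a peer after a direct round trip from itself succeeds, so that a request can fail fast instead of waiting out the RPC timeout.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


class UnavailableError(Exception):
    """Hypothetical error: too few replicas are reachable from this node."""


@dataclass
class LivenessTracker:
    # echo(peer) is an assumed direct round trip from THIS node to the peer;
    # it returns True only if the peer answered.
    echo: Callable[[str], bool]
    reachable: Set[str] = field(default_factory=set)

    def on_gossip_says_alive(self, peer: str) -> None:
        # Do not trust second-hand gossip alone: confirm with a direct echo.
        if self.echo(peer):
            self.reachable.add(peer)
        else:
            self.reachable.discard(peer)

    def check_available(self, replicas: List[str], required: int) -> List[str]:
        # Fail immediately when not enough replicas are reachable from here,
        # rather than sending the request and waiting for a timeout.
        live = [r for r in replicas if r in self.reachable]
        if len(live) < required:
            raise UnavailableError(
                f"{len(live)} of {required} required replicas reachable"
            )
        return live
```

With a firewall blocking one peer, that peer never enters `reachable` even though other nodes report it as alive, so a request that needs it raises `UnavailableError` right away.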

