[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

2021-03-26 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309445#comment-17309445
 ] 

Ekaterina Dimitrova commented on CASSANDRA-13196:
-

Thank you :) 

> test failure in 
> snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> -
>
> Key: CASSANDRA-13196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Michael Shuler
>Assignee: Brandon Williams
>Priority: Normal
>  Labels: dtest, test-failure
> Fix For: 4.0, 4.0-rc
>
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does 
> not exist"
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in 
> test_prefer_local_reconnect_on_listen_address
> new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 1998, in execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 3784, in result
> raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 
> does not exist"\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'num_tokens\': \'32\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ncassandra.policies: INFO: Using 
> datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if 
> incorrect, please specify a local_dc to the constructor, or limit contact 
> points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host 
>  discovered\n- >> end captured 
> logging << -'
> {novnode}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

2021-03-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309049#comment-17309049
 ] 

Ekaterina Dimitrova commented on CASSANDRA-13196:
-

PS I ran the new version also locally with 3.0 and 3.11 and it passes 
successfully

> test failure in 
> snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> -
>
> Key: CASSANDRA-13196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Michael Shuler
>Assignee: Brandon Williams
>Priority: Normal
>  Labels: dtest, test-failure
> Fix For: 4.0, 4.0-rc
>
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does 
> not exist"
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in 
> test_prefer_local_reconnect_on_listen_address
> new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 1998, in execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 3784, in result
> raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 
> does not exist"\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'num_tokens\': \'32\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ncassandra.policies: INFO: Using 
> datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if 
> incorrect, please specify a local_dc to the constructor, or limit contact 
> points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host 
>  discovered\n- >> end captured 
> logging << -'
> {novnode}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

2021-03-25 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17309046#comment-17309046
 ] 

Ekaterina Dimitrova commented on CASSANDRA-13196:
-

I am +1 on the new version of the test. The patterns are way more reliable and 
remove the need of updating the previous strings all the time. 

I haven't been able to reproduce the test failure or find recent failures. 
Looking at the log from Caleb what you say seems reasonable to me from code 
inspection. 

I suggest we commit the new version and close the ticket. If the issue 
reappears we will reopen it. WDYT?

> test failure in 
> snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> -
>
> Key: CASSANDRA-13196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Michael Shuler
>Assignee: Brandon Williams
>Priority: Normal
>  Labels: dtest, test-failure
> Fix For: 4.0, 4.0-rc
>
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does 
> not exist"
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in 
> test_prefer_local_reconnect_on_listen_address
> new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 1998, in execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 3784, in result
> raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 
> does not exist"\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'num_tokens\': \'32\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ncassandra.policies: INFO: Using 
> datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if 
> incorrect, please specify a local_dc to the constructor, or limit contact 
> points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host 
>  discovered\n- >> end captured 
> logging << -'
> {novnode}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

2021-03-18 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304200#comment-17304200
 ] 

Brandon Williams commented on CASSANDRA-13196:
--

Patch to simplify a good portion of this test with a sprinkle of regular 
expressions.  I believe the flakiness may have been looking for the INTERNAL_IP 
state on 4.0, which could be affected by CASSANDRA-16381 and CASSANDRA-16525, 
but it is neither necessary nor logical to look for on a clean 4.0+ cluster.

[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/497/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/497/pipeline]


> test failure in 
> snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> -
>
> Key: CASSANDRA-13196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Michael Shuler
>Assignee: Brandon Williams
>Priority: Normal
>  Labels: dtest, test-failure
> Fix For: 4.0, 4.0-rc
>
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does 
> not exist"
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in 
> test_prefer_local_reconnect_on_listen_address
> new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 1998, in execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 3784, in result
> raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 
> does not exist"\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'num_tokens\': \'32\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ncassandra.policies: INFO: Using 
> datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if 
> incorrect, please specify a local_dc to the constructor, or limit contact 
> points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host 
>  discovered\n- >> end captured 
> logging << -'
> {novnode}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

2021-03-05 Thread Caleb Rackliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296233#comment-17296233
 ] 

Caleb Rackliffe commented on CASSANDRA-13196:
-

Failed again here: 
https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/459/tests/

> test failure in 
> snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> -
>
> Key: CASSANDRA-13196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Michael Shuler
>Priority: Normal
>  Labels: dtest, test-failure
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does 
> not exist"
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in 
> test_prefer_local_reconnect_on_listen_address
> new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 1998, in execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 3784, in result
> raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 
> does not exist"\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'num_tokens\': \'32\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ncassandra.policies: INFO: Using 
> datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if 
> incorrect, please specify a local_dc to the constructor, or limit contact 
> points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host 
>  discovered\n- >> end captured 
> logging << -'
> {novnode}
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

2017-04-01 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952462#comment-15952462
 ] 

Jeff Jirsa commented on CASSANDRA-13196:


Wouldn't be surprised if there was a race condition there - there was a 
not-dissimilar race solved recently in CASSANDRA-12653 where the race was in 
setting up token metadata as the node came out of shadow round, and this is 
fairly similar - we come out of shadow round at {{2017-02-06 22:13:17,494}} , 
we submit the migration tasks at {{2017-02-06 22:13:20,622}} and immediately 
{{2017-02-06 22:13:20,623}} decide not to send it, and FD finally sees the 
nodes come up at {{2017-02-06 22:13:20,665}} - at the very least, I'm not sure 
why we'd even try to submit the migration task knowing the instance was down - 
requeueing the schema pull immediately on failure here definitely wouldn't have 
helped (we'd have failed to send it again, as the instance was still down).

I'm not sure of the history here, but it seems like 
[MigrationManager#shouldPullSchemaFrom|https://github.com/Ge/cassandra/blob/463f3fecd9348ea0a4ce6eeeb30141527b8b10eb/src/java/org/apache/cassandra/schema/MigrationManager.java#L125]
 could potentially check that endpoint's UP/DOWN in addition to messaging 
version.  


> test failure in 
> snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> -
>
> Key: CASSANDRA-13196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Assignee: Aleksandr Sorokoumov
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does 
> not exist"
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in 
> test_prefer_local_reconnect_on_listen_address
> new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 1998, in execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 3784, in result
> raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 
> does not exist"\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'num_tokens\': \'32\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ncassandra.policies: INFO: Using 
> datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if 
> incorrect, please specify a local_dc to the constructor, or limit contact 
> points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host 
>  discovered\n- >> end captured 
> logging << -'
> {novnode}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

2017-04-01 Thread Aleksandr Sorokoumov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952407#comment-15952407
 ] 

Aleksandr Sorokoumov commented on CASSANDRA-13196:
--

Hi Jeff,

Thank you for the feedback and the reference to CASSANDRA-11748, I agree that 
blind retry might not be the best approach here. I checked available logs once 
again, compared them against the logs of the passing tests and found the 
following details. Here are the selected parts including 2 out of 3 failing 
migration tasks (node2_debug.log):

{NOFORMAT}
DEBUG [GossipStage:1] 2017-02-06 22:13:17,494 Gossiper.java:1524 - Received a 
regular ack from /127.0.0.3, can now exit shadow round
DEBUG [RequestResponseStage-1] 2017-02-06 22:13:17,544 Gossiper.java:1013 - 
removing expire time for endpoint : /127.0.0.3
INFO  [RequestResponseStage-1] 2017-02-06 22:13:17,544 Gossiper.java:1014 - 
InetAddress /127.0.0.3 is now UP

...

DEBUG [MessagingService-Outgoing-/127.0.0.3-Small] 2017-02-06 22:13:19,540 
OutboundTcpConnection.java:389 - Attempting to connect to /127.0.0.3
INFO  [HANDSHAKE-/127.0.0.3] 2017-02-06 22:13:19,542 
OutboundTcpConnection.java:502 - Handshaking version with /127.0.0.3
DEBUG [MessagingService-Outgoing-/127.0.0.3-Small] 2017-02-06 22:13:19,547 
OutboundTcpConnection.java:474 - Done connecting to /127.0.0.3
INFO  [ScheduledTasks:1] 2017-02-06 22:13:20,175 TokenMetadata.java:498 - 
Updating topology for all endpoints that have changed
DEBUG [GossipStage:1] 2017-02-06 22:13:20,582 FailureDetector.java:457 - 
Ignoring interval time of 3091069037 for /127.0.0.3
INFO  [GossipStage:1] 2017-02-06 22:13:20,583 Gossiper.java:1050 - Node 
/127.0.0.3 is now part of the cluster
DEBUG [GossipStage:1] 2017-02-06 22:13:20,585 StorageService.java:2183 - Node 
/127.0.0.3 state NORMAL, token [-1004975124483382337, -1043780946254161012, 
-1171220283294501102, -1633901128898670626, -2841540313718334770, 
-3101553897858755861, -3621837948233406461, -3885385423954957338, 
-4732725481887392373, -6179581950298736502, -6981606078877819247, 
-7346004435634569033, -8212720784299798818, -8540770738419743277, 
-9144476684623172343, -9177835532129132670, 2008413755392462902, 
2811132152336883532, 3686445661174337927, 3782543114949044043, 
4077664166716898608, 4223225346596811804, 4829686036948015467, 
5983186107673554538, 645274277526311, 6973900262646615849, 
7254341221965284892,
8057149706767269849, 8062244717118882837, 8473826416444317568, 
8957680003056564562, 9045604815824506670]

...

DEBUG [GossipStage:1] 2017-02-06 22:13:20,622 MigrationManager.java:85 - 
Submitting migration task for /127.0.0.3
INFO  [GossipStage:1] 2017-02-06 22:13:20,623 TokenMetadata.java:479 - Updating 
topology for /127.0.0.3
DEBUG [MigrationStage:1] 2017-02-06 22:13:20,623 MigrationTask.java:74 - Can't 
send schema pull request: node /127.0.0.3 is down.
INFO  [GossipStage:1] 2017-02-06 22:13:20,628 TokenMetadata.java:479 - Updating 
topology for /127.0.0.3
DEBUG [GossipStage:1] 2017-02-06 22:13:20,631 MigrationManager.java:85 - 
Submitting migration task for /127.0.0.3
DEBUG [MigrationStage:1] 2017-02-06 22:13:20,631 MigrationTask.java:74 - Can't 
send schema pull request: node /127.0.0.3 is down.
...
DEBUG [RequestResponseStage-1] 2017-02-06 22:13:20,665 Gossiper.java:1013 - 
removing expire time for endpoint : /127.0.0.3
INFO  [RequestResponseStage-1] 2017-02-06 22:13:20,665 Gossiper.java:1014 - 
InetAddress /127.0.0.3 is now UP
{NOFORMAT}

Following the reconnect ("Attempting to connect to /127.0.0.3") the migration 
tasks were submitted after the endpoint is considered NORMAL by the 
StorageService, but before it was marked alive by Gossiper.realMarkAlive. Can 
there be a rare race condition based on the order of these actions?
Can you please tell me if this might make sense or provide with any suggestions 
on what to look at?

Thanks,
Aleksandr

> test failure in 
> snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> -
>
> Key: CASSANDRA-13196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Assignee: Aleksandr Sorokoumov
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does 
> not exist"
>  >> begin captured logging << 
> dtest: DEBUG: cluster 

[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

2017-03-12 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906715#comment-15906715
 ] 

Jeff Jirsa commented on CASSANDRA-13196:


There's a real risk (in large clusters, or in clusters with large schemas, or 
when upgrading versions where we run in a mixed-version state) that we can have 
a lot of migrationtasks in flight, so much so that we can actually kill nodes ( 
see CASSANDRA-11748 for example ) - re-queueing more migration tasks when one 
times out is a good way to make the problem worse, not better. I'm very 
concerned with the approach 
[here|https://github.com/Ge/cassandra/commit/463f3fecd9348ea0a4ce6eeeb30141527b8b10eb#diff-f484a759f797776d9cc5d8af92b29e5eR156]
 where we just blindly schedule another poll. 

Do we even know why this failed in the first place? Isn't the right fix 
understanding why all 3 migration tasks failed, not just making more and more 
and more migration tasks?

> test failure in 
> snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> -
>
> Key: CASSANDRA-13196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Assignee: Aleksandr Sorokoumov
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does 
> not exist"
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in 
> test_prefer_local_reconnect_on_listen_address
> new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 1998, in execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 3784, in result
> raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 
> does not exist"\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'num_tokens\': \'32\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ncassandra.policies: INFO: Using 
> datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if 
> incorrect, please specify a local_dc to the constructor, or limit contact 
> points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host 
>  discovered\n- >> end captured 
> logging << -'
> {novnode}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13196) test failure in snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address

2017-02-16 Thread Aleksandr Sorokoumov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870729#comment-15870729
 ] 

Aleksandr Sorokoumov commented on CASSANDRA-13196:
--

Hello,

I have been checking out what happened with this test and found out that the 
keyspace1 does not exist on the node2 because all 3 submitted migration tasks 
failed to complete.
After there were no more migration tasks in progress 
{{MigrationManager.waitUntilReadyForBootstrap}} returned and the node 
bootstrapped.

>From the discussion in CASSANDRA-10731 I understood that bootstrapping a node 
>with out of sync schema is undesired behavior.
In case the {{MigrationTask.inflightTasks}} queue is empty and the schema 
wasn't pulled yet, does it make sense to schedule more MigrationTasks one by 
one until at least one of them succeeds?

What do you think?

> test failure in 
> snitch_test.TestGossipingPropertyFileSnitch.test_prefer_local_reconnect_on_listen_address
> -
>
> Key: CASSANDRA-13196
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13196
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_dtest/1487/testReport/snitch_test/TestGossipingPropertyFileSnitch/test_prefer_local_reconnect_on_listen_address
> {code}
> {novnode}
> Error Message
> Error from server: code=2200 [Invalid query] message="keyspace keyspace1 does 
> not exist"
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-k6b0iF
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 87, in 
> test_prefer_local_reconnect_on_listen_address
> new_rows = list(session.execute("SELECT * FROM {}".format(stress_table)))
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 1998, in execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 3784, in result
> raise self._final_exception
> 'Error from server: code=2200 [Invalid query] message="keyspace keyspace1 
> does not exist"\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-k6b0iF\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'num_tokens\': \'32\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ncassandra.policies: INFO: Using 
> datacenter \'dc1\' for DCAwareRoundRobinPolicy (via host \'127.0.0.1\'); if 
> incorrect, please specify a local_dc to the constructor, or limit contact 
> points to local cluster nodes\ncassandra.cluster: INFO: New Cassandra host 
>  discovered\n- >> end captured 
> logging << -'
> {novnode}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)