[jira] [Commented] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down

2016-02-11 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142713#comment-15142713
 ] 

Aleksey Yeschenko commented on CASSANDRA-11143:
---

This is a direct consequence of CASSANDRA-5202: when you recreate that table, 
it gets a new id (a timeuuid). When the downed node comes back up and receives 
the delta, it errors out on the id mismatch.

This is a known issue, and it will not go away any time soon, not until we 
rewrite our schema propagation code.
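
The mismatch described above can be illustrated with a toy model. This is only 
a sketch, not Cassandra's actual code: the TableMetadata class, its apply 
method, and the SchemaMismatch exception are hypothetical names, and the only 
behavior modeled is that a re-created table gets a fresh time-based UUID which 
a node holding the old definition refuses to merge.

```python
import uuid


class SchemaMismatch(Exception):
    """Raised when an update carries a different table id (hypothetical name)."""


class TableMetadata:
    """Toy stand-in for a table definition; not Cassandra's CFMetaData."""

    def __init__(self, name, columns, table_id=None):
        self.name = name
        self.columns = list(columns)
        # Every CREATE TABLE gets a fresh time-based UUID unless one is supplied.
        self.table_id = table_id or uuid.uuid1()

    def apply(self, other):
        # Refuse to merge a definition whose id differs from the local one.
        if other.table_id != self.table_id:
            raise SchemaMismatch(
                f"Column family ID mismatch (found {other.table_id}; "
                f"expected {self.table_id})"
            )
        self.columns = list(other.columns)


# Both nodes start out with the same table definition.
original = TableMetadata("users", ["id", "name"])
stale_node_view = TableMetadata("users", ["id", "name"], table_id=original.table_id)

# While one node is down, the table is dropped and re-created:
# same name, but a new id.
recreated = TableMetadata("users", ["id", "name", "email"])

# The node that was down comes back and tries to merge the new definition.
try:
    stale_node_view.apply(recreated)
    merged = True
except SchemaMismatch as exc:
    merged = False
    error = str(exc)
```

In this model the merge is rejected and the stale node keeps its old column 
list, which matches the symptom in the report.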

> Schema changes don't propagate correctly if nodes are down
> --
>
> Key: CASSANDRA-11143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11143
> Project: Cassandra
> Issue Type: Bug
> Environment: PROD
> Reporter: Anubhav Kale
>
> We saw a problem similar to the one described below in our PROD environment 
> a few times. Below is a consistent repro. We can change the priority to 
> Minor, though, since there is a workaround.
> Using the steps from 
> http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301,
> set up a two-node cluster locally.
> 1. Bring up both nodes.
> 2. Create a table, and ensure cqlsh shows it correctly on both nodes.
> 3. Bring down one node.
> 4. Drop and re-create the same table, or change some schema in the table.
> 5. Bring the downed node back up.
> You will notice exceptions like the one below (because of the schema 
> mismatch), and the new schema never propagates to the node that was down 
> (meaning a select * via cqlsh continues to show the old schema for the 
> table). I let the cluster run for an hour to see whether gossip would 
> somehow catch up; it did not.
> However, the interesting part is that if you restart the node that was down 
> when the schema changes were made, the exception below goes away and the 
> node gets the new schema correctly.
> What is it caching that makes a second restart necessary for it to behave 
> correctly?
> ERROR 00:23:33 Configuration exception merging remote schema
> org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 7208d260-cf8c-11e5-a13b-fb6871b443fb; expected e2839010-cf7e-11e5-a13b-fb6871b443fb)
>   at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:783) ~[main/:na]
>   at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:743) ~[main/:na]
>   at org.apache.cassandra.config.Schema.updateTable(Schema.java:626) ~[main/:na]
>   at org.apach



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down

2016-02-11 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142716#comment-15142716
 ] 

Aleksey Yeschenko commented on CASSANDRA-11143:
---

The *exact* issue has been raised before, but I'll mark the ticket as a 
duplicate of CASSANDRA-10699 instead.



[jira] [Commented] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down

2016-02-10 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141799#comment-15141799
 ] 

Anubhav Kale commented on CASSANDRA-11143:
--

After digging through the code, it appears that the cached data in CFMetaData 
isn't refreshed when system_schema.tables is changed in 
SchemaKeyspace.mergeSchema (the mutations.forEach line). This causes the check 
in validateCompatibility to fail.

On restart, the node reloads this data from disk, so everything works 
correctly from that point onward.

Is this the expected behavior? It seems odd to me.
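
The caching hypothesis above can be sketched as a toy model. This is an 
assumption-laden illustration, not a trace of Cassandra's code: the Node 
class, its disk and cache dictionaries, and the merge_remote and restart 
methods are all hypothetical, and the model simply assumes that an incoming 
change reaches persistent storage while the in-memory copy used for 
validation is only loaded at startup.

```python
class Node:
    """Toy node: a persisted schema plus an in-memory copy (hypothetical model)."""

    def __init__(self, disk):
        self.disk = disk         # persisted schema: table name -> table id
        self.cache = dict(disk)  # in-memory copy, loaded once at startup

    def merge_remote(self, name, remote_id):
        # Assumption: the incoming change is written to persistent storage...
        self.disk[name] = remote_id
        # ...but validation runs against the cached id, which is not refreshed.
        cached_id = self.cache.get(name)
        if cached_id != remote_id:
            raise ValueError(
                f"ID mismatch (found {remote_id}; expected {cached_id})"
            )

    def restart(self):
        # A restart reloads the in-memory copy from persistent storage.
        self.cache = dict(self.disk)


node = Node({"users": "id-old"})

# First merge after the node comes back up: the cached id is stale.
try:
    node.merge_remote("users", "id-new")
    first_attempt_ok = True
except ValueError:
    first_attempt_ok = False

# After a restart the cache matches what was persisted, so the merge passes.
node.restart()
node.merge_remote("users", "id-new")
second_attempt_ok = True
```

Under these assumptions the first merge fails and the one after a restart 
succeeds, which is consistent with the observation that a second restart 
clears the exception.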




[jira] [Commented] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down

2016-02-10 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141805#comment-15141805
 ] 

Brandon Williams commented on CASSANDRA-11143:
--

ping [~iamaleksey]
