Re: multiple clients making schema changes at once

2021-06-03 Thread Erick Ramirez
Having said that, I'm still not a fan of making schema changes
programmatically. I spend way too much time helping users unscramble their
schema after they've hit multiple disagreements. I do understand the need
for it, but avoid it if you can, particularly in production.

On Fri, 4 Jun 2021 at 09:41, Erick Ramirez 
wrote:

>> I wonder if there’s a way to query the driver to see if your schema change
>> has fully propagated.  I haven’t looked into this.
>>
>
> Yes, the drivers have APIs for this. For example, the Java driver has
> isSchemaInAgreement() and checkSchemaAgreement().
>
> See
> https://docs.datastax.com/en/developer/java-driver/latest/manual/core/metadata/schema/.
> Cheers!
>
>


Re: multiple clients making schema changes at once

2021-06-03 Thread Erick Ramirez
>
> I wonder if there’s a way to query the driver to see if your schema change
> has fully propagated.  I haven’t looked into this.
>

Yes, the drivers have APIs for this. For example, the Java driver has
isSchemaInAgreement() and checkSchemaAgreement().

See
https://docs.datastax.com/en/developer/java-driver/latest/manual/core/metadata/schema/.
Cheers!
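
A minimal Python sketch of the wait-for-agreement idea, for illustration only. The check_agreement callable is a hypothetical stand-in for whatever your driver exposes (for the Java driver, a thin wrapper around the checkSchemaAgreement() call mentioned above). The function name, timeout and poll interval are assumptions, not driver defaults.

```python
import time
from typing import Callable


def wait_for_schema_agreement(
    check_agreement: Callable[[], bool],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
) -> bool:
    """Poll check_agreement() until it returns True or timeout_s elapses.

    check_agreement is a hypothetical callable supplied by the caller; it
    should ask the driver whether all nodes report the same schema version.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check_agreement():
            return True
        if time.monotonic() >= deadline:
            return False  # caller decides what to do; do not assume success
        time.sleep(poll_interval_s)
```

If this returns False, it is probably safer to stop and investigate than to carry on with writes that depend on the change.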


Re: multiple clients making schema changes at once

2021-06-03 Thread Max C.
Hi Joe,

In our case we only do this in the test environment, and there can be several 
seconds or even minutes between a schema change and the test that depends on 
it.  Perhaps we have been lucky thus far.  :-)

I wonder if there’s a way to query the driver to see if your schema change has 
fully propagated.  I haven’t looked into this.

- Max

> On Jun 3, 2021, at 8:23 am, Joe Obernberger  
> wrote:
> 
> How does this work?  I have a program that runs a series of alter table 
> statements, and then does inserts.  In some cases, the insert happens 
> immediately after the alter table statement and the insert fails because the 
> schema (apparently) has not had time to propagate.  I get an Undefined column 
> name error.
> 
> The alter statements run single threaded, but the inserts run in multiple 
> threads.  The alter statement is run in a synchronized block (Java).  Should 
> I put an artificial delay after the alter statement?
> 
> -Joe
> 
> On 6/1/2021 2:59 PM, Max C. wrote:
>> We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python client 
>> library for ZooKeeper.
>> 
>> - Max
>> 
>>> Yes this is quite annoying. How did you implement that "external lock"? I 
>>> also thought of doing an external service that would be dedicated to that. 
>>> Cassandra client apps would send create instruction to that service, that 
>>> would receive them and do the creates 1 by 1, and the client app would wait 
>>> the response from it before starting to insert.
>>> 
>>> Best,
>>> 
>>> Sébastien.
>>> 
>>> On Tue, 1 Jun 2021 at 05:21, Max C. wrote:
>>> In our case we have a shared dev cluster with (for example) a key space for 
>>> each developer, a key space for each CI runner, etc.   As part of 
>>> initializing our test suite we setup the schema to match the code that is 
>>> about to be tested.  This can mean multiple CI runners each adding/dropping 
>>> tables at the same time but for different key spaces.   
>>>   
>>> 
>>> Our experience is even though the schema changes do not conflict, we still 
>>> run into schema mismatch problems.   Our solution to this was to have a 
>>> lock (external to Cassandra) that ensures only a single schema change 
>>> operation is being issued at a time.
>>> 
>>> People assume schema changes in Cassandra work the same way as MySQL or 
>>> multiple users editing files on disk — i.e. as long as you’re not editing 
>>> the same file (or same MySQL table), then there’s no problem.  This is NOT 
>>> the case.  Cassandra schema changes are more like “git push”ing a commit to 
>>> the same branch — i.e. at most one change can be outstanding at a time 
>>> (across all tables, all key spaces)…otherwise you will run into trouble.
>>> 
>>> Hope that helps.  Best of luck.
>>> 
>>> - Max
>>> 
>>> 
>>> Hello,
>>> 
>>> I have a more general question about that; I cannot find a clear answer.
>>> 
>>> In my use case I have many tables (around 10k new tables created per 
>>> month) and they are created from many clients and only dynamically, with 
>>> several clients creating the same tables simultaneously.
>>> 
>>> What is the recommended way of creating tables dynamically? If I am doing 
>>> "if not exists" queries + wait for schema agreement before and after each 
>>> create statement, will it work correctly for Cassandra?
>>> 
>>> Sébastien.
>>> 


Re: multiple clients making schema changes at once

2021-06-03 Thread Jeff Jirsa
CFID mismatch is not "schema not propagated", it means you created the
table twice at the same time, and you have an inconsistent view of the
table within your cluster.

This is bad. Really bad. Worse than you expect. It's a bug in Cassandra,
but until it's fixed, you should stop doing concurrent schema modifications.



On Thu, Jun 3, 2021 at 8:37 AM Sébastien Rebecchi 
wrote:

> Sometimes even waiting for hours does not help. I have a cluster where,
> like you, I synchronized the create table statements, and I even tried
> waiting for schema agreement in a loop until success, but sometimes
> success never comes. I get the error below repeatedly in the logs of one
> node. It seems we must restart nodes really often :(
>
> Sébastien
>
> ERROR [InternalResponseStage:1117] 2021-06-03 17:32:34,937 MigrationCoordinator.java:408 - Unable to merge schema from /135.181.222.100
> org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found a991bb50-c475-11eb-83cb-df35fc5a9bea; expected 994bee02-c475-11eb-beff-6d70d473832f)
>     at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:984) ~[apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:938) ~[apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1478) ~[apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1434) ~[apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1403) ~[apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1380) ~[apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.service.MigrationCoordinator.mergeSchemaFrom(MigrationCoordinator.java:367) ~[apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.service.MigrationCoordinator$Callback.response(MigrationCoordinator.java:404) [apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.service.MigrationCoordinator$Callback.response(MigrationCoordinator.java:393) [apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.11.10.jar:3.11.10]
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) [apache-cassandra-3.11.10.jar:3.11.10]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_292]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_292]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_292]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_292]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.10.jar:3.11.10]
>     at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]
>
> On Thu, 3 Jun 2021 at 17:23, Joe Obernberger wrote:
>
>> How does this work?  I have a program that runs a series of alter table
>> statements, and then does inserts.  In some cases, the insert happens
>> immediately after the alter table statement and the insert fails because
>> the schema (apparently) has not had time to propagate.  I get an Undefined
>> column name error.
>>
>> The alter statements run single threaded, but the inserts run in multiple
>> threads.  The alter statement is run in a synchronized block (Java).
>> Should I put an artificial delay after the alter statement?
>>
>> -Joe
>> On 6/1/2021 2:59 PM, Max C. wrote:
>>
>> We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python client
>> library for ZooKeeper.
>>
>> - Max
>>
>> Yes this is quite annoying. How did you implement that "external lock"? I
>> also thought of doing an external service that would be dedicated to that.
>> Cassandra client apps would send create instruction to that service, that
>> would receive them and do the creates 1 by 1, and the client app would wait
>> the response from it before starting to insert.
>>
>> Best,
>>
>> Sébastien.
>>
>> On Tue, 1 Jun 2021 at 05:21, Max C. wrote:
>>
>>> In our case we have a shared dev cluster with (for example) a key space
>>> for each developer, a key space for each CI runner, etc.   As part of
>>> initializing our test suite we setup the schema to match the code that is
>>> about to be tested.  This can mean multiple CI runners each adding/dropping
>>> tables at the same time but for different key spaces.
>>>
>>> Our experience is even though the schema changes do not conflict, we
>>> still run into schema mismatch problems.   Our solution to this was to have

Re: multiple clients making schema changes at once

2021-06-03 Thread Sébastien Rebecchi
Sometimes even waiting for hours does not help. I have a cluster where,
like you, I synchronized the create table statements, and I even tried
waiting for schema agreement in a loop until success, but sometimes
success never comes. I get the error below repeatedly in the logs of one
node. It seems we must restart nodes really often :(

Sébastien

ERROR [InternalResponseStage:1117] 2021-06-03 17:32:34,937 MigrationCoordinator.java:408 - Unable to merge schema from /135.181.222.100
org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found a991bb50-c475-11eb-83cb-df35fc5a9bea; expected 994bee02-c475-11eb-beff-6d70d473832f)
    at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:984) ~[apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:938) ~[apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.config.Schema.updateTable(Schema.java:687) ~[apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1478) ~[apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1434) ~[apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1403) ~[apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1380) ~[apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.service.MigrationCoordinator.mergeSchemaFrom(MigrationCoordinator.java:367) ~[apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.service.MigrationCoordinator$Callback.response(MigrationCoordinator.java:404) [apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.service.MigrationCoordinator$Callback.response(MigrationCoordinator.java:393) [apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.11.10.jar:3.11.10]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) [apache-cassandra-3.11.10.jar:3.11.10]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_292]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_292]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.10.jar:3.11.10]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]

On Thu, 3 Jun 2021 at 17:23, Joe Obernberger wrote:

> How does this work?  I have a program that runs a series of alter table
> statements, and then does inserts.  In some cases, the insert happens
> immediately after the alter table statement and the insert fails because
> the schema (apparently) has not had time to propagate.  I get an Undefined
> column name error.
>
> The alter statements run single threaded, but the inserts run in multiple
> threads.  The alter statement is run in a synchronized block (Java).
> Should I put an artificial delay after the alter statement?
>
> -Joe
> On 6/1/2021 2:59 PM, Max C. wrote:
>
> We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python client
> library for ZooKeeper.
>
> - Max
>
> Yes this is quite annoying. How did you implement that "external lock"? I
> also thought of doing an external service that would be dedicated to that.
> Cassandra client apps would send create instruction to that service, that
> would receive them and do the creates 1 by 1, and the client app would wait
> the response from it before starting to insert.
>
> Best,
>
> Sébastien.
>
> On Tue, 1 Jun 2021 at 05:21, Max C. wrote:
>
>> In our case we have a shared dev cluster with (for example) a key space
>> for each developer, a key space for each CI runner, etc.   As part of
>> initializing our test suite we setup the schema to match the code that is
>> about to be tested.  This can mean multiple CI runners each adding/dropping
>> tables at the same time but for different key spaces.
>>
>> Our experience is even though the schema changes do not conflict, we
>> still run into schema mismatch problems.   Our solution to this was to have
>> a lock (external to Cassandra) that ensures only a single schema change
>> operation is being issued at a time.
>>
>> People assume schema changes in Cassandra work the same way as MySQL or
>> multiple users editing files on disk — i.e. as long as you’re not editing
>> the same file (or same MySQL table), then there’s no problem.  *This is
>> NOT the case.*  Cassandra schema changes are more like “git push”ing a
>> commit to the same branch — i.e. at most one change can be outstanding at a
>> time (across all tables, all key 

Re: multiple clients making schema changes at once

2021-06-03 Thread Joe Obernberger
How does this work?  I have a program that runs a series of alter table 
statements, and then does inserts.  In some cases, the insert happens 
immediately after the alter table statement and the insert fails because 
the schema (apparently) has not had time to propagate.  I get an 
Undefined column name error.


The alter statements run single threaded, but the inserts run in 
multiple threads.  The alter statement is run in a synchronized block 
(Java).  Should I put an artificial delay after the alter statement?


-Joe
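
As an alternative to a fixed delay, one option is to hold the insert threads back until the alter has been confirmed. The Python sketch below is illustrative only: SchemaGate, run_alter and wait_for_agreement are hypothetical names, and the agreement check is assumed to come from the driver. It does not stop inserts that were already past the gate when the alter started.

```python
import threading
from typing import Callable, Optional


class SchemaGate:
    """Blocks writers while a schema change is in flight (illustrative sketch)."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._ready.set()  # no schema change pending at start
        self._alter_lock = threading.Lock()  # one alter at a time in this process

    def alter(
        self,
        run_alter: Callable[[], None],
        wait_for_agreement: Callable[[], bool],
    ) -> None:
        """Close the gate, run the alter, reopen only after agreement."""
        with self._alter_lock:
            self._ready.clear()
            run_alter()
            if not wait_for_agreement():
                # Gate stays closed so writers do not hit a half-propagated schema.
                raise RuntimeError("schema agreement not reached")
            self._ready.set()

    def wait_until_ready(self, timeout_s: Optional[float] = None) -> bool:
        """Called by insert threads before writing; True once the gate is open."""
        return self._ready.wait(timeout_s)
```

Each insert thread would call wait_until_ready() before issuing a write that depends on the new column.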

On 6/1/2021 2:59 PM, Max C. wrote:
We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python 
client library for ZooKeeper.


- Max

Yes this is quite annoying. How did you implement that "external 
lock"? I also thought of doing an external service that would be 
dedicated to that. Cassandra client apps would send create 
instruction to that service, that would receive them and do the 
creates 1 by 1, and the client app would wait the response from it 
before starting to insert.


Best,

Sébastien.

On Tue, 1 Jun 2021 at 05:21, Max C. wrote:


In our case we have a shared dev cluster with (for example) a key
space for each developer, a key space for each CI runner, etc.  
As part of initializing our test suite we setup the schema to
match the code that is about to be tested.  This can mean
multiple CI runners each adding/dropping tables at the same time
but for different key spaces.

Our experience is even though the schema changes do not conflict,
we still run into schema mismatch problems.   Our solution to
this was to have a lock (external to Cassandra) that ensures only
a single schema change operation is being issued at a time.

People assume schema changes in Cassandra work the same way as
MySQL or multiple users editing files on disk — i.e. as long as
you’re not editing the same file (or same MySQL table), then
there’s no problem.  *This is NOT the case.*  Cassandra
schema changes are more like “git push”ing a commit to the
same branch — i.e. at most one change can be outstanding at a
time (across all tables, all key spaces)…otherwise you will run
into trouble.

Hope that helps.  Best of luck.

- Max

Hello,

I have a more general question about that; I cannot find a
clear answer.

In my use case I have many tables (around 10k new tables
created per month) and they are created from many clients
and only dynamically, with several clients creating the same
tables simultaneously.

What is the recommended way of creating tables dynamically?
If I am doing "if not exists" queries + wait for schema
agreement before and after each create statement, will it
work correctly for Cassandra?

Sébastien.





 

Re: multiple clients making schema changes at once

2021-06-01 Thread Max C.
We use ZooKeeper + kazoo’s lock implementation.  Kazoo is a Python client 
library for ZooKeeper.

- Max
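
A rough Python sketch of how such an external lock might wrap a schema change. Everything here is illustrative: run_schema_change, execute and wait_for_agreement are hypothetical names, and holding the lock until agreement is reached is an assumption rather than a description of the setup above. Any context-manager lock works; kazoo's Lock recipe can be used in a with statement in the same way.

```python
import threading
from contextlib import AbstractContextManager
from typing import Callable


def run_schema_change(
    lock: AbstractContextManager,
    execute: Callable[[str], None],
    wait_for_agreement: Callable[[], bool],
    statement: str,
) -> None:
    """Run one DDL statement while holding a lock shared by all clients.

    lock: any context manager; a threading.Lock only protects one process,
          so a distributed lock (for example one backed by ZooKeeper) is
          needed when several clients are involved.
    execute / wait_for_agreement: hypothetical callables wrapping the driver.
    """
    with lock:
        execute(statement)
        # Assumption: keep the lock until the cluster agrees on the schema,
        # so that at most one change is outstanding at a time.
        if not wait_for_agreement():
            raise RuntimeError("schema agreement not reached: " + statement)


# Local stand-in for demonstration; replace with a distributed lock in practice.
local_lock = threading.Lock()
```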

> Yes this is quite annoying. How did you implement that "external lock"? I 
> also thought of doing an external service that would be dedicated to that. 
> Cassandra client apps would send create instruction to that service, that 
> would receive them and do the creates 1 by 1, and the client app would wait 
> the response from it before starting to insert.
> 
> Best,
> 
> Sébastien.
> 
> On Tue, 1 Jun 2021 at 05:21, Max C. wrote:
> In our case we have a shared dev cluster with (for example) a key space for 
> each developer, a key space for each CI runner, etc.   As part of 
> initializing our test suite we setup the schema to match the code that is 
> about to be tested.  This can mean multiple CI runners each adding/dropping 
> tables at the same time but for different key spaces.
> 
> Our experience is even though the schema changes do not conflict, we still 
> run into schema mismatch problems.   Our solution to this was to have a lock 
> (external to Cassandra) that ensures only a single schema change operation is 
> being issued at a time.
> 
> People assume schema changes in Cassandra work the same way as MySQL or 
> multiple users editing files on disk — i.e. as long as you’re not editing the 
> same file (or same MySQL table), then there’s no problem.  This is NOT the 
> case.  Cassandra schema changes are more like “git push”ing a commit to the 
> same branch — i.e. at most one change can be outstanding at a time (across 
> all tables, all key spaces)…otherwise you will run into trouble.
> 
> Hope that helps.  Best of luck.
> 
> - Max
> 
> 
> Hello,
> 
> I have a more general question about that; I cannot find a clear answer.
> 
> In my use case I have many tables (around 10k new tables created per month) 
> and they are created from many clients and only dynamically, with several 
> clients creating the same tables simultaneously.
> 
> What is the recommended way of creating tables dynamically? If I am doing "if 
> not exists" queries + wait for schema agreement before and after each create 
> statement, will it work correctly for Cassandra?
> 
> Sébastien.
> 



Re: multiple clients making schema changes at once

2021-06-01 Thread Sébastien Rebecchi
Hello,

Yes, this is quite annoying. How did you implement that "external lock"? I
also thought of building an external service dedicated to that. Cassandra
client apps would send create instructions to that service, which would
receive them and run the creates one by one, and the client app would wait
for its response before starting to insert.
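
The service idea could be sketched in Python roughly as follows. This is an in-process illustration only: SchemaChangeService and the execute callable are hypothetical, and a real service would sit in its own process and be reached over the network. A single worker thread is what guarantees that creates never overlap.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class SchemaChangeService:
    """Runs submitted DDL statements strictly one at a time, in submission order."""

    def __init__(self, execute: Callable[[str], None]) -> None:
        # execute is a hypothetical callable that sends one statement to Cassandra.
        self._execute = execute
        # One worker thread: queued statements are processed sequentially.
        self._worker = ThreadPoolExecutor(max_workers=1)

    def submit(self, statement: str) -> Future:
        """Queue a statement; the returned future completes once it has run."""
        return self._worker.submit(self._execute, statement)

    def close(self) -> None:
        """Finish queued statements and stop the worker."""
        self._worker.shutdown(wait=True)
```

A client would call submit(...).result() and start inserting only after that call returns.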

Best,

Sébastien.

On Tue, 1 Jun 2021 at 05:21, Max C. wrote:

> In our case we have a shared dev cluster with (for example) a key space
> for each developer, a key space for each CI runner, etc.   As part of
> initializing our test suite we setup the schema to match the code that is
> about to be tested.  This can mean multiple CI runners each adding/dropping
> tables at the same time but for different key spaces.
>
> Our experience is even though the schema changes do not conflict, we still
> run into schema mismatch problems.   Our solution to this was to have a
> lock (external to Cassandra) that ensures only a single schema change
> operation is being issued at a time.
>
> People assume schema changes in Cassandra work the same way as MySQL or
> multiple users editing files on disk — i.e. as long as you’re not editing
> the same file (or same MySQL table), then there’s no problem.  *This is
> NOT the case.*  Cassandra schema changes are more like “git push”ing a
> commit to the same branch — i.e. at most one change can be outstanding at a
> time (across all tables, all key spaces)…otherwise you will run into
> trouble.
>
> Hope that helps.  Best of luck.
>
> - Max
>
>> Hello,
>>
>> I have a more general question about that; I cannot find a clear answer.
>>
>> In my use case I have many tables (around 10k new tables created per
>> month) and they are created from many clients and only dynamically, with
>> several clients creating the same tables simultaneously.
>>
>> What is the recommended way of creating tables dynamically? If I am doing
>> "if not exists" queries + wait for schema agreement before and after each
>> create statement, will it work correctly for Cassandra?
>>
>> Sébastien.
>>
>
>


Re: multiple clients making schema changes at once

2021-05-31 Thread Max C.
In our case we have a shared dev cluster with (for example) a keyspace for 
each developer, a keyspace for each CI runner, etc.  As part of initializing 
our test suite we set up the schema to match the code that is about to be 
tested.  This can mean multiple CI runners each adding/dropping tables at the 
same time, but in different keyspaces.

Our experience is that even though the schema changes do not conflict, we 
still run into schema mismatch problems.  Our solution was to have a lock 
(external to Cassandra) that ensures only a single schema change operation is 
being issued at a time.

People assume schema changes in Cassandra work the same way as MySQL or 
multiple users editing files on disk, i.e. as long as you’re not editing the 
same file (or same MySQL table), there’s no problem.  This is NOT the 
case.  Cassandra schema changes are more like “git push”ing a commit to the 
same branch, i.e. at most one change can be outstanding at a time (across all 
tables, all keyspaces); otherwise you will run into trouble.

Hope that helps.  Best of luck.

- Max


Hello,

I have a more general question about that, to which I cannot find a clear 
answer.

In my use case I have many tables (around 10k new tables created per month), 
and they are created from many clients and only dynamically, with several 
clients creating the same tables simultaneously.

What is the recommended way of creating tables dynamically? If I run "if 
not exists" queries and wait for schema agreement before and after each create 
statement, will it work correctly for Cassandra?

Sébastien.