[jira] [Commented] (CASSANDRA-15953) Support fetching all user tables to compare in Cassandra-diff

2020-07-23 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163818#comment-17163818
 ] 

Yifan Cai commented on CASSANDRA-15953:
---

bq. system.batches uses LocalPartitioner so if you run an auto-discovery diff 
without excluding system, you get a bunch of errors

That is a very good point. Thanks for bringing it up. 

Would you like to take another look at the PR? The system keyspaces are now 
always excluded. 

> Support fetching all user tables to compare in Cassandra-diff
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-15953
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15953
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tool/diff
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>              Labels: pull-request-available
>
> The Spark diff job may fail to launch with the kernel error "E2BIG: Argument 
> list too long" when a large list of keyspace tables is passed for comparison. 
> This issue proposes a mode that fetches all user tables from the clusters 
> being compared. 
> When the mode is on, the Spark job ignores the "keyspace_tables" parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15953) Support fetching all user tables to compare in Cassandra-diff

2020-07-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162977#comment-17162977
 ] 

Sam Tunnicliffe commented on CASSANDRA-15953:
-

{quote}Given there is really no point in comparing system tables
{quote}

It isn't just that: {{system.batches}} uses {{LocalPartitioner}}, so if you run 
an auto-discovery diff without excluding {{system}}, you get a bunch of errors 
like:

{code}
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [timeuuid <-> java.lang.Long]
    at com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:806)
    at com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:649)
    at com.datastax.driver.core.CodecRegistry.findCodec(CodecRegistry.java:631)
    at com.datastax.driver.core.CodecRegistry.codecFor(CodecRegistry.java:556)
    at com.datastax.driver.core.AbstractGettableByIndexData.codecFor(AbstractGettableByIndexData.java:80)
    at com.datastax.driver.core.BoundStatement.bind(BoundStatement.java:203)
    at com.datastax.driver.core.DefaultPreparedStatement.bind(DefaultPreparedStatement.java:134)
    at org.apache.cassandra.diff.DiffCluster.fetchPartitionKeys(DiffCluster.java:120)
    at org.apache.cassandra.diff.DiffCluster.getPartitionKeys(DiffCluster.java:110)
    at org.apache.cassandra.diff.Differ.diffTable(Differ.java:202)
    at org.apache.cassandra.diff.Differ.run(Differ.java:177)
    at org.apache.cassandra.diff.DiffJob.lambda$run$f7b4d595$1(DiffJob.java:173)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
    at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1015)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$14.apply(RDD.scala:1013)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2130)
    at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2130)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}




[jira] [Commented] (CASSANDRA-15953) Support fetching all user tables to compare in Cassandra-diff

2020-07-22 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162971#comment-17162971
 ] 

Yifan Cai commented on CASSANDRA-15953:
---

Thanks for the review. 

Regarding the {{disallowed_keyspaces}} list, I vacillated between 1) 
using the list as an addition to the predefined system keyspaces list and 2) 
using it explicitly (so either the client provides a list to ignore or 
a predefined list is used). I picked the latter, but it looks like that causes 
confusion. 
Given there is really no point in comparing system tables, the current 
implementation, which allows it, does not make sense. 

The PR was updated to address the comment. 
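
For readers following the thread, the two semantics considered for {{disallowed_keyspaces}} can be sketched roughly as below. This is only an illustration, not code from the PR: the class name, the method names, and the contents of the predefined system keyspace list are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names are taken from the cassandra-diff patch.
final class DisallowedKeyspacesSketch {

    // Assumed predefined list, for illustration only; the list used by the PR may differ.
    static final Set<String> SYSTEM_KEYSPACES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("system", "system_schema", "system_auth",
                                        "system_distributed", "system_traces")));

    // Option 1: the user's list is added to the predefined list,
    // so system keyspaces are always excluded.
    static Set<String> additive(Set<String> userProvided) {
        Set<String> result = new HashSet<>(SYSTEM_KEYSPACES);
        if (userProvided != null) {
            result.addAll(userProvided);
        }
        return result;
    }

    // Option 2: the user's list replaces the predefined list when it is given,
    // which lets system keyspaces slip back into the comparison.
    static Set<String> explicit(Set<String> userProvided) {
        return (userProvided == null || userProvided.isEmpty()) ? SYSTEM_KEYSPACES : userProvided;
    }

    public static void main(String[] args) {
        Set<String> user = new HashSet<>(Arrays.asList("scratch"));
        System.out.println(additive(user).contains("system")); // true
        System.out.println(explicit(user).contains("system")); // false
    }
}
```

Under option 2, a user who supplies any list of their own stops excluding {{system}}, which is the confusing behavior discussed above.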




[jira] [Commented] (CASSANDRA-15953) Support fetching all user tables to compare in Cassandra-diff

2020-07-20 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161611#comment-17161611
 ] 

Yifan Cai commented on CASSANDRA-15953:
---

Log from manually testing the Spark job. 
{code:java}
➜ docker run --name cas-src -d -p 9042:9042 cassandra:3.0.18
➜ docker run --name cas-tgt -d -p 9043:9042 cassandra:latest
➜ docker exec cas-src cassandra-stress write n=1k -schema keyspace="keyspace1"
➜ docker exec cas-tgt cassandra-stress write n=1k -schema keyspace="keyspace1"
➜ docker exec cas-src cassandra-stress write n=1k -schema keyspace="keyspace2"
➜ docker exec cas-tgt cassandra-stress write n=1k -schema keyspace="keyspace2"
➜ spark-submit --files ./spark-job/localconfig-auto-discover.yaml \
    --class org.apache.cassandra.diff.DiffJob \
    spark-uberjar/target/spark-uberjar-0.2-SNAPSHOT.jar \
    localconfig-auto-discover.yaml

// The diff job yields the following result. 
INFO DiffJob: FINISHED: {KeyspaceTablePair{keyspace=keyspace1, table=counter1}=Matched Partitions - 0, Mismatched Partitions - 0, Partition Errors - 0, Partitions Only In Source - 0, Partitions Only In Target - 0, Skipped Partitions - 0, Matched Rows - 0, Matched Values - 0, Mismatched Values - 0 , 
KeyspaceTablePair{keyspace=keyspace1, table=standard1}=Matched Partitions - 1000, Mismatched Partitions - 0, Partition Errors - 0, Partitions Only In Source - 0, Partitions Only In Target - 0, Skipped Partitions - 0, Matched Rows - 1000, Matched Values - 6000, Mismatched Values - 0 , 
KeyspaceTablePair{keyspace=keyspace2, table=standard1}=Matched Partitions - 1000, Mismatched Partitions - 0, Partition Errors - 0, Partitions Only In Source - 0, Partitions Only In Target - 0, Skipped Partitions - 0, Matched Rows - 1000, Matched Values - 6000, Mismatched Values - 0 , 
KeyspaceTablePair{keyspace=keyspace2, table=counter1}=Matched Partitions - 0, Mismatched Partitions - 0, Partition Errors - 0, Partitions Only In Source - 0, Partitions Only In Target - 0, Skipped Partitions - 0, Matched Rows - 0, Matched Values - 0, Mismatched Values - 0 }
{code}




[jira] [Commented] (CASSANDRA-15953) Support fetching all user tables to compare in Cassandra-diff

2020-07-19 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160893#comment-17160893
 ] 

Yifan Cai commented on CASSANDRA-15953:
---

PR: [https://github.com/apache/cassandra-diff/pull/10]
Code: [https://github.com/yifan-c/cassandra-diff/tree/compare-all-user-tables]

This patch allows the Spark job to discover the user tables when 
`keyspace_tables` is not specified. The discovered tables are the intersection 
of the schemas of the source and target clusters. 

Optionally, one can specify `disallowed_keyspaces` to exclude keyspaces 
that should not be compared. 
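
As a rough illustration of the discovery step described above (not the code in the PR), the intersection and the exclusion can be expressed with plain set operations. The class name, the method name, and the "keyspace.table" string representation are assumptions made for this sketch; the patch itself may model tables differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; class and method names are illustrative, not taken from the patch.
final class TableDiscoverySketch {

    /**
     * Returns the "keyspace.table" names present in both clusters,
     * minus any whose keyspace is in the disallowed set.
     * Assumes every name contains a '.' separating keyspace and table.
     */
    static Set<String> discover(Set<String> sourceTables,
                                Set<String> targetTables,
                                Set<String> disallowedKeyspaces) {
        Set<String> common = new TreeSet<>(sourceTables); // sorted for stable output
        common.retainAll(targetTables);                   // keep only tables found on both sides
        common.removeIf(t -> disallowedKeyspaces.contains(t.substring(0, t.indexOf('.'))));
        return common;
    }

    public static void main(String[] args) {
        Set<String> source = new HashSet<>(Arrays.asList(
                "keyspace1.standard1", "keyspace1.counter1", "scratch.tmp", "legacy.events"));
        Set<String> target = new HashSet<>(Arrays.asList(
                "keyspace1.standard1", "keyspace1.counter1", "scratch.tmp"));
        Set<String> disallowed = new HashSet<>(Arrays.asList("scratch"));

        // legacy.events exists only in the source; scratch is disallowed.
        System.out.println(discover(source, target, disallowed));
        // prints [keyspace1.counter1, keyspace1.standard1]
    }
}
```

A table that exists on only one side is silently left out of the comparison under this scheme, so a missing table would not be reported as a mismatch.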
