Re: Contributing cassandra-diff
Great addition to the tool set! A separate repo would be better. Grouping repos together only so that they are easier to index does not seem to be a strong supporting reason. Just my 2 cents.

- Yifan
Re: Contributing cassandra-diff
+1 on a discrete repo.

Dinesh
Re: Contributing cassandra-diff
Marcus, this is great and very helpful for anyone running Cassandra in production who wants peace of mind when rolling out upgrades. Thank you!

Regards,
Roopa Tangirala
Director, Cloud Data Engineering
Re: Contributing cassandra-diff
CI git polling for changes on a separate repository (if/when CI is needed) is probably a better way to go. I don't believe there are any issues with INFRA on us having discrete repos, and creating them with the self-help web tool is quick and easy.

Thanks for the neat looking utility!

Michael
Re: Contributing cassandra-diff
A different repo will be better
RE: Contributing cassandra-diff
Very powerful tool indeed, thanks for sharing!

I believe it is best to keep tools like this in different repos since different tools will probably have different life cycles and tool chains. Yes, that could be handled in a single repo, but with different repos we'd get natural boundaries.
Re: Contributing cassandra-diff
No hard preference on the repo, but just excited about this tool! Looking forward to employing this for upgrade testing (very timely :))
Re: Contributing cassandra-diff
My own weak preference would be for a dedicated repo in the first instance. If/when additional tools are contributed we should look at co-locating common stuff, but rushing toward a monorepo would be a mistake IMO.
Re: Contributing cassandra-diff
I weakly prefer contrib.

On Thu, Aug 22, 2019 at 12:09 PM Marcus Eriksson wrote:

> Hi, we are about to open source our tooling for comparing two cassandra
> clusters and want to get some feedback where to push it. I think the
> options are: (name bike-shedding welcome)
>
> 1. create repos/asf/cassandra-diff.git
> 2. create a generic repos/asf/cassandra-contrib.git where we can add more
>    contributed tools in the future
>
> Temporary location: https://github.com/krummas/cassandra-diff
>
> Cassandra-diff is a spark job that compares the data in two clusters - it
> pages through all partitions and reads all rows for those partitions in
> both clusters to make sure they are identical. Based on the configuration
> variable “reverse_read_probability” the rows are either read forward or in
> reverse order.
>
> Our main use case for cassandra-diff has been to set up two identical
> clusters, transfer a snapshot from the cluster we want to test to these
> clusters and upgrade one side. When that is done we run this tool to make
> sure that 2.1 and 3.0 gives the same results. A few examples of the bugs we
> have found using this tool:
>
> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple
>   index blocks create invalid bound sequences on 3.0+
> * CASSANDRA-14803: Rows that cross index block boundaries can cause
>   incomplete reverse reads in some cases
> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
>   iteration of indexed partitions
>
> /Marcus
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
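For readers who want a concrete picture of the comparison Marcus describes in the quoted message, here is a minimal Python sketch. It is not the cassandra-diff implementation (which is a Spark job reading from real clusters). The two clusters are stood in for by plain dicts, and the names `read_partition` and `diff_clusters` are hypothetical. Only `reverse_read_probability` is taken from the thread.

```python
import random


def read_partition(cluster, key, reverse):
    """Return the rows of one partition as (clustering_key, value) pairs.

    ASSUMPTION: `cluster` is a plain dict {partition_key: {clustering_key: value}}
    standing in for a real cluster; `reverse` selects descending order.
    """
    return sorted(cluster.get(key, {}).items(), reverse=reverse)


def diff_clusters(source, target, reverse_read_probability=0.5, rng=None):
    """Return the partition keys whose rows differ between the two clusters.

    For each partition, a random draw decides whether both sides are read
    forward or in reverse, so both read paths get exercised over a full run.
    """
    rng = rng or random.Random()
    mismatches = []
    for key in sorted(set(source) | set(target)):
        reverse = rng.random() < reverse_read_probability
        if read_partition(source, key, reverse) != read_partition(target, key, reverse):
            mismatches.append(key)
    return mismatches
```

With two dicts that differ only in partition `"b"`, `diff_clusters(source, target)` returns `["b"]`. Against in-memory data the read direction cannot change the outcome. In the upgrade-testing use case it matters because forward and reverse reads go through different server-side code paths, which is where bugs such as CASSANDRA-14803 and CASSANDRA-15178 surfaced.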
Re: Contributing cassandra-diff
Great addition! Thanks Marcus.

+1 for cassandra-compare as suggested by Jeremy.

We can also think about other features like:

- Comparing just the count between 2 tables. In some cases, that will be enough to say that our copy is OK.
- Diffing only a set of partitions. This avoids comparing the full data set when volumes are large and a subset of the data is enough to be sure of our copy.

Thanks

--
Regards,
Ahmed ELJAMI
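The two features suggested in this message could look roughly like the sketch below. It is an illustration under assumptions, not something cassandra-diff is known to provide: tables are modeled as dicts of the form {partition_key: {clustering_key: value}}, and `counts_match` and `diff_sample` are hypothetical names.

```python
import random


def counts_match(source, target):
    """Cheap first check: do both tables hold the same number of rows?"""
    source_rows = sum(len(rows) for rows in source.values())
    target_rows = sum(len(rows) for rows in target.values())
    return source_rows == target_rows


def diff_sample(source, target, sample_size, seed=None):
    """Compare only a random sample of partitions instead of the full data set.

    Returns the sampled partition keys whose rows differ, in sorted order.
    """
    rng = random.Random(seed)
    keys = sorted(set(source) | set(target))
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return sorted(k for k in sample if source.get(k) != target.get(k))
```

Equal counts do not prove the data is identical (a row can change value without changing the count), so the count check is best treated as a quick sanity pass. The sampled diff trades coverage for speed in the same way.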
Re: Contributing cassandra-diff
It’s great to contribute such a tool. The change between 2.x and 3.0 brought a translation layer from thrift to cql that is hard to validate on real clusters without something like this. Thank you.

As for naming, perhaps cassandra-compare might be clearer as diff is an overloaded word but that’s a bikeshed sort of argument.
Re: Contributing cassandra-diff
This is a great addition to our Cassandra validation framework/tools. I can see many teams in the community get benefited from tooling like this.

I like the idea of the generic repo (repos/asf/cassandra-contrib.git or *whatever the name is*) for tools like this, for the following 2 main reasons.

1. Easily accessible/ reachable/ searchable
2. Welcomes community in Cassandra ecosystem to contribute more easily

Thanks,
Vinay Chella

On Wed, Aug 21, 2019 at 11:39 PM Marcus Eriksson wrote:

> Hi, we are about to open source our tooling for comparing two cassandra
> clusters and want to get some feedback where to push it. I think the
> options are: (name bike-shedding welcome)
>
> 1. create repos/asf/cassandra-diff.git
> 2. create a generic repos/asf/cassandra-contrib.git where we can add more
> contributed tools in the future
>
> Temporary location: https://github.com/krummas/cassandra-diff
>
> Cassandra-diff is a spark job that compares the data in two clusters - it
> pages through all partitions and reads all rows for those partitions in
> both clusters to make sure they are identical. Based on the configuration
> variable “reverse_read_probability” the rows are either read forward or in
> reverse order.
>
> Our main use case for cassandra-diff has been to set up two identical
> clusters, transfer a snapshot from the cluster we want to test to these
> clusters and upgrade one side. When that is done we run this tool to make
> sure that 2.1 and 3.0 gives the same results. A few examples of the bugs we
> have found using this tool:
>
> * CASSANDRA-14823: Legacy sstables with range tombstones spanning multiple
> index blocks create invalid bound sequences on 3.0+
> * CASSANDRA-14803: Rows that cross index block boundaries can cause
> incomplete reverse reads in some cases
> * CASSANDRA-15178: Skipping illegal legacy cells can break reverse
> iteration of indexed partitions
>
> /Marcus
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>
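For readers skimming the thread, the comparison described in the quoted announcement (walk every partition, read its rows from both clusters, and pick forward or reverse order per partition according to "reverse_read_probability") can be illustrated with a small Python sketch. This is not the cassandra-diff implementation, which is a Spark job. The in-memory dictionaries standing in for clusters and the names `read_partition` and `diff_clusters` are assumptions made purely for illustration; only `reverse_read_probability` comes from the message.

```python
import random
from itertools import zip_longest


def read_partition(cluster, key, reverse):
    """Return the rows of one partition, forward or in reverse clustering order.

    `cluster` is a hypothetical stand-in: a dict mapping partition key to a
    list of row tuples already sorted in clustering order.
    """
    rows = cluster.get(key, [])
    return list(reversed(rows)) if reverse else list(rows)


def diff_clusters(source, target, reverse_read_probability=0.5, rng=None):
    """Compare every partition of two clusters and return mismatch records.

    Each record is (partition_key, row_index, read_in_reverse, source_row,
    target_row); a missing row shows up as None on that side.
    """
    rng = rng or random.Random()
    mismatches = []
    # Use the union of keys so partitions missing on either side are reported.
    for key in sorted(set(source) | set(target)):
        # Choose the read direction per partition, as the message describes.
        reverse = rng.random() < reverse_read_probability
        left = read_partition(source, key, reverse)
        right = read_partition(target, key, reverse)
        for index, (a, b) in enumerate(zip_longest(left, right)):
            if a != b:
                mismatches.append((key, index, reverse, a, b))
    return mismatches


if __name__ == "__main__":
    before_upgrade = {"p1": [(1, "a"), (2, "b")], "p2": [(1, "x")]}
    after_upgrade = {"p1": [(1, "a"), (2, "B")], "p2": [(1, "x")]}
    for record in diff_clusters(before_upgrade, after_upgrade):
        print(record)
```

Reading some partitions in reverse matters because two of the bugs listed in the announcement (CASSANDRA-14803 and CASSANDRA-15178) affect reverse reads, so a forward-only comparison would presumably not have exercised those code paths.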