Re: [DISCUSS] Future of MVs
Yep, agreed this is definitely the best route forwards.

On 02/07/2020, 01:10, "Joshua McKenzie" wrote:
> Plays pretty cleanly into the "have a test plan" we modded in last month. +1

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
Re: [DISCUSS] Future of MVs
Plays pretty cleanly into the "have a test plan" we modded in last month. +1

On Wed, Jul 1, 2020 at 6:43 PM Nate McCall wrote:
> We should make sure we back-haul this into the CEP process so new
> features/large changes have to provide some idea of what the gates are
> to be production ready.
Re: [DISCUSS] Future of MVs
> If so, I propose we set this thread down for now in deference to us
> articulating the quality bar we set and how we achieve it for features in
> the DB and then retroactively apply them to existing experimental features.
> Should we determine nobody is stepping up to maintain an
> experimental feature in a reasonable time frame, we can cross the bridge of
> the implications of scale of adoption and the perceived impact on the user
> community of deprecation and removal at that time.

We should make sure we back-haul this into the CEP process so new
features/large changes have to provide some idea of what the gates are to
be production ready.
Re: [DISCUSS] Future of MVs
+1

On Wed, Jul 1, 2020 at 1:55 PM Jon Haddad wrote:
> I think coming up with a formal, comprehensive guide for determining if we
> can merge these sorts of hugely impactful features is a great idea.
>
> I'm also on board with applying the same standard to the experimental
> features.
Re: [DISCUSS] Future of MVs
I think coming up with a formal, comprehensive guide for determining if we
can merge these sorts of hugely impactful features is a great idea.

I'm also on board with applying the same standard to the experimental
features.

On Wed, Jul 1, 2020 at 1:45 PM Joshua McKenzie wrote:
> Which questions and how we frame it aside, it's clear we have some
> foundational thinking to do, articulate, and agree upon as a project before
> we can reasonably make decisions about deprecation, promotion, or inclusion
> of features in the project.
>
> Is that fair?
Re: [DISCUSS] Future of MVs
Which questions and how we frame it aside, it's clear we have some
foundational thinking to do, articulate, and agree upon as a project before
we can reasonably make decisions about deprecation, promotion, or inclusion
of features in the project.

Is that fair?

If so, I propose we set this thread down for now in deference to us
articulating the quality bar we set and how we achieve it for features in
the DB and then retroactively apply them to existing experimental features.
Should we determine nobody is stepping up to maintain an
experimental feature in a reasonable time frame, we can cross the bridge of
the implications of scale of adoption and the perceived impact on the user
community of deprecation and removal at that time.

On Wed, Jul 1, 2020 at 9:59 AM Benedict Elliott Smith <bened...@apache.org> wrote:
> I humbly suggest these are the wrong questions to ask. Instead, two sides
> of just one question matter: how did we miss these problems, and what
> would we have needed to do procedurally to have not missed it.
Re: [DISCUSS] Future of MVs
I humbly suggest these are the wrong questions to ask. Instead, two sides
of just one question matter: how did we miss these problems, and what would
we have needed to do procedurally to have not missed it. Whatever it is,
we need to do it now to have confidence other things were not missed, as
well as for all future features.

We should start by producing a list of what we think is necessary for
deploying successful features. We can then determine what items are
missing that would have been needed to catch a problem. Obvious things are:

 * integration tests at scale
 * integration tests with a variety of extreme workloads
 * integration tests with various cluster topologies
 * data integrity tests as part of the above
 * all of the above as reproducible tests incorporated into the source tree

We can then ensure Jira accurately represents all of the known issues with
MVs (and other features). This includes those that are poorly defined
(such as "doesn't scale").

Then we can look at all issues and ask: would this approach have caught
it, and if not, what do we need to add to the guidelines to prevent a
recurrence - and also ensure this problem is unique? In future we can ask,
for bugs found in features built to these guidelines: why didn't it catch
this bug? Do the guidelines need additional items, or greater specificity
about how to meet given criteria?

I do not think that data from deployments - even if reliably obtained -
can tell us much besides which problems we prioritise.

On 01/07/2020, 01:58, "joshua.mcken...@gmail.com" <joshua.mcken...@gmail.com> wrote:
> It would be incredibly helpful for us to have some empirical data and
> agreed upon terms and benchmarks to help us navigate discussions like this:
Re: [DISCUSS] Future of MVs
On Wed, 1 Jul 2020 at 15:42, Benjamin Lerer wrote:
> I agree with Jeff that there is some stuff to do to address the current MV
> issues and I am willing to focus on making them production ready.

+1
Re: [DISCUSS] Future of MVs
> > "Make the scan faster" > "Make the scan incremental and automatic" > "Make it not blow up your page cache" > "Make losing your base replicas less likely". > > There's a concrete, real opportunity with MVs to create integrity > assertions we're missing. A dangling record from an MV that would point to > missing base data is something that could raise alarm bells and signal > JIRAs so we can potentially find and fix more surprise edge cases. > I agree with Jeff that there is some stuff to do to address the current MV issues and I am willing to focus on making them production ready. On Wed, Jul 1, 2020 at 2:58 AM wrote: > It would be incredibly helpful for us to have some empirical data and > agreed upon terms and benchmarks to help us navigate discussions like this: > > * How widely used is a feature in C* deployments worldwide? > * What are the primary issues users face when deploying them? Scaling > them? During failure scenarios? > * What does the engineering effort to bridge these gaps look like? Who > will do that? On what time horizon? > * What does our current test coverage for this feature look like? > * What shape of defects are arising with the feature? In a specific > subsection of the module or usage? > * Do we have an agreed upon set of standards for labeling a feature > stable? As experimental? If not, how do we get there? > * What effort will it take to bridge from where we are to where we agree > we need to be? On what timeline is this acceptable? > > I believe these are not only answerable questions, but fundamentally the > underlying themes our discussion alludes to. They’re also questions that > apply to a lot more than just MV’s and tie into what you’re speaking to > above Benedict. > > > > On Jun 30, 2020, at 8:32 PM, sankalp kohli > wrote: > > > > I see this discussion as several decisions which can be made in small > > increments. > > > > 1. In release cycles, when can we propose a feature to be deprecated or > > marked experimental. 
Ideally a new feature should come out experimental > if > > required but we have several who are candidates now. We can work on > > integrating this in the release lifecycle doc we already have. > > 2. What is the process of making an existing feature experimental? How > does > > it affect major releases around testing. > > 3. What is the process of deprecating/removing an experimental feature. > > (Assuming experimental features should be deprecated/removed) > > > > Coming to MV, I think we need more data before we can say we > > should deprecate MV. Here are some of them which should be part of > > deprecation process > > 1.Talk to customers who use them and understand what is the impact. Give > > them a forum to talk about it. > > 2. Do we have enough resources to bring this feature out of the > > experimental feature list in next 1 or 2 major releases. We cannot have > too > > many experimental features in the database. Marking a feature > experimental > > should not be a parking place for a non functioning feature but a place > > while we stabilize it. > > > > > > > > > >> On Tue, Jun 30, 2020 at 4:52 PM wrote: > >> > >> I followed up with the clarification about unit and dtests for that > reason > >> Dinesh. We test experimental features now. > >> > >> If we’re talking about adding experimental features to the 40 quality > >> testing effort, how does that differ from just saying “we won’t release > >> until we’ve tested and stabilized these features and they’re no longer > >> experimental”? > >> > >> Maybe I’m just misunderstanding something here? > >> > On Jun 30, 2020, at 7:12 PM, Dinesh Joshi wrote: > >>> > >>> > > On Jun 30, 2020, at 4:05 PM, Brandon Williams > wrote: > > Instead of ripping it out, we could instead disable them in the yaml > with big fat warning comments around it. That way people already > using them can just enable them again, but it will raise the bar for > new users who ignore/miss the warnings in the logs and just use them. 
> >>> > >>> Not a bad idea. Although, the real issue is that users enable MV on a 3 > >> node cluster with a few megs of data and conclude that MVs will > >> horizontally scale with the size of data. This is what causes issues for > >> users who naively roll it out in production and discover that MVs do not > >> scale with their data growth. So whatever we do, the big fat warning > should > >> educate the unsuspecting operator. > >>> > >>> Dinesh > >>> - > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>> > >> > >> - > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >> For additional commands, e-mail: dev-h...@cassandra.apache.org > >> > >> > > -
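Benjamin's point about dangling MV records suggests the shape of such an integrity assertion. A toy sketch follows; this is not Cassandra code, the helper name is ours, and it assumes the base-table and view keys have already been extracted out of band (e.g. by full scans of both tables):

```python
def find_dangling_view_rows(base_keys, view_rows):
    """Return (view_key, base_key) pairs whose base-table key is missing.

    base_keys: iterable of primary keys present in the base table.
    view_rows: iterable of (view_key, base_key) pairs taken from the MV.
    """
    base = set(base_keys)
    return [(vk, bk) for vk, bk in view_rows if bk not in base]


# Toy data: view row 'v3' still points at base key 'k9', which was deleted.
base_keys = ["k1", "k2"]
view_rows = [("v1", "k1"), ("v2", "k2"), ("v3", "k9")]
print(find_dangling_view_rows(base_keys, view_rows))  # [('v3', 'k9')]
```

Any non-empty result is exactly the "alarm bell" described above: a view entry with no surviving base row, which should never occur if base and view are consistent.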
Re: [DISCUSS] Future of MVs
It would be incredibly helpful for us to have some empirical data and agreed-upon terms and benchmarks to help us navigate discussions like this:

* How widely used is a feature in C* deployments worldwide?
* What are the primary issues users face when deploying them? Scaling them? During failure scenarios?
* What does the engineering effort to bridge these gaps look like? Who will do that? On what time horizon?
* What does our current test coverage for this feature look like?
* What shape of defects are arising with the feature? In a specific subsection of the module or usage?
* Do we have an agreed-upon set of standards for labeling a feature stable? As experimental? If not, how do we get there?
* What effort will it take to bridge from where we are to where we agree we need to be? On what timeline is this acceptable?

I believe these are not only answerable questions, but fundamentally the underlying themes our discussion alludes to. They're also questions that apply to a lot more than just MVs and tie into what you're speaking to above, Benedict.

> On Jun 30, 2020, at 8:32 PM, sankalp kohli wrote:
>
> I see this discussion as several decisions which can be made in small increments.
>
> 1. In release cycles, when can we propose a feature to be deprecated or marked experimental? Ideally a new feature should come out experimental if required, but we have several that are candidates now. We can work on integrating this in the release lifecycle doc we already have.
> 2. What is the process of making an existing feature experimental? How does it affect major releases around testing?
> 3. What is the process of deprecating/removing an experimental feature? (Assuming experimental features should be deprecated/removed.)
>
> Coming to MV, I think we need more data before we can say we should deprecate MV. Here are some of them which should be part of the deprecation process:
>
> 1. Talk to customers who use them and understand what the impact is. Give them a forum to talk about it.
> 2. Do we have enough resources to bring this feature out of the experimental feature list in the next 1 or 2 major releases? We cannot have too many experimental features in the database. Marking a feature experimental should not be a parking place for a non-functioning feature, but a place where we stabilize it.
>
>> On Tue, Jun 30, 2020 at 4:52 PM wrote:
>>
>> I followed up with the clarification about unit and dtests for that reason, Dinesh. We test experimental features now.
>>
>> If we're talking about adding experimental features to the 4.0 quality testing effort, how does that differ from just saying "we won't release until we've tested and stabilized these features and they're no longer experimental"?
>>
>> Maybe I'm just misunderstanding something here?
>>
>>> On Jun 30, 2020, at 7:12 PM, Dinesh Joshi wrote:
>>>
>>>> On Jun 30, 2020, at 4:05 PM, Brandon Williams wrote:
>>>>
>>>> Instead of ripping it out, we could instead disable them in the yaml with big fat warning comments around it. That way people already using them can just enable them again, but it will raise the bar for new users who ignore/miss the warnings in the logs and just use them.
>>>
>>> Not a bad idea. Although, the real issue is that users enable MV on a 3-node cluster with a few megs of data and conclude that MVs will horizontally scale with the size of data. This is what causes issues for users who naively roll it out in production and discover that MVs do not scale with their data growth. So whatever we do, the big fat warning should educate the unsuspecting operator.
>>>
>>> Dinesh

---
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
Re: [DISCUSS] Future of MVs
I see this discussion as several decisions which can be made in small increments.

1. In release cycles, when can we propose a feature to be deprecated or marked experimental? Ideally a new feature should come out experimental if required, but we have several that are candidates now. We can work on integrating this in the release lifecycle doc we already have.
2. What is the process of making an existing feature experimental? How does it affect major releases around testing?
3. What is the process of deprecating/removing an experimental feature? (Assuming experimental features should be deprecated/removed.)

Coming to MV, I think we need more data before we can say we should deprecate MV. Here are some of them which should be part of the deprecation process:

1. Talk to customers who use them and understand what the impact is. Give them a forum to talk about it.
2. Do we have enough resources to bring this feature out of the experimental feature list in the next 1 or 2 major releases? We cannot have too many experimental features in the database. Marking a feature experimental should not be a parking place for a non-functioning feature, but a place where we stabilize it.

On Tue, Jun 30, 2020 at 4:52 PM wrote:

> I followed up with the clarification about unit and dtests for that reason, Dinesh. We test experimental features now.
>
> If we're talking about adding experimental features to the 4.0 quality testing effort, how does that differ from just saying "we won't release until we've tested and stabilized these features and they're no longer experimental"?
>
> Maybe I'm just misunderstanding something here?
Re: [DISCUSS] Future of MVs
> On Jun 30, 2020, at 4:52 PM, joshua.mcken...@gmail.com wrote:
>
> I followed up with the clarification about unit and dtests for that reason, Dinesh. We test experimental features now.

I hit send before seeing your clarification. I personally feel that unit and dtests may not surface regressions. I'd prefer the user community to try out the alpha, beta, and RC releases and report regressions as they find them.

Dinesh
Re: [DISCUSS] Future of MVs
>>> Instead of ripping it out, we could instead disable them in the yaml
>>> with big fat warning comments around it.

FYI, we have already disabled use of materialized views, SASI, and transient replication by default in 4.0:
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L1393

> On Jun 30, 2020, at 6:53 PM, joshua.mcken...@gmail.com wrote:
>
> I followed up with the clarification about unit and dtests for that reason, Dinesh. We test experimental features now.
>
> If we're talking about adding experimental features to the 4.0 quality testing effort, how does that differ from just saying "we won't release until we've tested and stabilized these features and they're no longer experimental"?
>
> Maybe I'm just misunderstanding something here?
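For readers not following trunk: the knobs in question live in conf/cassandra.yaml. A minimal sketch of the 4.0 defaults as of the linked revision (flag names taken from that file; verify against your exact version before relying on them):

```yaml
# Experimental features are disabled by default in Cassandra 4.0.
# Operators who understand the risks can opt back in per feature.
enable_materialized_views: false
enable_sasi_indexes: false
enable_transient_replication: false
```

With enable_materialized_views left false, a CREATE MATERIALIZED VIEW statement should be rejected at DDL time rather than silently accepted, which is what raises the bar for new users as discussed above.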
Re: [DISCUSS] Future of MVs
I followed up with the clarification about unit and dtests for that reason, Dinesh. We test experimental features now.

If we're talking about adding experimental features to the 4.0 quality testing effort, how does that differ from just saying "we won't release until we've tested and stabilized these features and they're no longer experimental"?

Maybe I'm just misunderstanding something here?

> On Jun 30, 2020, at 7:12 PM, Dinesh Joshi wrote:
>
>> On Jun 30, 2020, at 4:05 PM, Brandon Williams wrote:
>>
>> Instead of ripping it out, we could instead disable them in the yaml with big fat warning comments around it. That way people already using them can just enable them again, but it will raise the bar for new users who ignore/miss the warnings in the logs and just use them.
>
> Not a bad idea. Although, the real issue is that users enable MV on a 3-node cluster with a few megs of data and conclude that MVs will horizontally scale with the size of data. This is what causes issues for users who naively roll it out in production and discover that MVs do not scale with their data growth. So whatever we do, the big fat warning should educate the unsuspecting operator.
>
> Dinesh
Re: [DISCUSS] Future of MVs
> On Jun 30, 2020, at 4:05 PM, Brandon Williams wrote:
>
> Instead of ripping it out, we could instead disable them in the yaml with big fat warning comments around it. That way people already using them can just enable them again, but it will raise the bar for new users who ignore/miss the warnings in the logs and just use them.

Not a bad idea. Although, the real issue is that users enable MV on a 3-node cluster with a few megs of data and conclude that MVs will horizontally scale with the size of data. This is what causes issues for users who naively roll it out in production and discover that MVs do not scale with their data growth. So whatever we do, the big fat warning should educate the unsuspecting operator.

Dinesh
Re: [DISCUSS] Future of MVs
On Tue, Jun 30, 2020 at 5:41 PM wrote:

> Given we're at a place where things like MV's and sasi are backing production cases (power users one would hope or smaller use cases) I don't think ripping those features out and further excluding users from the ecosystem is the right move.

Instead of ripping it out, we could instead disable them in the yaml with big fat warning comments around it. That way people already using them can just enable them again, but it will raise the bar for new users who ignore/miss the warnings in the logs and just use them.
Re: [DISCUSS] Future of MVs
> On Jun 30, 2020, at 3:40 PM, joshua.mcken...@gmail.com wrote:
>
> I don't think we should hold up releases on testing experimental features. Especially with how many of them we have.
>
> Given we're at a place where things like MV's and sasi are backing production cases (power users one would hope or smaller use cases)

Let's back up for a second here. MVs are backing production cases, but we should not spend time testing them for 4.0? That is an inherently contradictory position.

Dinesh
Re: [DISCUSS] Future of MVs
Just to clarify one thing: I understand experimental features to be alpha/beta quality, and as such the guarantees of correctness differ from those of the other features presented in the database. We should likely articulate this in the wiki and docs if we have not. In the case of MVs, since they began as a regular feature, obviously we don't want a degradation in functionality on the feature, experimental or not.

Our guarantees and codification of feature APIs and functionality have historically taken the form of unit tests and dtests, which, while limited in their ability to explore and test a state space, do provide a minimal guarantee of API consistency that should be sufficient to maintain our contracts of correctness with experimental features.
Re: [DISCUSS] Future of MVs
I don't think we should hold up releases on testing experimental features. Especially with how many of them we have.

Agree re: needing a more quantitative bar for new additions, which we can also retroactively apply to experimental features to bring up to speed and eventually graduate. Probably worth separately defining criteria for submission of a feature as experimental while we're at it.

Given we're at a place where things like MVs and SASI are backing production cases (power users one would hope, or smaller use cases), I don't think ripping those features out and further excluding users from the ecosystem is the right move.

> On Jun 30, 2020, at 6:27 PM, David Capwell wrote:
>
> If that is the case then shouldn't we add MV to "4.0 Quality: Components and Test Plans" (CASSANDRA-15536)? It is currently missing, so adding it to the testing road map would be a clear sign that someone is planning to champion and own this feature; if people feel that this is a broken feature, shouldn't we have tests showing this? Would be great to see traction here.
Re: [DISCUSS] Future of MVs
> On Jun 30, 2020, at 3:27 PM, David Capwell wrote:
>
> If that is the case then shouldn't we add MV to "4.0 Quality: Components and Test Plans" (CASSANDRA-15536)? It is currently missing, so adding it to the testing road map would be a clear sign that someone is planning to champion and own this feature; if people feel that this is a broken feature, shouldn't we have tests showing this? Would be great to see traction here.

Good point, we should definitely test it to ensure there are no regressions even though it is marked as experimental.

I'd also like to clarify that the feature works for a certain subset of use cases when it is limited to a certain scale. It unfortunately does not scale well with the size of data. I think it is important to call out this distinction. For many users it's acceptable; for others it is not.

Dinesh
Re: [DISCUSS] Future of MVs
On Wed, Jul 1, 2020 at 10:27 AM David Capwell wrote:

> If that is the case then shouldn't we add MV to "4.0 Quality: Components and Test Plans" (CASSANDRA-15536)? It is currently missing, so adding it to the testing road map would be a clear sign that someone is planning to champion and own this feature; if people feel that this is a broken feature, shouldn't we have tests showing this? Would be great to see traction here.

+1 - Surfacing it like that feels like a good next step to me.
Re: [DISCUSS] Future of MVs
If that is the case then shouldn't we add MV to "4.0 Quality: Components and Test Plans" (CASSANDRA-15536)? It is currently missing, so adding it to the testing road map would be a clear sign that someone is planning to champion and own this feature; if people feel that this is a broken feature, shouldn't we have tests showing this? Would be great to see traction here.

On Tue, Jun 30, 2020 at 3:11 PM Joshua McKenzie wrote:

> Let's forget I said anything about release cadence. That's another thread entirely and a good deep conversation to explore. Don't want to derail.
>
> If there's a question about "is anyone stepping forward to maintain MV's", I can say with certainty that at least one full time contributor I work with will engage and continue to work on and improve this feature going forward. Who precisely that ends up being stands to be seen; that's more fluid, but there are no plans to stop working on it going forward.
Re: [DISCUSS] Future of MVs
I think the point is that we need to have a clear plan of action to bring features up to an acceptable standard. That also implies a need to agree how we determine if a feature has reached an acceptable standard - both going forwards and retrospectively. For those that don't reach that standard today, we need something like a retrospective CEP to agree how to rectify that. Then we can figure out if the necessary resources can be mustered, or if we need to consider obsolescence.

I'm not convinced this discussion has to be resolved immediately, but that's how I view the situation.

On 30/06/2020, 23:11, "Joshua McKenzie" wrote:

> Let's forget I said anything about release cadence. That's another thread entirely and a good deep conversation to explore. Don't want to derail.
>
> If there's a question about "is anyone stepping forward to maintain MV's", I can say with certainty that at least one full time contributor I work with will engage and continue to work on and improve this feature going forward. Who precisely that ends up being stands to be seen; that's more fluid, but there are no plans to stop working on it going forward.
Re: [DISCUSS] Future of MVs
Let's forget I said anything about release cadence. That's another thread entirely and a good deep conversation to explore. Don't want to derail.

If there's a question about "is anyone stepping forward to maintain MV's", I can say with certainty that at least one full time contributor I work with will engage and continue to work on and improve this feature going forward. Who precisely that ends up being stands to be seen; that's more fluid, but there are no plans to stop working on it going forward.

On Tue, Jun 30, 2020 at 5:45 PM Benedict Elliott Smith wrote:

> I don't think we can realistically expect majors, with the deprecation cycle they entail, to come every six months. If nothing else, we would have too many versions to maintain at once. I personally think all the project needs on that front is clearer roadmapping at the start of a release cycle, and we would be fine with 12-18mo release cycles.
>
> That's another whole discussion to distract us from 4.0, anyway - though I think we can tolerate a few slow burn conversations.
Re: [DISCUSS] Future of MVs
I don't think we can realistically expect majors, with the deprecation cycle they entail, to come every six months. If nothing else, we would have too many versions to maintain at once. I personally think all the project needs on that front is clearer roadmapping at the start of a release cycle, and we would be fine with 12-18mo release cycles. That's another whole discussion to distract us from 4.0, anyway - though I think we can tolerate a few slow burn conversations. On 30/06/2020, 22:10, "Joshua McKenzie" wrote: Seems like a reasonable point of view to me Sankalp. I'd also suggest we try to find other sources of data than just the user ML, like searching on github for instance. A collection of imperfect metrics beats just one in my experience. Though I would ask why we're having this discussion this late in the release cycle when we have what, 4 tickets left until cutting beta 1? Seems like the kind of thing we could reasonably defer while we focus on getting 4.0 out, though I'm sympathetic to the "release is cutoff for deprecation" argument. If we cadence our majors to calendar (like every 6 months for example) instead of scope this would become significantly less of a big issue imo. On Tue, Jun 30, 2020 at 5:01 PM sankalp kohli wrote: > Hi, > I think we should revisit all features which require a lot more work to > make them work. Here is how I think we should do for each one of them > > 1. Identify such features and some details of why they are deprecation > candidates. > 2. Ask the dev list if anyone is willing to work on improving them over the > next 1 or 2 major releases. > 3. We then move to the user list to find who all are using it and if they > are opposed to removing/deprecating it. Assuming few will be using it, we > need to see the tradeoff of keeping it vs removing it on a case by case > basis. > 4. Deprecate it in the next major or make it experimental if #2 and #3 > removes them from deprecation. > 5. 
Remove it in next major > > For MV, I see this email as step #2. We should move to asking the user list > next. > > Thanks, > Sankalp > > On Tue, Jun 30, 2020 at 1:46 PM Joshua McKenzie > wrote: > > > We're just short of 98 tickets on the component since it's original merge > > so at least *some* work has been done to stabilize them. Not to say I'm > > endorsing running them at massive scale today without knowing what you're > > doing, to be clear. They are perhaps our largest loaded gun of a feature > of > > self-foot-shooting atm. Zhao did a bunch of work on them internally and > > we've backported much of that to OSS; I've pinged him to chime in here. > > > > The "data is orphaned in your view when you lose all base replicas" issue > > is more or less "unsolvable", since a scan of a view to confirm data in > the > > base table is so slow you're talking weeks to process and it totally > > trashes your page cache. I think Paulo landed on a "you have to rebuild > the > > view if you lose all base data" reality. There's also, I believe, the > > unresolved issue of modeling how much data a base table with one to many > > views will end up taking up in its final form when denormalized. This > could > > be vastly improved with something like an "EXPLAIN ANALYZE" for a table > > with views, if you'll excuse the mapping, to show "N bytes in base will > > become M with base + views" or something. > > > > Last but definitely not least in dumping the state in my head about this, > > there's a bunch of potential for guardrailing people away from self-harm > > with MV's if we decide to go the route of guardrails (link: > > > > > https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails > > ). 
> > > > So from my PoV, I'm against us just voting to deprecate and remove > without > > going into more depth into the current state of things and what options > are > > on the table, since people will continue to build MV's at the client > level > > which, in theory, should have worse correctness and performance > > characteristics than having a clean and well stabilized implementation in > > the coordinator. > > > > Having them flagged as experimental for now as we stabilize 4.0 and get > > things out the door *seems* sufficient to me, but if people are widely > > using these out in the wild and ignoring that status and the > corresponding > > warning, maybe we consider raising the volume on that warning for 4.0 > while > > we figure this out. > > > > Just my .02. > > > > ~Josh
Re: [DISCUSS] Future of MVs
On Tue, Jun 30, 2020 at 1:46 PM Joshua McKenzie wrote: > We're just short of 98 tickets on the component since it's original merge > so at least *some* work has been done to stabilize them. Not to say I'm > endorsing running them at massive scale today without knowing what you're > doing, to be clear. They are perhaps our largest loaded gun of a feature of > self-foot-shooting atm. Zhao did a bunch of work on them internally and > we've backported much of that to OSS; I've pinged him to chime in here. > Probably true. > > The "data is orphaned in your view when you lose all base replicas" issue > is more or less "unsolvable", since a scan of a view to confirm data in the > base table is so slow you're talking weeks to process and it totally > trashes your page cache. "Make the scan faster" "Make the scan incremental and automatic" "Make it not blow up your page cache" "Make losing your base replicas less likely". There's a concrete, real opportunity with MVs to create integrity assertions we're missing. A dangling record from an MV that would point to missing base data is something that could raise alarm bells and signal JIRAs so we can potentially find and fix more surprise edge cases. > So from my PoV, I'm against us just voting to deprecate and remove without > going into more depth into the current state of things and what options are > on the table, since people will continue to build MV's at the client level > which, in theory, should have worse correctness and performance > characteristics than having a clean and well stabilized implementation in > the coordinator. > Yanking features will definitely be painful for users. Leaving it experimental seems much better for users as long as the maintenance overhead is tolerable.
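The integrity assertion suggested above - flagging a dangling MV record that points to missing base data - can be sketched as a toy model. This is plain Python over dict-backed "tables", not Cassandra's actual storage or repair machinery; every name here is illustrative:

```python
# Toy model of an MV integrity check: find view rows whose base row is gone.
# Not Cassandra code - a dict-backed sketch of the "dangling view record" idea.

def find_orphaned_view_rows(base_table, view_table):
    """Return view keys whose referenced base-table key is missing.

    base_table: dict mapping base primary key -> row
    view_table: dict mapping view primary key -> base primary key
    """
    return sorted(
        view_key
        for view_key, base_key in view_table.items()
        if base_key not in base_table
    )

# Base table lost the partition for user 2 (e.g. all base replicas lost).
base = {1: {"name": "ada"}, 3: {"name": "grace"}}
# The view was built before the loss, so it still references user 2.
view = {("ada", 1): 1, ("bob", 2): 2, ("grace", 3): 3}

orphans = find_orphaned_view_rows(base, view)
print(orphans)  # [('bob', 2)] - the dangling view row that should raise an alarm
```

In a real implementation the hard part is exactly what the thread describes: doing this scan incrementally without taking weeks or trashing the page cache.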
Re: [DISCUSS] Future of MVs
I think, just as importantly, we also need to grapple with what went wrong when features landed this way, since these were not isolated occurrences - suggesting structural issues were at play. I'm not sure if a retrospective is viable with this organisational structure, but we can perhaps engage with it implicitly, in a positive way, by working to create a framework with clear expectations for how features should be delivered - to go hand-in-hand with CEP proposals. This framework can then also be applied to existing features considered to be inadequate, as we decide how to move forward with them. On 30/06/2020, 22:01, "sankalp kohli" wrote: Hi, I think we should revisit all features which require a lot more work to make them work. Here is how I think we should do for each one of them 1. Identify such features and some details of why they are deprecation candidates. 2. Ask the dev list if anyone is willing to work on improving them over the next 1 or 2 major releases. 3. We then move to the user list to find who all are using it and if they are opposed to removing/deprecating it. Assuming few will be using it, we need to see the tradeoff of keeping it vs removing it on a case by case basis. 4. Deprecate it in the next major or make it experimental if #2 and #3 removes them from deprecation. 5. Remove it in next major For MV, I see this email as step #2. We should move to asking the user list next. Thanks, Sankalp On Tue, Jun 30, 2020 at 1:46 PM Joshua McKenzie wrote: > We're just short of 98 tickets on the component since it's original merge > so at least *some* work has been done to stabilize them. Not to say I'm > endorsing running them at massive scale today without knowing what you're > doing, to be clear. They are perhaps our largest loaded gun of a feature of > self-foot-shooting atm. Zhao did a bunch of work on them internally and > we've backported much of that to OSS; I've pinged him to chime in here. 
> > The "data is orphaned in your view when you lose all base replicas" issue > is more or less "unsolvable", since a scan of a view to confirm data in the > base table is so slow you're talking weeks to process and it totally > trashes your page cache. I think Paulo landed on a "you have to rebuild the > view if you lose all base data" reality. There's also, I believe, the > unresolved issue of modeling how much data a base table with one to many > views will end up taking up in its final form when denormalized. This could > be vastly improved with something like an "EXPLAIN ANALYZE" for a table > with views, if you'll excuse the mapping, to show "N bytes in base will > become M with base + views" or something. > > Last but definitely not least in dumping the state in my head about this, > there's a bunch of potential for guardrailing people away from self-harm > with MV's if we decide to go the route of guardrails (link: > > https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails > ). > > So from my PoV, I'm against us just voting to deprecate and remove without > going into more depth into the current state of things and what options are > on the table, since people will continue to build MV's at the client level > which, in theory, should have worse correctness and performance > characteristics than having a clean and well stabilized implementation in > the coordinator. > > Having them flagged as experimental for now as we stabilize 4.0 and get > things out the door *seems* sufficient to me, but if people are widely > using these out in the wild and ignoring that status and the corresponding > warning, maybe we consider raising the volume on that warning for 4.0 while > we figure this out. > > Just my .02. 
> > ~Josh > > On Tue, Jun 30, 2020 at 4:22 PM Dinesh Joshi wrote: > > > > On Jun 30, 2020, at 12:43 PM, Jon Haddad wrote: > > > > > > As we move forward with the 4.0 release, we should consider this an > > > opportunity to deprecate materialized views, and remove them in 5.0. > We > > > should take this opportunity to learn from the mistake and raise the > bar > > > for new features to undergo a much more thorough run the wringer before > > > merging. > > > > I'm in favor of marking them as deprecated and removing them in 5.0. If > > someone steps up and can fix them in 5.0, then we always have the option > of > > accepting the fix. > > > > Dinesh > > - > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > > > ---
Re: [DISCUSS] Future of MVs
Seems like a reasonable point of view to me Sankalp. I'd also suggest we try to find other sources of data than just the user ML, like searching on github for instance. A collection of imperfect metrics beats just one in my experience. Though I would ask why we're having this discussion this late in the release cycle when we have what, 4 tickets left until cutting beta 1? Seems like the kind of thing we could reasonably defer while we focus on getting 4.0 out, though I'm sympathetic to the "release is cutoff for deprecation" argument. If we cadence our majors to calendar (like every 6 months for example) instead of scope this would become significantly less of a big issue imo. On Tue, Jun 30, 2020 at 5:01 PM sankalp kohli wrote: > Hi, > I think we should revisit all features which require a lot more work to > make them work. Here is how I think we should do for each one of them > > 1. Identify such features and some details of why they are deprecation > candidates. > 2. Ask the dev list if anyone is willing to work on improving them over the > next 1 or 2 major releases. > 3. We then move to the user list to find who all are using it and if they > are opposed to removing/deprecating it. Assuming few will be using it, we > need to see the tradeoff of keeping it vs removing it on a case by case > basis. > 4. Deprecate it in the next major or make it experimental if #2 and #3 > removes them from deprecation. > 5. Remove it in next major > > For MV, I see this email as step #2. We should move to asking the user list > next. > > Thanks, > Sankalp > > On Tue, Jun 30, 2020 at 1:46 PM Joshua McKenzie > wrote: > > > We're just short of 98 tickets on the component since it's original merge > > so at least *some* work has been done to stabilize them. Not to say I'm > > endorsing running them at massive scale today without knowing what you're > > doing, to be clear. They are perhaps our largest loaded gun of a feature > of > > self-foot-shooting atm. 
Zhao did a bunch of work on them internally and > > we've backported much of that to OSS; I've pinged him to chime in here. > > > > The "data is orphaned in your view when you lose all base replicas" issue > > is more or less "unsolvable", since a scan of a view to confirm data in > the > > base table is so slow you're talking weeks to process and it totally > > trashes your page cache. I think Paulo landed on a "you have to rebuild > the > > view if you lose all base data" reality. There's also, I believe, the > > unresolved issue of modeling how much data a base table with one to many > > views will end up taking up in its final form when denormalized. This > could > > be vastly improved with something like an "EXPLAIN ANALYZE" for a table > > with views, if you'll excuse the mapping, to show "N bytes in base will > > become M with base + views" or something. > > > > Last but definitely not least in dumping the state in my head about this, > > there's a bunch of potential for guardrailing people away from self-harm > > with MV's if we decide to go the route of guardrails (link: > > > > > https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails > > ). > > > > So from my PoV, I'm against us just voting to deprecate and remove > without > > going into more depth into the current state of things and what options > are > > on the table, since people will continue to build MV's at the client > level > > which, in theory, should have worse correctness and performance > > characteristics than having a clean and well stabilized implementation in > > the coordinator. > > > > Having them flagged as experimental for now as we stabilize 4.0 and get > > things out the door *seems* sufficient to me, but if people are widely > > using these out in the wild and ignoring that status and the > corresponding > > warning, maybe we consider raising the volume on that warning for 4.0 > while > > we figure this out. > > > > Just my .02. 
> > > > ~Josh > > > > On Tue, Jun 30, 2020 at 4:22 PM Dinesh Joshi wrote: > > > > > > On Jun 30, 2020, at 12:43 PM, Jon Haddad wrote: > > > > > > > > As we move forward with the 4.0 release, we should consider this an > > > > opportunity to deprecate materialized views, and remove them in 5.0. > > We > > > > should take this opportunity to learn from the mistake and raise the > > bar > > > > for new features to undergo a much more thorough run the wringer > before > > > > merging. > > > > > > I'm in favor of marking them as deprecated and removing them in 5.0. If > > > someone steps up and can fix them in 5.0, then we always have the > option > > of > > > accepting the fix. > > > > > > Dinesh > > > - > > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > > > > > > >
Re: [DISCUSS] Future of MVs
> So from my PoV, I'm against us just voting to deprecate and remove without > going into more depth into the current state of things and what options are > on the table, since people will continue to build MV's at the client level > which, in theory, should have worse correctness and performance > characteristics than having a clean and well stabilized implementation in > the coordinator. I agree with Josh here. Multiple people have put in effort to improve the stability of MV’s since they were first put into the code base and the reasons for having them be in the DB have not changed. Building MV like tables at the client level is actually harder to get right than doing it in the server. -Jeremiah > On Jun 30, 2020, at 3:45 PM, Joshua McKenzie wrote: > > We're just short of 98 tickets on the component since it's original merge > so at least *some* work has been done to stabilize them. Not to say I'm > endorsing running them at massive scale today without knowing what you're > doing, to be clear. They are perhaps our largest loaded gun of a feature of > self-foot-shooting atm. Zhao did a bunch of work on them internally and > we've backported much of that to OSS; I've pinged him to chime in here. > > The "data is orphaned in your view when you lose all base replicas" issue > is more or less "unsolvable", since a scan of a view to confirm data in the > base table is so slow you're talking weeks to process and it totally > trashes your page cache. I think Paulo landed on a "you have to rebuild the > view if you lose all base data" reality. There's also, I believe, the > unresolved issue of modeling how much data a base table with one to many > views will end up taking up in its final form when denormalized. This could > be vastly improved with something like an "EXPLAIN ANALYZE" for a table > with views, if you'll excuse the mapping, to show "N bytes in base will > become M with base + views" or something. 
> > Last but definitely not least in dumping the state in my head about this, > there's a bunch of potential for guardrailing people away from self-harm > with MV's if we decide to go the route of guardrails (link: > https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails > ). > > So from my PoV, I'm against us just voting to deprecate and remove without > going into more depth into the current state of things and what options are > on the table, since people will continue to build MV's at the client level > which, in theory, should have worse correctness and performance > characteristics than having a clean and well stabilized implementation in > the coordinator. > > Having them flagged as experimental for now as we stabilize 4.0 and get > things out the door *seems* sufficient to me, but if people are widely > using these out in the wild and ignoring that status and the corresponding > warning, maybe we consider raising the volume on that warning for 4.0 while > we figure this out. > > Just my .02. > > ~Josh > > On Tue, Jun 30, 2020 at 4:22 PM Dinesh Joshi wrote: > >>> On Jun 30, 2020, at 12:43 PM, Jon Haddad wrote: >>> >>> As we move forward with the 4.0 release, we should consider this an >>> opportunity to deprecate materialized views, and remove them in 5.0. We >>> should take this opportunity to learn from the mistake and raise the bar >>> for new features to undergo a much more thorough run the wringer before >>> merging. >> >> I'm in favor of marking them as deprecated and removing them in 5.0. If >> someone steps up and can fix them in 5.0, then we always have the option of >> accepting the fix. >> >> Dinesh >> - >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: dev-h...@cassandra.apache.org >> >> - To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org
Re: [DISCUSS] Future of MVs
Hi, I think we should revisit all features which require a lot more work to make them work. Here is how I think we should do for each one of them 1. Identify such features and some details of why they are deprecation candidates. 2. Ask the dev list if anyone is willing to work on improving them over the next 1 or 2 major releases. 3. We then move to the user list to find who all are using it and if they are opposed to removing/deprecating it. Assuming few will be using it, we need to see the tradeoff of keeping it vs removing it on a case by case basis. 4. Deprecate it in the next major or make it experimental if #2 and #3 removes them from deprecation. 5. Remove it in next major For MV, I see this email as step #2. We should move to asking the user list next. Thanks, Sankalp On Tue, Jun 30, 2020 at 1:46 PM Joshua McKenzie wrote: > We're just short of 98 tickets on the component since it's original merge > so at least *some* work has been done to stabilize them. Not to say I'm > endorsing running them at massive scale today without knowing what you're > doing, to be clear. They are perhaps our largest loaded gun of a feature of > self-foot-shooting atm. Zhao did a bunch of work on them internally and > we've backported much of that to OSS; I've pinged him to chime in here. > > The "data is orphaned in your view when you lose all base replicas" issue > is more or less "unsolvable", since a scan of a view to confirm data in the > base table is so slow you're talking weeks to process and it totally > trashes your page cache. I think Paulo landed on a "you have to rebuild the > view if you lose all base data" reality. There's also, I believe, the > unresolved issue of modeling how much data a base table with one to many > views will end up taking up in its final form when denormalized. 
This could > be vastly improved with something like an "EXPLAIN ANALYZE" for a table > with views, if you'll excuse the mapping, to show "N bytes in base will > become M with base + views" or something. > > Last but definitely not least in dumping the state in my head about this, > there's a bunch of potential for guardrailing people away from self-harm > with MV's if we decide to go the route of guardrails (link: > > https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails > ). > > So from my PoV, I'm against us just voting to deprecate and remove without > going into more depth into the current state of things and what options are > on the table, since people will continue to build MV's at the client level > which, in theory, should have worse correctness and performance > characteristics than having a clean and well stabilized implementation in > the coordinator. > > Having them flagged as experimental for now as we stabilize 4.0 and get > things out the door *seems* sufficient to me, but if people are widely > using these out in the wild and ignoring that status and the corresponding > warning, maybe we consider raising the volume on that warning for 4.0 while > we figure this out. > > Just my .02. > > ~Josh > > On Tue, Jun 30, 2020 at 4:22 PM Dinesh Joshi wrote: > > > > On Jun 30, 2020, at 12:43 PM, Jon Haddad wrote: > > > > > > As we move forward with the 4.0 release, we should consider this an > > > opportunity to deprecate materialized views, and remove them in 5.0. > We > > > should take this opportunity to learn from the mistake and raise the > bar > > > for new features to undergo a much more thorough run the wringer before > > > merging. > > > > I'm in favor of marking them as deprecated and removing them in 5.0. If > > someone steps up and can fix them in 5.0, then we always have the option > of > > accepting the fix. 
> > > > Dinesh > > - > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > > >
Re: [DISCUSS] Future of MVs
> While at TLP, I helped numerous customers move off of MVs, mostly because > they affected stability of clusters in a horrific way. The most telling > project involved helping someone create new tables to manage 1GB of data > because the views performed so poorly they made the cluster unresponsive > and unusable. The documented way to report bugs is https://cassandra.apache.org/doc/latest/bugs.html# with a JIRA, version, and environment. > As we move forward with the 4.0 release, we should consider this an opportunity to deprecate materialized views, and remove them in 5.0. While the community is focused on 4.0 and unable to review CEPs/improvements, should we defer this discussion until it is ready to take them up? > We should take this opportunity to learn from the mistake and raise the bar > for new features to undergo a much more thorough run the wringer before > merging. Agreed that we should learn from mistakes, but there are still users relying on MVs. I think it's more responsible to work with those users to improve MVs for their use cases. > Am I missing a JIRA > that can magically fix the issues with performance, availability & > correctness? Is there any formal discussion or analysis concluding these issues are impossible to fix or improve? On Wed, 1 Jul 2020 at 04:23, Dinesh Joshi wrote: > > On Jun 30, 2020, at 12:43 PM, Jon Haddad wrote: > > > > As we move forward with the 4.0 release, we should consider this an > > opportunity to deprecate materialized views, and remove them in 5.0. We > > should take this opportunity to learn from the mistake and raise the bar > > for new features to undergo a much more thorough run the wringer before > > merging. > > I'm in favor of marking them as deprecated and removing them in 5.0. If > someone steps up and can fix them in 5.0, then we always have the option of > accepting the fix. > > Dinesh > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > >
Re: [DISCUSS] Future of MVs
We're just short of 98 tickets on the component since its original merge so at least *some* work has been done to stabilize them. Not to say I'm endorsing running them at massive scale today without knowing what you're doing, to be clear. They are perhaps our largest loaded gun of a feature of self-foot-shooting atm. Zhao did a bunch of work on them internally and we've backported much of that to OSS; I've pinged him to chime in here. The "data is orphaned in your view when you lose all base replicas" issue is more or less "unsolvable", since a scan of a view to confirm data in the base table is so slow you're talking weeks to process and it totally trashes your page cache. I think Paulo landed on a "you have to rebuild the view if you lose all base data" reality. There's also, I believe, the unresolved issue of modeling how much data a base table with one to many views will end up taking up in its final form when denormalized. This could be vastly improved with something like an "EXPLAIN ANALYZE" for a table with views, if you'll excuse the mapping, to show "N bytes in base will become M with base + views" or something. Last but definitely not least in dumping the state in my head about this, there's a bunch of potential for guardrailing people away from self-harm with MV's if we decide to go the route of guardrails (link: https://cwiki.apache.org/confluence/display/CASSANDRA/%28DRAFT%29+-+CEP-3%3A+Guardrails ). So from my PoV, I'm against us just voting to deprecate and remove without going into more depth into the current state of things and what options are on the table, since people will continue to build MV's at the client level which, in theory, should have worse correctness and performance characteristics than having a clean and well stabilized implementation in the coordinator.
Having them flagged as experimental for now as we stabilize 4.0 and get things out the door *seems* sufficient to me, but if people are widely using these out in the wild and ignoring that status and the corresponding warning, maybe we consider raising the volume on that warning for 4.0 while we figure this out. Just my .02. ~Josh On Tue, Jun 30, 2020 at 4:22 PM Dinesh Joshi wrote: > > On Jun 30, 2020, at 12:43 PM, Jon Haddad wrote: > > > > As we move forward with the 4.0 release, we should consider this an > > opportunity to deprecate materialized views, and remove them in 5.0. We > > should take this opportunity to learn from the mistake and raise the bar > > for new features to undergo a much more thorough run the wringer before > > merging. > > I'm in favor of marking them as deprecated and removing them in 5.0. If > someone steps up and can fix them in 5.0, then we always have the option of > accepting the fix. > > Dinesh > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > >
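The "N bytes in base will become M with base + views" estimate described above can be sketched with back-of-the-envelope arithmetic. This is a toy illustration of the idea, not an actual EXPLAIN ANALYZE implementation; the fractions and sizes are made up:

```python
# Toy estimate of how a base table's footprint grows once denormalized into
# views - the "N bytes in base becomes M with base + views" idea from the
# thread. All numbers are illustrative assumptions.

def estimate_total_bytes(base_bytes, view_row_fractions):
    """base_bytes: on-disk size of the base table.
    view_row_fractions: for each view, the fraction of each base row's bytes
    the view duplicates (its primary-key columns plus any selected columns).
    Ignores per-row overhead, compression, and tombstones."""
    return base_bytes + sum(base_bytes * f for f in view_row_fractions)

# A 100 GiB base table with two views duplicating 50% and 25% of each row:
base = 100 * 2**30
total = estimate_total_bytes(base, [0.50, 0.25])
print(total / 2**30)  # 175.0 GiB for base + views
```

A real estimator would have to account for replication factor, per-view overhead, and compression ratios, but even this crude multiplier would give users the "M bytes after denormalization" warning the thread asks for.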
Re: [DISCUSS] Future of MVs
+1 On Tue, Jun 30, 2020 at 2:44 PM Jon Haddad wrote: > > A couple days ago when writing a separate email I came across this DataStax > blog post discussing MVs [1]. Imagine my surprise when I noticed the date > was five years ago... > > While at TLP, I helped numerous customers move off of MVs, mostly because > they affected stability of clusters in a horrific way. The most telling > project involved helping someone create new tables to manage 1GB of data > because the views performed so poorly they made the cluster unresponsive > and unusable. Despite being around for five years, they've seen very > little improvement that makes them usable for non trivial, non laptop > workloads. > > Since the original commits, it doesn't look like there's been much work to > improve them, and they're yet another feature I ended up saying "just don't > use". I haven't heard any plans to improve them in any meaningful way - > either to address their issues with performance or the inability to repair > them. > > The original contributor of MVs (Carl Yeksigian) seems to have disappeared > from the project, meaning we have a broken feature without a maintainer, > and no plans to fix it. > > As we move forward with the 4.0 release, we should consider this an > opportunity to deprecate materialized views, and remove them in 5.0. We > should take this opportunity to learn from the mistake and raise the bar > for new features to undergo a much more thorough run the wringer before > merging. > > I'm curious what folks think - am I way off base here? Am I missing a JIRA > that can magically fix the issues with performance, availability & > correctness? > > [1] > https://www.datastax.com/blog/2015/06/new-cassandra-30-materialized-views > [2] https://issues.apache.org/jira/browse/CASSANDRA-6477
Re: [DISCUSS] Future of MVs
> On Jun 30, 2020, at 12:43 PM, Jon Haddad wrote: > > As we move forward with the 4.0 release, we should consider this an > opportunity to deprecate materialized views, and remove them in 5.0. We > should take this opportunity to learn from the mistake and raise the bar > for new features to undergo a much more thorough run the wringer before > merging. I'm in favor of marking them as deprecated and removing them in 5.0. If someone steps up and can fix them in 5.0, then we always have the option of accepting the fix. Dinesh
Re: [DISCUSS] Future of MVs
+1 for deprecation and removal (assuming a credible plan to fix them doesn't materialize) > On Jun 30, 2020, at 12:43 PM, Jon Haddad wrote: > > A couple days ago when writing a separate email I came across this DataStax > blog post discussing MVs [1]. Imagine my surprise when I noticed the date > was five years ago... > > While at TLP, I helped numerous customers move off of MVs, mostly because > they affected stability of clusters in a horrific way. The most telling > project involved helping someone create new tables to manage 1GB of data > because the views performed so poorly they made the cluster unresponsive > and unusable. Despite being around for five years, they've seen very > little improvement that makes them usable for non trivial, non laptop > workloads. > > Since the original commits, it doesn't look like there's been much work to > improve them, and they're yet another feature I ended up saying "just don't > use". I haven't heard any plans to improve them in any meaningful way - > either to address their issues with performance or the inability to repair > them. > > The original contributor of MVs (Carl Yeksigian) seems to have disappeared > from the project, meaning we have a broken feature without a maintainer, > and no plans to fix it. > > As we move forward with the 4.0 release, we should consider this an > opportunity to deprecate materialized views, and remove them in 5.0. We > should take this opportunity to learn from the mistake and raise the bar > for new features to undergo a much more thorough run the wringer before > merging. > > I'm curious what folks think - am I way off base here? Am I missing a JIRA > that can magically fix the issues with performance, availability & > correctness? 
> > [1] > https://www.datastax.com/blog/2015/06/new-cassandra-30-materialized-views > [2] https://issues.apache.org/jira/browse/CASSANDRA-6477
[DISCUSS] Future of MVs
A couple days ago when writing a separate email I came across this DataStax blog post discussing MVs [1]. Imagine my surprise when I noticed the date was five years ago... While at TLP, I helped numerous customers move off of MVs, mostly because they affected stability of clusters in a horrific way. The most telling project involved helping someone create new tables to manage 1GB of data because the views performed so poorly they made the cluster unresponsive and unusable. Despite being around for five years, they've seen very little improvement that makes them usable for non-trivial, non-laptop workloads. Since the original commits, it doesn't look like there's been much work to improve them, and they're yet another feature I ended up saying "just don't use". I haven't heard any plans to improve them in any meaningful way - either to address their issues with performance or the inability to repair them. The original contributor of MVs (Carl Yeksigian) seems to have disappeared from the project, meaning we have a broken feature without a maintainer, and no plans to fix it. As we move forward with the 4.0 release, we should consider this an opportunity to deprecate materialized views, and remove them in 5.0. We should take this opportunity to learn from the mistake and raise the bar for new features to undergo a much more thorough run through the wringer before merging. I'm curious what folks think - am I way off base here? Am I missing a JIRA that can magically fix the issues with performance, availability & correctness? [1] https://www.datastax.com/blog/2015/06/new-cassandra-30-materialized-views [2] https://issues.apache.org/jira/browse/CASSANDRA-6477