Re: Stability of MaterializedView in 3.11.x | 4.0

2019-09-03 Thread Scott Andreas
Hi Pankaj,

There aren't plans to include substantial changes to the materialized views 
implementation in C* 4.0, and I'm not aware of project contributors who plan 
major work on MV's post-4.0 at present.

– Scott


From: Pankaj Gajjar 
Sent: Tuesday, September 3, 2019 5:47 AM
To: dev@cassandra.apache.org
Subject: Re: Stability of MaterializedView in 3.11.x | 4.0

Hi Team,

Thanks, but that is not my point. My question remains: is there any plan to
fix these MV issues in an upcoming Cassandra release, such as 4.0? If so, it
would be worth waiting.
Or is there any plugin or workaround that resolves this issue on a Cassandra
setup?


--
Regards
Pankaj G.

On 31/08/19, 00:33, "Jon Haddad"  wrote:

If you don't have any intent on running across multiple nodes, Cassandra is
probably the wrong DB for you.

Postgres will give you a better feature set for a single node.

On Fri, Aug 30, 2019 at 5:23 AM Pankaj Gajjar wrote:

> Understood. How about Cassandra running on a single node? We don't
> have a cluster setup (i.e. 3+ nodes).
>
> Do MVs perform well on a single-node machine?
>
> Note: I know about HA, so let's set it aside for now; it's only possible
> when we have a cluster setup.
>
> On 29/08/19, 06:21, "Dor Laor"  wrote:
>
> On Wed, Aug 28, 2019 at 5:43 PM Jon Haddad  wrote:
>
> > > Arguably, the other alternative to server-side denormalization is to do
> > > the denormalization client-side which comes with the same axes of costs
> > > and complexity, just with more of each.
> >
> > That's not completely true.  You can write to any number of tables without
> > doing a read, and the cost of reading data off disk is significantly
> > greater than an insert alone.  You can crush a cluster with a write-heavy
> > workload and MVs that would otherwise be completely fine to do all writes.
> >
> > The other issue with MVs is that you still need to understand fundamentals
> > of data modeling; they don't magically solve the problem of enormous
> > partitions.  One of the reasons I've had to un-MV a lot of clusters is
> > because people have put an MV on a table with a low-cardinality field and
> > found themselves with a 10GB partition nightmare, so they need to go back
> > and remodel the view as something more complex anyways.  In this case, the
> > MV was extremely high cost since now they've not only pushed out a poor
> > implementation to begin with but now have the cost of a migration as well
> > as a rewrite.
> >
>
> +1
>
> Moreover, the hard part is that an update to the base table means that
> the original data needs to be read, and the database (or the poor developer
> who implements the denormalized model) needs to delete the data in the view
> and then write the new data. All of this of course needs to be resilient to
> all types of errors and failures. Had it been simple, there would be no need
> for a database MV.
>
>
> >
> >
> >
> > On Wed, Aug 28, 2019 at 9:58 AM Joshua McKenzie <jmcken...@apache.org>
> > wrote:
> >
> > > >
> > > > so we need to start migration from MVs to manual query base table ?
> > >
> > > Arguably, the other alternative to server-side denormalization is to do
> > > the denormalization client-side which comes with the same axes of costs
> > > and complexity, just with more of each.
> > >
> > > Jeff's spot on when he discusses the risk appetite vs. mitigation aspect
> > > of it. There's a reason banks do end-of-day close-out validation analysis
> > > and have redundant systems for things like this.
> > >
> > > On Wed, Aug 28, 2019 at 11:49 AM Jon Haddad wrote:
> > >
> > > > I've helped a lot of teams (a dozen to two dozen maybe) migrate away
> > > > from MVs due to inconsistencies, issues with streaming (have you added
> > > > or removed nodes yet?), and massive performance issues to the point of
> > > > cluster failure under (what I consider) trivial load.  I haven't gone
> > > > too deep into analyzing their issues, folks are usually fine with "move
> > > > off them", vs having me do a ton of analysis.
> > > >
> > > > tlp-stress has a materialized view workload built in, and you can add
> > > > arbitrary CQL via the --cql flag to add a MV to any existing workloa

Re: Stability of MaterializedView in 3.11.x | 4.0

2019-09-03 Thread Pankaj Gajjar
Hi Team,

Thanks, but that is not my point. My question remains: is there any plan to
fix these MV issues in an upcoming Cassandra release, such as 4.0? If so, it
would be worth waiting.
Or is there any plugin or workaround that resolves this issue on a Cassandra
setup?


-- 
Regards
Pankaj G.

On 31/08/19, 00:33, "Jon Haddad"  wrote:

If you don't have any intent on running across multiple nodes, Cassandra is
probably the wrong DB for you.

Postgres will give you a better feature set for a single node.

On Fri, Aug 30, 2019 at 5:23 AM Pankaj Gajjar wrote:

> Understood. How about Cassandra running on a single node? We don't
> have a cluster setup (i.e. 3+ nodes).
>
> Do MVs perform well on a single-node machine?
>
> Note: I know about HA, so let's set it aside for now; it's only possible
> when we have a cluster setup.
>
> On 29/08/19, 06:21, "Dor Laor"  wrote:
>
> On Wed, Aug 28, 2019 at 5:43 PM Jon Haddad  wrote:
>
> > > Arguably, the other alternative to server-side denormalization is to do
> > > the denormalization client-side which comes with the same axes of costs
> > > and complexity, just with more of each.
> >
> > That's not completely true.  You can write to any number of tables without
> > doing a read, and the cost of reading data off disk is significantly
> > greater than an insert alone.  You can crush a cluster with a write-heavy
> > workload and MVs that would otherwise be completely fine to do all writes.
> >
> > The other issue with MVs is that you still need to understand fundamentals
> > of data modeling; they don't magically solve the problem of enormous
> > partitions.  One of the reasons I've had to un-MV a lot of clusters is
> > because people have put an MV on a table with a low-cardinality field and
> > found themselves with a 10GB partition nightmare, so they need to go back
> > and remodel the view as something more complex anyways.  In this case, the
> > MV was extremely high cost since now they've not only pushed out a poor
> > implementation to begin with but now have the cost of a migration as well
> > as a rewrite.
> >
>
> +1
>
> Moreover, the hard part is that an update to the base table means that
> the original data needs to be read, and the database (or the poor developer
> who implements the denormalized model) needs to delete the data in the view
> and then write the new data. All of this of course needs to be resilient to
> all types of errors and failures. Had it been simple, there would be no need
> for a database MV.
>
>
> >
> >
> >
> > On Wed, Aug 28, 2019 at 9:58 AM Joshua McKenzie <jmcken...@apache.org>
> > wrote:
> >
> > > >
> > > > so we need to start migration from MVs to manual query base table ?
> > >
> > > Arguably, the other alternative to server-side denormalization is to do
> > > the denormalization client-side which comes with the same axes of costs
> > > and complexity, just with more of each.
> > >
> > > Jeff's spot on when he discusses the risk appetite vs. mitigation aspect
> > > of it. There's a reason banks do end-of-day close-out validation analysis
> > > and have redundant systems for things like this.
> > >
> > > On Wed, Aug 28, 2019 at 11:49 AM Jon Haddad wrote:
> > >
> > > > I've helped a lot of teams (a dozen to two dozen maybe) migrate away
> > > > from MVs due to inconsistencies, issues with streaming (have you added
> > > > or removed nodes yet?), and massive performance issues to the point of
> > > > cluster failure under (what I consider) trivial load.  I haven't gone
> > > > too deep into analyzing their issues, folks are usually fine with "move
> > > > off them", vs having me do a ton of analysis.
> > > >
> > > > tlp-stress has a materialized view workload built in, and you can add
> > > > arbitrary CQL via the --cql flag to add a MV to any existing workload
> > > > such as KeyValue or BasicTimeSeries.
> > > >
> > > > On Wed, Aug 28, 2019 at 8:11 AM Jeff Jirsa wrote:
> > > >
> > > > > There have been people who have had operational issues related to MVs
> > > > > (many of them around running repair), but the biggest concern is
> > > > > correctness.
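
The trade-off raised in the thread (a view update has to read the old base row
so the stale view entry can be removed, while plain denormalized writes need no
read) can be sketched with a toy in-memory model. This is only an illustration
under stated assumptions: ToyStore, update_with_view and blind_write are
hypothetical names, the dictionaries stand in for tables, and nothing here is
Cassandra code or reflects how Cassandra implements MVs internally.

```python
# Toy in-memory model, NOT Cassandra code. All names are hypothetical.

class ToyStore:
    def __init__(self):
        self.base = {}    # base table: user_id -> email
        self.view = {}    # denormalized table keyed by email: email -> user_id
        self.reads = 0    # number of base-table reads performed
        self.writes = 0   # number of writes/deletes performed

    def update_with_view(self, user_id, new_email):
        """View-style maintenance: read the old row, delete the stale
        view entry, then write base and view."""
        self.reads += 1                      # read-before-write
        old_email = self.base.get(user_id)
        if old_email is not None and old_email != new_email:
            del self.view[old_email]         # remove stale view entry
            self.writes += 1
        self.base[user_id] = new_email
        self.view[new_email] = user_id
        self.writes += 2

    def blind_write(self, user_id, new_email):
        """Client-side denormalization without a read: cheaper, but a
        changed key leaves the old entry behind in the second table."""
        self.base[user_id] = new_email
        self.view[new_email] = user_id
        self.writes += 2


mv = ToyStore()
mv.update_with_view(1, "a@example.com")
mv.update_with_view(1, "b@example.com")

blind = ToyStore()
blind.blind_write(1, "a@example.com")
blind.blind_write(1, "b@example.com")

print(mv.reads, mv.view)        # one read per update, a single view entry
print(blind.reads, blind.view)  # no reads, but the old entry is still there
```

In this sketch the read-free path is only safe when the denormalized key never
changes (for example, insert-only data); otherwise the application has to do
the cleanup itself, which is the "poor developer" case described above.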