Contributing CASSANDRA-15295 to fix CommitLog deadlock

2019-08-30 Thread Zephyr Guo
Hi all,
Could someone review CASSANDRA-15295?

Thanks!


Re: Stability of MaterializedView in 3.11.x | 4.0

2019-08-30 Thread Jon Haddad
If you don't have any intent on running across multiple nodes, Cassandra is
probably the wrong DB for you.

Postgres will give you a better feature set for a single node.

On Fri, Aug 30, 2019 at 5:23 AM Pankaj Gajjar wrote:

> Understood. What about Cassandra running on a single node? We don't
> have a cluster setup (i.e., 3+ nodes).
>
> Do MVs perform well on a single-node machine?
>
> Note: I know about HA, so let's set it aside for now; it's only possible
> when we have a cluster setup.
>
> On 29/08/19, 06:21, "Dor Laor" wrote:
>
> On Wed, Aug 28, 2019 at 5:43 PM Jon Haddad wrote:
>
> > > Arguably, the other alternative to server-side denormalization is to
> > > do the denormalization client-side, which comes with the same axes of
> > > costs and complexity, just with more of each.
> >
> > That's not completely true. You can write to any number of tables
> > without doing a read, and the cost of reading data off disk is
> > significantly greater than an insert alone. You can crush a cluster with
> > MVs and a write-heavy workload that it would otherwise handle completely
> > fine as plain writes.
> >
> > The other issue with MVs is that you still need to understand the
> > fundamentals of data modeling; MVs don't magically solve the problem of
> > enormous partitions. One of the reasons I've had to un-MV a lot of
> > clusters is that people have put an MV on a table with a low-cardinality
> > field and found themselves with a 10GB partition nightmare, so they need
> > to go back and remodel the view as something more complex anyway. In
> > this case, the MV was extremely high cost, since they've not only pushed
> > out a poor implementation to begin with but now have the cost of a
> > migration as well as a rewrite.
> >
>
> +1
>
> Moreover, the hard part is that an update to the base table means that
> the original data needs to be read, and the database (or the poor
> developer who implements the denormalized model) needs to delete the old
> data in the view and then write the new data. All of this must, of
> course, be resilient to all types of errors and failures. If it were
> simple, there would be no need for database MVs.
>
> > On Wed, Aug 28, 2019 at 9:58 AM Joshua McKenzie <jmcken...@apache.org>
> > wrote:
> >
> > > > So do we need to start migrating from MVs to manually querying the
> > > > base table?
> > >
> > > Arguably, the other alternative to server-side denormalization is to
> > > do the denormalization client-side, which comes with the same axes of
> > > costs and complexity, just with more of each.
> > >
> > > Jeff's spot on when he discusses the risk appetite vs. mitigation
> > > aspect of it. There's a reason banks do end-of-day close-out
> > > validation analysis and have redundant systems for things like this.
> > >
> > > On Wed, Aug 28, 2019 at 11:49 AM Jon Haddad wrote:
> > >
> > > > I've helped a lot of teams (a dozen to two dozen maybe) migrate away
> > > > from MVs due to inconsistencies, issues with streaming (have you
> > > > added or removed nodes yet?), and massive performance issues to the
> > > > point of cluster failure under (what I consider) trivial load. I
> > > > haven't gone too deep into analyzing their issues; folks are usually
> > > > fine with "move off them" vs. having me do a ton of analysis.
> > > >
> > > > tlp-stress has a materialized view workload built in, and you can
> > > > add arbitrary CQL via the --cql flag to add an MV to any existing
> > > > workload, such as KeyValue or BasicTimeSeries.
> > > >
> > > > On Wed, Aug 28, 2019 at 8:11 AM Jeff Jirsa wrote:
> > > >
> > > > > There have been people who have had operational issues related to
> > > > > MVs (many of them around running repair), but the biggest concern
> > > > > is correctness.
> > > > >
> > > > > It probably ultimately depends on what type of database you're
> > > > > running. If you're running some sort of IoT / analytics workload
> > > > > and you just want another way to SELECT the data, but you won't
> > > > > notice one of a billion records going missing, using MVs may be
> > > > > fine. If you're a bank, and one of a billion records going missing
> > > > > means you lose someone's bank account, I would avoid using MVs.
> > > > >
> > > > > It's all just risk management.
> > > > >
> > > > > On Wed, Aug 28, 2019 at 7:18 AM Pankaj Gajjar <
> > > > > pankaj.gaj...@contentserv.com> wrote:
> > > > >
> > > > > > Hi Michael,
> > > > > >
> > > > > > Thanks for putting

Re: Stability of MaterializedView in 3.11.x | 4.0

2019-08-30 Thread Dor Laor
Single node indeed doesn't need repair, so it's easier.
There is an admission control issue with MVs, since they can incur huge
amplification: a single change in the base table can trigger thousands of
operations in the view, and they run asynchronously*. Hinted handoff for
the MV helps as well, but it isn't needed for your single node.

* In Scylla we have a back-pressure mechanism that automatically slows down
the client in such cases (it doesn't yet cover 100% of the use cases, but
it's much better). We also shared a solution for repairs (an NGCC proposal)
that we haven't implemented yet; if there is interest, we can post it here.
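Dor's admission-control point can be illustrated with a generic bounded-concurrency pattern. The sketch below is a minimal Python model under stated assumptions, not Scylla's or Cassandra's actual mechanism: `ViewUpdateLimiter`, `base_write`, the fan-out of 1000 and the cap of 32 are all hypothetical. A semaphore caps in-flight view updates, so a base write that fans out into many view mutations makes its caller wait instead of piling up unbounded async work.

```python
import asyncio


class ViewUpdateLimiter:
    """Hypothetical admission control for asynchronous view updates.

    A semaphore caps how many view mutations may be in flight at once.
    Once the cap is reached, further updates wait; because the base write
    awaits its view updates, the client is slowed down as well.
    """

    def __init__(self, max_in_flight: int) -> None:
        self._sem = asyncio.Semaphore(max_in_flight)
        self.in_flight = 0
        self.peak = 0  # highest concurrency observed, for inspection

    async def apply_view_update(self, latency_s: float) -> None:
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            await asyncio.sleep(latency_s)  # stand-in for a replica write
            self.in_flight -= 1


async def base_write(limiter: ViewUpdateLimiter, fan_out: int) -> None:
    """One base-table mutation amplified into `fan_out` view mutations."""
    await asyncio.gather(
        *(limiter.apply_view_update(0.001) for _ in range(fan_out))
    )


async def main() -> int:
    limiter = ViewUpdateLimiter(max_in_flight=32)
    await base_write(limiter, fan_out=1000)
    return limiter.peak


if __name__ == "__main__":
    print(asyncio.run(main()))  # peak concurrency stays at or below the cap
```

The design choice here is to delay the producer rather than drop work: when the cap is full, the client's write simply takes longer, which is the back-pressure effect in its simplest form.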



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: 4.0 alpha before apachecon?

2019-08-30 Thread Aleksey Yeshchenko
It's not really important IMO whether or not we cut a new branch yet. So
let’s hold off?

But as mentioned on Slack, I really like the idea of cutting an alpha now,
with the clear understanding that the API will still change slightly before
4.0 GA.

There is enough complete work in trunk that will *not* change between now
and GA that could already benefit from user testing.

> On 30 Aug 2019, at 12:46, Mick Semb Wever  wrote:
> 
> 
>> 
>> Let's just pull the trigger on it. We know there are a couple of rough
>> edges, but that is the point of an alpha.
> 
> 
> Is the idea to also cut 2.2.15, 3.0.19, and 3.11.5 at the same time as
> 4.0-alpha1?
> 
> (Michael, have you the cycles for this?)
> 
> regards,
> Mick





Re: Stability of MaterializedView in 3.11.x | 4.0

2019-08-30 Thread Pankaj Gajjar
Understood. What about Cassandra running on a single node? We don't have a
cluster setup (i.e., 3+ nodes).

Do MVs perform well on a single-node machine?

Note: I know about HA, so let's set it aside for now; it's only possible when
we have a cluster setup.

On 29/08/19, 06:21, "Dor Laor" wrote:

On Wed, Aug 28, 2019 at 5:43 PM Jon Haddad wrote:

> > Arguably, the other alternative to server-side denormalization is to do
> > the denormalization client-side, which comes with the same axes of costs
> > and complexity, just with more of each.
>
> That's not completely true. You can write to any number of tables without
> doing a read, and the cost of reading data off disk is significantly
> greater than an insert alone. You can crush a cluster with MVs and a
> write-heavy workload that it would otherwise handle completely fine as
> plain writes.
>
> The other issue with MVs is that you still need to understand the
> fundamentals of data modeling; MVs don't magically solve the problem of
> enormous partitions. One of the reasons I've had to un-MV a lot of clusters
> is that people have put an MV on a table with a low-cardinality field and
> found themselves with a 10GB partition nightmare, so they need to go back
> and remodel the view as something more complex anyway. In this case, the
> MV was extremely high cost, since they've not only pushed out a poor
> implementation to begin with but now have the cost of a migration as well
> as a rewrite.
>
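To put rough numbers on Jon's low-cardinality example, here is back-of-the-envelope arithmetic in Python. All figures are hypothetical (60 million base rows, about 200 bytes per row, a three-valued `status` column as the view's partition key), and the estimate deliberately ignores compression, per-cell overhead and tombstones.

```python
from typing import Mapping

GIB = 2 ** 30


def largest_partition_gib(rows_per_view_key: Mapping[str, int], avg_row_bytes: int) -> float:
    """Approximate size of the biggest view partition, in GiB.

    Every base row lands in the view partition named by its view key, so
    the largest partition is the most common key's row count times the
    average row size.
    """
    return max(rows_per_view_key.values()) * avg_row_bytes / GIB


# Hypothetical base table: 60M rows, view partitioned by a 3-valued status column.
status_counts = {"active": 54_000_000, "suspended": 5_000_000, "closed": 1_000_000}
print(round(largest_partition_gib(status_counts, avg_row_bytes=200), 2))  # 10.06
```

Keying the view by a high-cardinality column such as a user ID instead would leave each partition at roughly one row, which is why the remodel Jon mentions usually means adding more columns to the view's partition key.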

+1

Moreover, the hard part is that an update to the base table means that
the original data needs to be read, and the database (or the poor developer
who implements the denormalized model) needs to delete the old data in the
view and then write the new data. All of this must, of course, be resilient
to all types of errors and failures. If it were simple, there would be no
need for database MVs.
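The read-before-write Dor describes can be modeled with a toy in-memory table. This is a minimal Python sketch under stated assumptions, not Cassandra's implementation: `Table`, `upsert_with_view` and `insert_only_denormalized` are hypothetical names, and the "view" is just a second dict keyed by `(city, user_id)`. It contrasts view maintenance on update, which must read the old row to find the stale view entry, with the write-only client-side case Jon mentions, which is only safe when a row's view key never changes.

```python
from typing import Any, Dict, Hashable, Optional


class Table:
    """Toy in-memory table that counts reads and writes (hypothetical model)."""

    def __init__(self) -> None:
        self.rows: Dict[Hashable, Any] = {}
        self.reads = 0
        self.writes = 0

    def get(self, key: Hashable) -> Optional[Any]:
        self.reads += 1
        return self.rows.get(key)

    def put(self, key: Hashable, value: Any) -> None:
        self.writes += 1
        self.rows[key] = value

    def delete(self, key: Hashable) -> None:
        self.writes += 1  # in Cassandra a delete is also a write (a tombstone)
        self.rows.pop(key, None)


def upsert_with_view(base: Table, view: Table, user_id: str, city: str) -> None:
    """MV-style maintenance: every base update first reads the current row."""
    old = base.get(user_id)  # 1. read before write
    if old is not None and old["city"] != city:
        view.delete((old["city"], user_id))  # 2. remove the stale view row
    base.put(user_id, {"city": city})
    view.put((city, user_id), user_id)  # 3. write the new view row


def insert_only_denormalized(base: Table, view: Table, user_id: str, city: str) -> None:
    """Client-side denormalization without a read.

    Correct only for immutable rows: if `city` could change for an existing
    user, the old view row would be left behind.
    """
    base.put(user_id, {"city": city})
    view.put((city, user_id), user_id)


base, view = Table(), Table()
upsert_with_view(base, view, "u1", "Paris")
upsert_with_view(base, view, "u1", "Berlin")
print(base.reads, view.writes, view.rows)  # 2 3 {('Berlin', 'u1'): 'u1'}
```

What the toy leaves out is exactly the hard part: a real system must keep the read, delete and write sequence correct across node failures, timeouts and concurrent updates.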


>
>
>
> On Wed, Aug 28, 2019 at 9:58 AM Joshua McKenzie wrote:
>
> > > So do we need to start migrating from MVs to manually querying the base
> > > table?
> >
> > Arguably, the other alternative to server-side denormalization is to do
> > the denormalization client-side, which comes with the same axes of costs
> > and complexity, just with more of each.
> >
> > Jeff's spot on when he discusses the risk appetite vs. mitigation aspect
> > of it. There's a reason banks do end-of-day close-out validation analysis
> > and have redundant systems for things like this.
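The end-of-day validation Joshua mentions can be sketched as a reconciliation pass: derive the view keys you expect from the base table and diff them against what the view actually holds. The Python below is a toy illustration with hypothetical names (`reconcile`, `view_key_of`) over in-memory dicts; a real check against Cassandra would need to page through token ranges rather than load whole tables into memory.

```python
from typing import Any, Callable, Dict, Hashable, Set, Tuple

ViewKey = Tuple[Hashable, ...]


def reconcile(
    base_rows: Dict[Hashable, Any],
    view_keys: Set[ViewKey],
    view_key_of: Callable[[Hashable, Any], ViewKey],
) -> Tuple[Set[ViewKey], Set[ViewKey]]:
    """Return (missing, orphaned) view keys.

    missing:  rows present in the base table but absent from the view
    orphaned: view rows with no matching base row (e.g. stale entries)
    """
    expected = {view_key_of(pk, row) for pk, row in base_rows.items()}
    return expected - view_keys, view_keys - expected


base_rows = {"u1": {"city": "Paris"}, "u2": {"city": "Rome"}}
view_keys = {("Paris", "u1"), ("Oslo", "u3")}
missing, orphaned = reconcile(base_rows, view_keys, lambda pk, row: (row["city"], pk))
print(missing, orphaned)  # {('Rome', 'u2')} {('Oslo', 'u3')}
```

Either set being non-empty means the view has drifted from the base table, which is the situation where the thread suggests rebuilding the view.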
> >
> > On Wed, Aug 28, 2019 at 11:49 AM Jon Haddad wrote:
> >
> > > I've helped a lot of teams (a dozen to two dozen maybe) migrate away
> > > from MVs due to inconsistencies, issues with streaming (have you added
> > > or removed nodes yet?), and massive performance issues to the point of
> > > cluster failure under (what I consider) trivial load. I haven't gone too
> > > deep into analyzing their issues; folks are usually fine with "move off
> > > them" vs. having me do a ton of analysis.
> > >
> > > tlp-stress has a materialized view workload built in, and you can add
> > > arbitrary CQL via the --cql flag to add an MV to any existing workload,
> > > such as KeyValue or BasicTimeSeries.
> > >
> > > On Wed, Aug 28, 2019 at 8:11 AM Jeff Jirsa wrote:
> > >
> > > > There have been people who have had operational issues related to
> > > > MVs (many of them around running repair), but the biggest concern is
> > > > correctness.
> > > >
> > > > It probably ultimately depends on what type of database you're
> > > > running. If you're running some sort of IoT / analytics workload and
> > > > you just want another way to SELECT the data, but you won't notice
> > > > one of a billion records going missing, using MVs may be fine. If
> > > > you're a bank, and one of a billion records going missing means you
> > > > lose someone's bank account, I would avoid using MVs.
> > > >
> > > > It's all just risk management.
> > > >
> > > > On Wed, Aug 28, 2019 at 7:18 AM Pankaj Gajjar <
> > > > pankaj.gaj...@contentserv.com> wrote:
> > > >
> > > > > Hi Michael,
> > > > >
> > > > > Thanks for sharing this very helpful information: "Users of MVs
> > > > > *must* determine for themselves, through thorough testing and
> > > > > understanding, if they wish to use them." This suggests that if any
> > > > > issue occurs in the future, the only solution is to rebuild the MVs,
> > > > > since Cassandra is not able to keep them consistently in sync.
> > > > >
> > > > > Also, we are currently using 10+ MVs and, as of now, we have not

Re: 4.0 alpha before apachecon?

2019-08-30 Thread Mick Semb Wever


> 
> Let's just pull the trigger on it. We know there are a couple of rough
> edges, but that is the point of an alpha.


Is the idea to also cut 2.2.15, 3.0.19, and 3.11.5 at the same time as
4.0-alpha1?

(Michael, have you the cycles for this?)

regards,
Mick
