Re: If reading from materialized view with a consistency level of quorum am I guaranteed to have the most recent view?

2017-02-10 Thread Russell Spitzer
PRIMARY KEY ((Partition Key), Clustering Keys):


On Fri, Feb 10, 2017 at 10:59 AM DuyHai Doan  wrote:

> See my blog post to understand how MV is implemented:
> http://www.doanduyhai.com/blog/?p=1930
>
> On Fri, Feb 10, 2017 at 7:48 PM, Benjamin Roth 
> wrote:
>
> > Same partition key:
> >
> > PRIMARY KEY ((a, b), c, d) and
> > PRIMARY KEY ((a, b), d, c)
> >
> > PRIMARY KEY ((a), b, c) and
> > PRIMARY KEY ((a), c, b)
> >
> > Different partition key:
> >
> > PRIMARY KEY ((a, b), c, d) and
> > PRIMARY KEY ((a), b, d, c)
> >
> > PRIMARY KEY ((a), b) and
> > PRIMARY KEY ((b), a)
> >
> >
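To make those pairs concrete, a minimal CQL sketch (hypothetical keyspace
and column names): the first view keeps the base table's partition key and
only reorders the clustering columns, while the second changes the
partition key itself.

CREATE TABLE ks.base (
    a int, b int, c int, d int,
    PRIMARY KEY ((a, b), c, d)
);

-- Same partition key (a, b); only the clustering order differs:
CREATE MATERIALIZED VIEW ks.mv_same AS
    SELECT * FROM ks.base
    WHERE a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL AND d IS NOT NULL
    PRIMARY KEY ((a, b), d, c);

-- Different partition key (a alone); view rows may live on other replicas:
CREATE MATERIALIZED VIEW ks.mv_diff AS
    SELECT * FROM ks.base
    WHERE a IS NOT NULL AND b IS NOT NULL AND c IS NOT NULL AND d IS NOT NULL
    PRIMARY KEY ((a), b, d, c);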
> > 2017-02-10 19:46 GMT+01:00 Kant Kodali :
> >
> > > Okies, now I understand what you mean by "same" partition key. I think
> > > you are saying
> > >
> > > PRIMARY KEY(col1, col2, col3) == PRIMARY KEY(col2, col1, col3) // so
> > > far I assumed they are different partition keys.
> > >
> > > On Fri, Feb 10, 2017 at 10:36 AM, Benjamin Roth
> > > <benjamin.r...@jaumo.com> wrote:
> > >
> > > > There are use cases where the partition key is the same, for example
> > > > if you need sorting within a partition, or filtering different from
> > > > the original clustering keys. We actually use this for some MVs.
> > > >
> > > > If you want "dumb" denormalization with simple append-only cases (or
> > > > more general cases that don't require a read-before-write on update),
> > > > you are maybe better off with batched denormalized atomic writes.
> > > >
> > > > The main benefit of MVs is if you need denormalization to sort or
> > > > filter by a non-primary-key field.
> > > >
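As a concrete illustration of that last point, a minimal sketch
(hypothetical names) of a view that sorts each user's rows by a column that
is not part of the base table's primary key:

CREATE TABLE ks.plays (
    user_id int,
    game text,
    score int,
    PRIMARY KEY ((user_id), game)
);

-- Same partition key as the base table, but clustered by score, which is
-- not a primary key column in ks.plays:
CREATE MATERIALIZED VIEW ks.plays_by_score AS
    SELECT * FROM ks.plays
    WHERE user_id IS NOT NULL AND game IS NOT NULL AND score IS NOT NULL
    PRIMARY KEY ((user_id), score, game)
    WITH CLUSTERING ORDER BY (score DESC);

A query like SELECT * FROM ks.plays_by_score WHERE user_id = 1 LIMIT 3;
would then return that user's top three scores.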
> > > > 2017-02-10 19:31 GMT+01:00 Kant Kodali :
> > > >
> > > > > Yes, thanks for the clarification. But why would I ever have an MV
> > > > > with the same partition key? If it is the same partition key, I
> > > > > could just read from the base table, right? Our MV partition key
> > > > > contains the columns from the base table partition key but in a
> > > > > different order, plus an additional column (which is allowed as of
> > > > > today).
> > > > >
> > > > > On Fri, Feb 10, 2017 at 10:23 AM, Benjamin Roth
> > > > > <benjamin.r...@jaumo.com> wrote:
> > > > >
> > > > > > It depends on your model.
> > > > > > If the base table + MV have the same partition key, then the MV
> > > > > > mutations are applied synchronously, so they are already written
> > > > > > when the write request returns.
> > > > > > => In this case you can rely on R+W > RF.
> > > > > >
> > > > > > If the partition key of the MV is different, the partition of the
> > > > > > MV is probably placed on a different host (or, said differently,
> > > > > > it cannot be guaranteed that it is on the same host). In this
> > > > > > case, the MV updates are executed asynchronously in a logged
> > > > > > batch. So it can be guaranteed they will be applied eventually,
> > > > > > but not by the time the write request returns.
> > > > > > => You cannot rely on it; there is no way to absolutely guarantee
> > > > > > anything, no matter what CL you choose. An MV update may always
> > > > > > "arrive late". I guess it has been implemented like this to not
> > > > > > block in case of remote requests, preferring cluster sanity over
> > > > > > consistency.
> > > > > >
> > > > > > Is it now 100% clear?
> > > > > >
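A cqlsh sketch of the failure mode under discussion (hypothetical keyspace,
table, and view names; assume ks.mv_by_c has a different partition key than
its base table):

CONSISTENCY QUORUM;

INSERT INTO ks.base (a, b, c) VALUES (1, 2, 3);
-- returns once a quorum of base-table replicas acknowledge

SELECT * FROM ks.mv_by_c WHERE c = 3;
-- a QUORUM read, yet it may still miss the row: the logged-batch view
-- update may still be in flight to the view replicas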
> > > > > > 2017-02-10 19:17 GMT+01:00 Kant Kodali :
> > > > > >
> > > > > > > So R+W > RF doesn't apply for reads on the MV, right? Because
> > > > > > > say I set QUORUM-level consistency for both reads and writes;
> > > > > > > then there can be a scenario where a write is successful to the
> > > > > > > base table and then I immediately do a read through the MV,
> > > > > > > prior to the MV getting the update from the base table. So
> > > > > > > there isn't any way to make sure a read happens after the MV
> > > > > > > has been successfully updated. Is that correct?
> > > > > > >
> > > > > > > On Fri, Feb 10, 2017 at 6:30 AM, Benjamin Roth
> > > > > > > <benjamin.r...@jaumo.com> wrote:
> > > > > > >
> > > > > > > > Hi Kant
> > > > > > > >
> > > > > > > > Is it clear now?
> > > > > > > > Sorry for the confusion!
> > > > > > > >
> > > > > > > > Have a nice one
> > > > > > > >
> > > > > > > > On 10.02.2017 09:17, "Kant Kodali" wrote:
> > > > > > > >
> > > > > > > > thanks!
> > > > > > > >
> > > > > > > > On Thu, Feb 9, 2017 at 8:51 PM, Benjamin Roth
> > > > > > > > <benjamin.r...@jaumo.com> wrote:
> > > > > > > >
> > > > > > > > > Yes it is
> > > > > > > > >
> > > > > > > > > On 10.02.2017 00:46, "Kant Kodali" <k...@peernova.com> wrote:
> > > > > > > > >
> > > > > > > > > > If reading from materialized view with a consistency level of

Re: Github pull requests

2016-08-26 Thread Russell Spitzer
This is one of my favorite aspects of how contributions to Spark work. It
also makes it easier to have automated testing run automatically on new
branches.

-Russ

On Fri, Aug 26, 2016 at 8:45 AM Ben Coverston 
wrote:

> I think it would certainly make contributing to Cassandra more
> straightforward.
>
> I'm not a committer, so I don't regularly create patches, and every time I
> do I have to search/verify that I'm doing it right.
>
> But pull requests? I make pull requests every day, and GitHub makes that
> process work the same everywhere.
>
> On Fri, Aug 26, 2016 at 9:33 AM, Jonathan Ellis  wrote:
>
> > Hi all,
> >
> > Historically we've insisted that people go through the process of
> > creating a Jira issue and attaching a patch or linking a branch to
> > demonstrate intent-to-contribute and to make sure we have a unified
> > record of changes in Jira.
> >
> > But I understand that other Apache projects are now recognizing a github
> > pull request as intent-to-contribute [1] and some are even making github
> > the official repo, with an Apache mirror, rather than the other way
> > around.  (Maybe this is required to accept pull requests, I am not sure.)
> >
> > Should we revisit our policy here?
> >
> > [1] e.g. https://github.com/apache/spark/pulls?q=is%3Apr+is%3Aclosed
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
> >
>
>
>
> --
> Ben Coverston
> DataStax -- The Apache Cassandra Company
>


Re: Infinite loop in org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter

2019-04-03 Thread Russell Spitzer
I would recommend using the Spark Cassandra Connector instead of the
Hadoop-based writers. The Hadoop code has not had a lot of love in a long
time. See

https://github.com/datastax/spark-cassandra-connector
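For reference, a minimal Scala sketch (hypothetical keyspace, table, and
column names) of the same kind of write going through the connector:

import com.datastax.spark.connector._  // adds saveToCassandra to RDDs

// assumes spark.cassandra.connection.host is set on the SparkConf
val rows = sc.parallelize(Seq((1, "a"), (2, "b")))
rows.saveToCassandra("my_ks", "my_table", SomeColumns("id", "value"))

The connector speaks CQL through the driver rather than writing and
streaming SSTables the way the bulk writer does.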

On Wed, Apr 3, 2019 at 12:21 PM Brett Marcott 
wrote:

> Hi folks,
>
> I am noticing my spark jobs being stuck when using the
> org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter/CqlBulkOutputFormat.
>
>
> It seems that whenever there is a stream failure, looping forever may be
> the expected behavior based on the code.
>
> Here are one executors logs:
> 19/04/03 15:35:06 INFO streaming.StreamResultFuture: [Stream
> #59290530-5625-11e9-a2bb-8bc7b49d56b0] Session with /10.82.204.173 is
> complete
> 19/04/03 15:35:06 WARN streaming.StreamResultFuture: [Stream
> #59290530-5625-11e9-a2bb-8bc7b49d56b0] Stream failed
>
>
> On stream failure, it seems StreamResultFuture sets the exception on the
> AbstractFuture.
> AFAIK this should cause the AbstractFuture to throw a new
> ExecutionException from get().
>
> The problem seems to lie in the fact that the CqlBulkRecordWriter swallows
> the ExecutionException and continues in a while loop:
>
> https://github.com/apache/cassandra/blob/207c80c1fd63dfbd8ca7e615ec8002ee8983c5d6/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L256-L274
>
> When taking consecutive thread dumps on the same process I see that the
> only thread doing work is constantly creating new ExecutionExceptions (the
> memory location for ExecutionException was different on each thread dump):
> java.lang.Throwable.fillInStackTrace(Native Method)
> java.lang.Throwable.fillInStackTrace(Throwable.java:783) => holding
> Monitor(java.util.concurrent.ExecutionException@80240763})
> java.lang.Throwable.<init>(Throwable.java:310)
> java.lang.Exception.<init>(Exception.java:102)
> java.util.concurrent.ExecutionException.<init>(ExecutionException.java:90)
>
> com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:476)
>
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:357)
>
> org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter.close(CqlBulkRecordWriter.java:257)
>
> org.apache.cassandra.hadoop.cql3.CqlBulkRecordWriter.close(CqlBulkRecordWriter.java:237)
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$5.apply$mcV$sp(PairRDDFunctions.scala:1131)
>
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1359)
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1131)
>
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> org.apache.spark.scheduler.Task.run(Task.scala:99)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:285)
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>
> It seems the logic that lies right below the while loop in the linked code
> above, which checks for failed hosts/stream sessions, maybe should have
> been within the while loop?
>
> Thanks,
>
> Brett
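A rough sketch (simplified, with hypothetical names; not the actual
Cassandra source) of the reordering Brett suggests: re-check the failure
state inside the wait loop and propagate the ExecutionException instead of
swallowing it, so a permanently failed stream cannot spin forever.

import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StreamWaiter
{
    static void awaitStreams(Future<?> streamFuture, List<String> failedHosts) throws IOException
    {
        while (true)
        {
            // moved inside the loop: abort instead of spinning on a failed stream
            if (!failedHosts.isEmpty())
                throw new IOException("Streaming failed for hosts: " + failedHosts);
            try
            {
                streamFuture.get(1, TimeUnit.SECONDS);
                return;  // streaming completed cleanly
            }
            catch (ExecutionException e)
            {
                // propagate instead of swallowing and looping
                throw new IOException("Stream failed", e.getCause());
            }
            catch (TimeoutException e)
            {
                // not finished yet; loop and re-check failedHosts
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for streams", e);
            }
        }
    }
}

Whether aborting or retrying is the right policy is a separate question;
the point is that the failure check has to be re-evaluated inside the loop.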


Re: Can we upgrade Guava to the same version as master on 3.11 branch?

2019-12-15 Thread Russell Spitzer
The hadoop formats should be compatible with any Cassandra version
regardless of which cassandra-all you include, since they communicate
through the driver under the hood rather than through Cassandra's internal
libraries. This means you should feel free to use Cassandra 4 in your
integration without fear of losing backwards compatibility. In fact, it
should be able to speak to Cassandra 2.x as well.

On Sun, Dec 15, 2019, 10:24 PM Tomo Suzuki 
wrote:

> Hi Russell,
>
> Yes, Apache Beam uses the hadoop format for Cassandra IO [1]. That test
> (HadoopFormatIOCassandraTest) failed [2] when I tried to upgrade the Guava
> version. Added this information to the ticket.
>
> [1]: https://beam.apache.org/documentation/io/built-in/hadoop/
> [2]:
> https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557680928
>
> On Sun, Dec 15, 2019 at 10:36 PM Russell Spitzer
>  wrote:
> >
> > Why does the beam integration rely on cassandra-all, does it use the
> > hadoop formats?
> >
> > On Sun, Dec 15, 2019, 9:07 PM Tomo Suzuki 
> > wrote:
> >
> > > Hi Cassandra developers,
> > >
> > > I want to backport the Guava version upgrade (CASSANDRA-15248) into the
> > > 3.11 branch, so that cassandra-all:3.11.X works with a higher version of
> > > Guava.
> > > I just created a ticket
> > > https://issues.apache.org/jira/browse/CASSANDRA-15453 explaining
> > > background.
> > >
> > > Before committing anything, I'd like to hear any opinion on the
> > > backporting. What do you think?
> > >
> > > Regards,
> > > Tomo
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > >
>
>
>
> --
> Regards,
> Tomo
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Can we upgrade Guava to the same version as master on 3.11 branch?

2019-12-15 Thread Russell Spitzer
Why does the beam integration rely on cassandra-all, does it use the hadoop
formats?

On Sun, Dec 15, 2019, 9:07 PM Tomo Suzuki 
wrote:

> Hi Cassandra developers,
>
> I want to backport the Guava version upgrade (CASSANDRA-15248) into the
> 3.11 branch, so that cassandra-all:3.11.X works with a higher version of
> Guava.
> I just created a ticket
> https://issues.apache.org/jira/browse/CASSANDRA-15453 explaining
> background.
>
> Before committing anything, I'd like to hear any opinion on the
> backporting. What do you think?
>
> Regards,
> Tomo
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>