Re: Time for a new 3.0/3.11 release?

2019-07-03 Thread Jay Zhuang
I'd like to draw some attention to the following 2 tickets; they're
patch-ready and deployed on all our production clusters:
* CASSANDRA-15098: "Endpoints no longer owning tokens are not removed for
vnode"
  In a vnode cluster, the replaced node may not be removed from the gossiper
(or from system.peers; every time a node restarts, entries are re-populated
into the gossiper from system.peers). The patch is pretty small and
straightforward.
* CASSANDRA-15097: "Avoid updating unchanged gossip state"
  I think this is a bug because it not only causes a large backlog of pending
gossip tasks, it also causes the token-metadata update lock to be held for a
long time on large vnode clusters.

Here is another improvement we made for vnodes to avoid gossip blocking during
removeNode: CASSANDRA-15141. But I think it can wait until 4.x. It mostly
causes problems for clusters with 1000+ vnodes.

Thanks,
Jay

On Tue, Jul 2, 2019 at 10:28 PM Mick Semb Wever  wrote:

>
>
> > Is there any chance to get CASSANDRA-15005 (ready, with PR) into a
> > 3.11.5 release?
>
>
> I doubt it, Soroka. It's not a bug and there's no patch for it, so I'd see
> no reason why Michael would wait for this when he generously finds time to
> cut a release.
>
> Maybe the author and reviewer decide to push it to both 3.11.x and 4.0,
> but that's irrelevant to this thread.
>
> regards,
> Mick
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Recommended circleci settings for DTest

2018-09-28 Thread Jay Zhuang
Great, thanks Ariel. I assume it also works for uTest, right? Do you think
it's worth updating the doc for that?
https://github.com/apache/cassandra/blob/trunk/doc/source/development/testing.rst#circleci



On Fri, Sep 28, 2018 at 2:55 PM Ariel Weisberg  wrote:

> Hi,
>
> Apply the following diff and if you have access to the higher memory
> containers it should run the dtests with whatever you have. You may need to
> adjust parallelism to match whatever you paid for.
>
> diff --git a/.circleci/config.yml b/.circleci/config.yml
> index 5a84f724fc..76a2c9f841 100644
> --- a/.circleci/config.yml
> +++ b/.circleci/config.yml
> @@ -58,16 +58,16 @@ with_dtest_jobs_only: &with_dtest_jobs_only
>- build
>  # Set env_settings, env_vars, and workflows/build_and_run_tests based on
> environment
>  env_settings: &env_settings
> -<<: *default_env_settings
> -#<<: *high_capacity_env_settings
> +#<<: *default_env_settings
> +<<: *high_capacity_env_settings
>  env_vars: &env_vars
> -<<: *resource_constrained_env_vars
> -#<<: *high_capacity_env_vars
> +#<<: *resource_constrained_env_vars
> +<<: *high_capacity_env_vars
>  workflows:
>  version: 2
> -build_and_run_tests: *default_jobs
> +#build_and_run_tests: *default_jobs
>  #build_and_run_tests: *with_dtest_jobs_only
> -#build_and_run_tests: *with_dtest_jobs
> +build_and_run_tests: *with_dtest_jobs
>  docker_image: &docker_image kjellman/cassandra-test:0.4.3
>  version: 2
>  jobs:
>
> Ariel
>
> On Fri, Sep 28, 2018, at 5:47 PM, Jay Zhuang wrote:
> > Hi,
> >
> > Do we have a recommended circleci setup for DTest? For example, what's
> > the minimal container number I need to finish the DTest in a reasonable
> > time? I know the free account (4 containers) is not good enough for the
> > DTest. But if a community member can pay for the cost, what are the
> > recommended settings and steps to run that?
> >
> > Thanks,
> > Jay
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Recommended circleci settings for DTest

2018-09-28 Thread Jay Zhuang
Hi,

Do we have a recommended circleci setup for DTest? For example, what's the
minimal container number I need to finish the DTest in a reasonable time? I
know the free account (4 containers) is not good enough for the DTest. But
if a community member can pay for the cost, what are the recommended
settings and steps to run that?

Thanks,
Jay


Re: QA signup

2018-09-26 Thread Jay Zhuang
+1 for publishing official snapshot artifacts for 4.0 and even other
branches.

We're publishing snapshot artifacts to our internal artifactory. One minor
bug we found: currently build.xml won't publish any snapshot artifacts:
https://issues.apache.org/jira/browse/CASSANDRA-12704

On Thu, Sep 20, 2018 at 11:35 PM Dinesh Joshi
 wrote:

> I favor versioned nightlies for testing so everyone is using the exact
> binary distribution.
>
> As far as actually building the packages go, I would prefer a Docker based
> solution like Jon mentioned. It provides a controlled, reproducible, clean
> room environment. Ideally the build script should ensure that the git
> branch is clean and that there aren't any local changes if the packages are
> being published to maven.
>
> Does anyone see a need to publish the git branch metadata in the build
> like the git-sha, branch and repo url? I am not sure if this is already
> captured somewhere. It's useful to trace a build's provenance.
>
> Dinesh
>
> > On Sep 20, 2018, at 2:26 PM, Jonathan Haddad  wrote:
> >
> > Sure - I'm not disagreeing with you that pre-built packages would be nice
> > to have.  That said, if someone's gone through the trouble of building an
> > entire testing infrastructure and has hundreds of machines available,
> > running `docker-compose up build-deb` is likely not a major issue.  If
> I'm
> > trying to decide between solving the 2 problems I'd prefer to make builds
> > easier as very few people actually know how to do it.  I'm also biased
> > because I'm working on a tool that does _exactly_ that (build arbitrary
> C*
> > debs and deploy them to AWS for perf testing with tlp-stress which we've
> > already open sourced https://github.com/thelastpickle/tlp-stress).
> >
> > I'm building it for internal TLP use, but there's not much TLP-specific
> > stuff; we'll be open sourcing it as soon as we can.
> >
> > TL;DR: we need both things
> >
> > On Thu, Sep 20, 2018 at 2:12 PM Scott Andreas 
> wrote:
> >
> >> Mick – Got it, thanks and sorry to have misunderstood. No fault in your
> >> writing at all; that was my misreading.
> >>
> >> Agreed with you and Kurt; I can’t think of a pressing need or immediate
> >> use for the Maven artifacts. As you mentioned, all of the examples I’d
> >> listed require binary artifacts only.
> >>
> >> Re: Jon’s question:
> >>> It seems to me that improving / simplifying the process of building the
> >> packages might solve this problem better.
> >>
> >> Agreed that making builds easy is important, and that manually-applied
> >> patches were involved in a couple cases I’d cited. My main motivation is
> >> toward making it easier for developers who’d like to produce
> >> fully-automated test pipelines to do so using common artifacts, rather
> than
> >> each replicating the build/packaging step for tarball artifacts
> themselves.
> >>
> >> Publishing binary artifacts in a common location would enable developers
> >> to configure testing and benchmarking pipelines to pick up those
> artifacts
> >> on a daily basis without intervention. In the case of a build landing
> DOA
> >> due to an issue with a commit, it’d be enough for zero-touch automation
> to
> >> pick up a new build with the fix the following day and run an extended
> >> suite across a large number of machines and publish results, for
> example.
> >>
> >>
> >> On September 19, 2018 at 8:17:05 PM, kurt greaves (k...@instaclustr.com
> >> ) wrote:
> >>
> >> It's pretty much only third party plugins. I need it for the LDAP
> >> authenticator, and StratIO's lucene plugin will also need it. I know
> there
> >> are users out there with their own custom plugins that would benefit
> from
> >> it as well (and various other open source projects). It would make it
> >> easier, however it certainly is feasible for these devs to just build
> the
> >> jars themselves (and I've done this so far). If it's going to be easy I
> >> think there's value in generating and hosting nightly jars, but if it's
> >> difficult I can just write some docs for DIY.
> >>
> >> On Thu, 20 Sep 2018 at 12:20, Mick Semb Wever  wrote:
> >>
> >>> Sorry about the terrible english in my last email.
> >>>
> >>>
>  On the target audience:
> 
>  [snip]
>  For developers building automation around testing and
>  validation, it’d be great to have a common build to work from rather
>  than each developer producing these builds themselves.
> >>>
> >>>
> >>> Sure. My question was only in context of maven artefacts.
> >>> It seems to me all the use-cases you highlight would be for the binary
> >>> artefacts.
> >>>
> >>> If that's the case we don't need to worry about publishing snapshots
> >> maven
> >>> artefacts, and can just focus on uploading nightly builds to
> >>> https://dist.apache.org/repos/dist/dev/cassandra/
> >>>
> >>> Or is there a use-case I'm missing that needs the maven artefacts?
> >>>
> >>> 

Re: NGCC 2018?

2018-08-31 Thread Jay Zhuang
Are we going to have a dev event next month? Or anything this year? We may
also be able to provide space in the Bay Area and help organize it. (Please
let us know, so we can get final approval for that.)

On Fri, Jul 27, 2018 at 10:05 AM Jonathan Haddad  wrote:

> My interpretation of Nate's statement was that since there would be a bunch
> of us at Lynn's event, we might as well do NGCC at the same time.
>
> On Thu, Jul 26, 2018 at 9:03 PM Ben Bromhead  wrote:
>
> > It sounds like there may be an appetite for something, but the NGCC in
> its
> > current format is likely to not be that useful?
> >
> > Is a bay area event focused on C* developers something that is
> interesting
> > for the broader dev community? In whatever format that may be?
> >
> > On Tue, Jul 24, 2018 at 5:02 PM Nate McCall  wrote:
> >
> > > This was discussed amongst the PMC recently. We did not come to a
> > > conclusion and there were not terribly strong feelings either way.
> > >
> > > I don't feel like we need to hustle to get "NGCC" in place,
> > > particularly given our decided focus on 4.0. However, that should not
> > > stop us from doing an additional 'c* developer' event in sept. to
> > > coincide with distributed data summit.
> > >
> > > On Wed, Jul 25, 2018 at 5:03 AM, Patrick McFadin 
> > > wrote:
> > > > Ben,
> > > >
> > > > Lynn Bender had offered a space the day before Distributed Data
> Summit
> > in
> > > > September (http://distributeddatasummit.com/) since we are both
> > platinum
> > > > sponsors. I thought he and Nate had talked about that being a good
> > place
> > > > for NGCC since many of us will be in town already.
> > > >
> > > > Nate, now that I've spoken for you, you can clarify, :D
> > > >
> > > > Patrick
> > > >
> > >
> > > -
> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > For additional commands, e-mail: dev-h...@cassandra.apache.org
> > >
> > > --
> > Ben Bromhead
> > CTO | Instaclustr 
> > +1 650 284 9692
> > Reliability at Scale
> > Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer
> >
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>


Re: [Discuss] Accept GoCQL driver donation

2018-08-31 Thread Jay Zhuang
That's great. Could that be in the same repo as Cassandra or a
separate repo?

On Fri, Aug 31, 2018 at 7:14 AM Nate McCall  wrote:

> Hi folks,
> So I was recently talking with Chris Bannister, the gocql [0]
> maintainer, and he expressed an interest in donating the driver to the
> ASF.
>
> We could accept this along the same lines as how we took in the dtest
> donation - going through the incubator IP clearance process [1], but
> in this case it's much simpler as an individual (Chris) owns the
> copyright.
>
> I think the end goal here is to have a reference protocol
> implementation controlled by the project at the least, potentially
> replace cqlsh with a GoLang statically compiled binary eventually (?).
>
> What are other folks' thoughts about this? (we are discussing, not voting).
>
> [0] https://github.com/gocql/gocql
> [1] https://incubator.apache.org/guides/ip_clearance.html
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Re: Tombstone passed GC period causes un-repairable inconsistent data

2018-06-25 Thread Jay Zhuang
Thanks Jeff. CASSANDRA-6434 is exactly the issue. Do we have a plan/ticket
to get rid of GCGS (and make only_purge_repaired_tombstones default)? Will
it be covered in CASSANDRA-14145?

I created a ticket, CASSANDRA-14543, for replaying hints of purgeable
tombstones; it doesn't fix the root cause but reduces the chance of hitting
this issue. Please comment if you have any suggestions.

On Thu, Jun 21, 2018 at 12:55 PM Jeff Jirsa  wrote:

> Think he's talking about
> https://issues.apache.org/jira/browse/CASSANDRA-6434
>
> Doesn't solve every problem if you don't run repair at all, but if you're
> not running repairs, you're nearly guaranteed problems with resurrection
> after gcgs anyway.
>
>
>
> On Thu, Jun 21, 2018 at 11:33 AM, Jay Zhuang 
> wrote:
>
> > Yes, I also agree that the user should run (incremental) repair within
> GCGS
> > to prevent it from happening.
> >
> > @Sankalp, would you please point us the patch you mentioned from Marcus?
> > The problem is basically the same as
> > https://issues.apache.org/jira/browse/CASSANDRA-14145
> >
> > CASSANDRA-11427 <https://issues.apache.org/jira/browse/CASSANDRA-11427>
> is
> > actually the opposite of this problem. As purgeable tombstone is
> repaired,
> > this un-repairable problem cannot be reproduced. I tried 2.2.5 (before
> the
> > fix), it's able to repair the purgeable tombstone from node1 to node2, so
> > the data is deleted as expected. But it doesn't mean that's the right
> > behave, as it will also cause purgeable tombstones keeps bouncing around
> > the nodes.
> > I think https://issues.apache.org/jira/browse/CASSANDRA-14145 will fix
> the
> > problem by detecting the repaired/un-repaired data.
> >
> > How about having hints dispatch to deliver/replay purgeable (not live)
> > tombstones? It will reduce the chance to have this issue, especially when
> > GCGS < hinted handoff window.
> >
> > On Wed, Jun 20, 2018 at 9:36 AM sankalp kohli 
> > wrote:
> >
> > > I agree with Stefan that we should use incremental repair and use
> patches
> > > from Marcus to drop tombstones only from repaired data.
> > > Regarding deep repair, you can bump the read repair and run the repair.
> > The
> > > issue will be that you will stream lot of data and also your blocking
> > read
> > > repair will go up when you bump the gc grace to higher value.
> > >
> > > On Wed, Jun 20, 2018 at 1:10 AM Stefan Podkowinski 
> > > wrote:
> > >
> > > > Sounds like an older issue that I tried to address two years ago:
> > > > https://issues.apache.org/jira/browse/CASSANDRA-11427
> > > >
> > > > As you can see, the result hasn't been as expected and we got some
> > > > unintended side effects based on the patch. I'm not sure I'd be
> willing
> > > > to give this another try, considering the behaviour we like to fix in
> > > > the first place is rather harmless and the read repairs shouldn't
> > happen
> > > > at all to any users who regularly run repairs within gc_grace.
> > > >
> > > > What I'd suggest is to think more into the direction of a
> > > > post-full-repair-world and to fully embrace incremental repairs, as
> > > > fixed by Blake in 4.0. In that case, we should stop doing read
> repairs
> > > > at all for repaired data, as described in
> > > > https://issues.apache.org/jira/browse/CASSANDRA-13912. RRs are
> > certainly
> > > > useful, but can be very risky if not very very carefully implemented.
> > So
> > > > I'm wondering if we shouldn't disable RRs for everything but
> unrepaired
> > > > data. I'd btw also be interested to hear any opinions on this in
> > context
> > > > of transient replicas.
> > > >
> > > >
> > > > On 20.06.2018 03:07, Jay Zhuang wrote:
> > > > > Hi,
> > > > >
> > > > > We know that the deleted data may re-appear if repair is not run
> > within
> > > > > gc_grace_seconds. When the tombstone is not propagated to all
> nodes,
> > > the
> > > > > data will re-appear. But it's also causing following 2 issues
> before
> > > the
> > > > > tombstone is compacted away:
> > > > > a. inconsistent query result
> > > > >
> > > > > With consistency level ONE or QUORUM, it may or may not return the
> > > value.
> > > > > b. lots of read repairs, but doesn't repair anything
> > > > >
> >

Re: Tombstone passed GC period causes un-repairable inconsistent data

2018-06-21 Thread Jay Zhuang
Yes, I also agree that the user should run (incremental) repair within GCGS
to prevent it from happening.

@Sankalp, would you please point us the patch you mentioned from Marcus?
The problem is basically the same as
https://issues.apache.org/jira/browse/CASSANDRA-14145

CASSANDRA-11427 <https://issues.apache.org/jira/browse/CASSANDRA-11427> is
actually the opposite of this problem. As purgeable tombstones are repaired,
this un-repairable problem cannot be reproduced. I tried 2.2.5 (before the
fix); it's able to repair the purgeable tombstone from node1 to node2, so
the data is deleted as expected. But that doesn't mean it's the right
behavior, as it also causes purgeable tombstones to keep bouncing around
the nodes.
I think https://issues.apache.org/jira/browse/CASSANDRA-14145 will fix the
problem by detecting the repaired/un-repaired data.

How about having hints dispatch deliver/replay purgeable (not live)
tombstones? It would reduce the chance of hitting this issue, especially when
GCGS < the hinted handoff window.
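The GCGS-vs-hint-window interaction can be sketched with a small check. This is illustrative Python only, not Cassandra code; the parameter names are assumptions that merely mirror the cassandra.yaml options gc_grace_seconds and max_hint_window_in_ms.

```python
# Illustrative sketch, not Cassandra code. If gc_grace_seconds is shorter
# than the hinted-handoff window, a tombstone can become purgeable before
# its hint is replayed, so the replica that missed the delete may never
# receive it.

def tombstone_can_outlive_hints(gc_grace_seconds: int,
                                max_hint_window_ms: int) -> bool:
    """True if a hinted tombstone may already be purgeable when replayed."""
    return gc_grace_seconds * 1000 < max_hint_window_ms

HINT_WINDOW_MS = 3 * 60 * 60 * 1000   # 3 hours, the default hint window

# The default gc_grace_seconds (10 days) comfortably exceeds the hint window:
print(tombstone_can_outlive_hints(864000, HINT_WINDOW_MS))  # False
# A 30-second gc_grace_seconds (as in the repro below) is well inside it:
print(tombstone_can_outlive_hints(30, HINT_WINDOW_MS))      # True
```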

On Wed, Jun 20, 2018 at 9:36 AM sankalp kohli 
wrote:

> I agree with Stefan that we should use incremental repair and use patches
> from Marcus to drop tombstones only from repaired data.
> Regarding deep repair, you can bump the read repair and run the repair. The
> issue will be that you will stream lot of data and also your blocking read
> repair will go up when you bump the gc grace to higher value.
>
> On Wed, Jun 20, 2018 at 1:10 AM Stefan Podkowinski 
> wrote:
>
> > Sounds like an older issue that I tried to address two years ago:
> > https://issues.apache.org/jira/browse/CASSANDRA-11427
> >
> > As you can see, the result hasn't been as expected and we got some
> > unintended side effects based on the patch. I'm not sure I'd be willing
> > to give this another try, considering the behaviour we like to fix in
> > the first place is rather harmless and the read repairs shouldn't happen
> > at all to any users who regularly run repairs within gc_grace.
> >
> > What I'd suggest is to think more into the direction of a
> > post-full-repair-world and to fully embrace incremental repairs, as
> > fixed by Blake in 4.0. In that case, we should stop doing read repairs
> > at all for repaired data, as described in
> > https://issues.apache.org/jira/browse/CASSANDRA-13912. RRs are certainly
> > useful, but can be very risky if not very very carefully implemented. So
> > I'm wondering if we shouldn't disable RRs for everything but unrepaired
> > data. I'd btw also be interested to hear any opinions on this in context
> > of transient replicas.
> >
> >
> > On 20.06.2018 03:07, Jay Zhuang wrote:
> > > Hi,
> > >
> > > We know that the deleted data may re-appear if repair is not run within
> > > gc_grace_seconds. When the tombstone is not propagated to all nodes,
> the
> > > data will re-appear. But it's also causing following 2 issues before
> the
> > > tombstone is compacted away:
> > > a. inconsistent query result
> > >
> > > With consistency level ONE or QUORUM, it may or may not return the
> value.
> > > b. lots of read repairs, but doesn't repair anything
> > >
> > > With consistency level ALL, it always triggers a read repair.
> > > With consistency level QUORUM, it also very likely (2/3) causes a read
> > > repair. But it doesn't repair the data, so it's causing repair every
> > time.
> > >
> > >
> > > Here are the reproducing steps:
> > >
> > > 1. Create a 3 nodes cluster
> > > 2. Create a table (with small gc_grace_seconds):
> > >
> > > CREATE KEYSPACE foo WITH replication = {'class': 'SimpleStrategy',
> > > 'replication_factor': 3};
> > > CREATE TABLE foo.bar (
> > > id int PRIMARY KEY,
> > > name text
> > > ) WITH gc_grace_seconds=30;
> > >
> > > 3. Insert data with consistency all:
> > >
> > > INSERT INTO foo.bar (id, name) VALUES(1, 'cstar');
> > >
> > > 4. stop 1 node
> > >
> > > $ ccm node2 stop
> > >
> > > 5. Delete the data with consistency quorum:
> > >
> > > DELETE FROM foo.bar WHERE id=1;
> > >
> > > 6. Wait 30 seconds and then start node2:
> > >
> > > $ ccm node2 start
> > >
> > > Now the tombstone is on node1 and node3 but not on node2.
> > >
> > > With quorum read, it may or may not return value, and read repair will
> > send
> > > the data from node2 to node1 and node3, but it doesn't repair anything.
> > >
> > > I'd like to discuss a few potential solutions and workarounds:
> > >
> > > 1. Can hints replay sends GCed tombstone?
> > >
> > > 2. Can we have a "deep repair" which detects such issue and repair the
> > GCed
> > > tombstone? Or temporarily increase the gc_grace_seconds for repair?
> > >
> > > What other suggestions you have if the user is having such issue?
> > >
> > >
> > > Thanks,
> > >
> > > Jay
> > >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org
> >
> >
>


Tombstone passed GC period causes un-repairable inconsistent data

2018-06-19 Thread Jay Zhuang
Hi,

We know that deleted data may re-appear if repair is not run within
gc_grace_seconds. When the tombstone is not propagated to all nodes, the
data will re-appear. But it also causes the following 2 issues before the
tombstone is compacted away:
a. inconsistent query result

With consistency level ONE or QUORUM, it may or may not return the value.
b. lots of read repairs, but nothing gets repaired

With consistency level ALL, it always triggers a read repair.
With consistency level QUORUM, it is also very likely (2/3) to cause a read
repair. But it doesn't repair the data, so it causes a read repair every time.
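The 2/3 figure can be verified with a short enumeration. This is an illustrative sketch, not Cassandra code: with RF=3 and one replica missing the tombstone, a QUORUM read contacts 2 of the 3 replicas, and a digest mismatch (and thus a blocking read repair) occurs exactly when the stale replica is one of them.

```python
from itertools import combinations

# RF=3; node2 is the stale replica that missed the tombstone. A QUORUM read
# contacts any 2 of the 3 replicas; the digests mismatch (triggering a
# blocking read repair) exactly when node2 is among the contacted pair.
replicas = ["node1", "node2", "node3"]
stale = "node2"

quorum_reads = list(combinations(replicas, 2))   # all 2-of-3 choices
mismatches = [pair for pair in quorum_reads if stale in pair]

print(f"{len(mismatches)}/{len(quorum_reads)} quorum reads hit the stale replica")
# 2/3 quorum reads hit the stale replica
```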


Here are the reproducing steps:

1. Create a 3 nodes cluster
2. Create a table (with small gc_grace_seconds):

CREATE KEYSPACE foo WITH replication = {'class': 'SimpleStrategy',
'replication_factor': 3};
CREATE TABLE foo.bar (
id int PRIMARY KEY,
name text
) WITH gc_grace_seconds=30;

3. Insert data with consistency all:

INSERT INTO foo.bar (id, name) VALUES(1, 'cstar');

4. stop 1 node

$ ccm node2 stop

5. Delete the data with consistency quorum:

DELETE FROM foo.bar WHERE id=1;

6. Wait 30 seconds and then start node2:

$ ccm node2 start

Now the tombstone is on node1 and node3 but not on node2.

With a quorum read, it may or may not return the value, and read repair will
send the data from node2 to node1 and node3, but it doesn't repair anything.

I'd like to discuss a few potential solutions and workarounds:

1. Can hints replay send GCed tombstones?

2. Can we have a "deep repair" which detects such issues and repairs the GCed
tombstone? Or temporarily increase the gc_grace_seconds for repair?

What other suggestions do you have if a user is having this issue?


Thanks,

Jay


Re: Rocksandra performance test result

2018-06-03 Thread Jay Zhuang
We just do double (triple) writes on the application side. We're shadowing
part of the traffic to a smaller staging cluster for new-release testing and
performance/configuration tuning.

On Sat, Jun 2, 2018 at 7:47 PM Nate McCall  wrote:

> > Thanks for sharing, Jay.
> >
> > Could you say a bit more about how you’ve approached shadowing traffic
> against an alternate cluster? (Asking not so much with regard to the
> Rocksandra test, but toward shadowing methodology in general).
>
> +1 to this. If someone out there had a method to transparently tee
> read and write traffic to a Cassandra cluster, I would like to hear
> about it as that would be hugely valuable to the community.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>


Rocksandra performance test result

2018-06-01 Thread Jay Zhuang
We're shadowing some production traffic to a Rocksandra cluster (
https://github.com/Instagram/cassandra/tree/rocks_3.0), and the P99 latency is
significantly improved (about 6x for reads, 12x for writes). Here are the
test details:

https://docs.google.com/document/d/1cEM8ZqB5tOYVdsh1LpqSZ-eLasumWfzn_Tk_lDjxOTk/

Thanks,
Jay


Re: CommitLogSegmentManager verbose debug log

2018-04-07 Thread Jay Zhuang
Makes sense to me. Not sure if I should just push a ninja fix; I created the
ticket anyway: CASSANDRA-14370. Are you interested in creating a quick patch
for it?
On Tuesday, April 3, 2018, 3:06:28 AM PDT, Nicolas Guyomar 
<nicolas.guyo...@gmail.com> wrote:  
 
 Hi Jay,

Well, the log in itself does not provide useful information (like the segment
number or something like that), so IMHO trace would be a better level for this
one.

I agree that one log per second may not be seen as that verbose!

Thank you

On 30 March 2018 at 06:36, Jay Zhuang <jay.zhu...@yahoo.com> wrote:

> It's changed to trace() in cassandra-3.0 with CASSANDRA-10241:
> https://github.com/pauloricardomg/cassandra/commit/
> 3ef1b18fa76dce7cd65b73977fc30e51301f3fed#diff-
> d07279710c482983e537aed26df80400
>
> In cassandra-3.11 (and trunk), it's changed back to debug() with
> CASSANDRA-10202:
> https://github.com/apache/cassandra/commit/e8907c16abcd84021a39cdaac79b60
> 9fcc64a43c#diff-85e13493c70723764c539dd222455979
>
> The message is logged when a new commit-log segment is created, so it's not
> that verbose from my point of view. But I'm also fine with changing it back
> to trace.
>
> Here is a sample of debug.log while running cassandra-stress:
> https://gist.githubusercontent.com/cooldoger/
> 12f507da9b41b232d8869bbcd2bfd02b/raw/241cd8f0639269966aa53e2b10cee6
> 13f8ed8cfe/gistfile1.txt
>
>
>
> On Thursday, March 29, 2018, 8:47:54 AM PDT, Nicolas Guyomar <
> nicolas.guyo...@gmail.com> wrote:
>
>
> Hi guys,
>
> I'm trying to understand the meaning of the following log
> in org.apache.cassandra.db.commitlog.CommitLogSegmentManager.java
>
> logger.debug("No segments in reserve; creating a fresh one");
>
> I feel like it could be remove, as it seems to be kind of a continuous task
> of providing segment
>
> Any thought on removing this log ? (my debug.log is quite full of it)
>
> Thank you
>
> Nicolas
>
  

Re: CommitLogSegmentManager verbose debug log

2018-03-29 Thread Jay Zhuang
It's changed to trace() in cassandra-3.0 with CASSANDRA-10241:
https://github.com/pauloricardomg/cassandra/commit/3ef1b18fa76dce7cd65b73977fc30e51301f3fed#diff-d07279710c482983e537aed26df80400

In cassandra-3.11 (and trunk), it's changed back to debug() with CASSANDRA-10202:
https://github.com/apache/cassandra/commit/e8907c16abcd84021a39cdaac79b609fcc64a43c#diff-85e13493c70723764c539dd222455979

The message is logged when a new commit-log segment is created, so it's not that
verbose from my point of view. But I'm also fine with changing it back to trace.

Here is a sample of debug.log while running cassandra-stress:
https://gist.githubusercontent.com/cooldoger/12f507da9b41b232d8869bbcd2bfd02b/raw/241cd8f0639269966aa53e2b10cee613f8ed8cfe/gistfile1.txt



On Thursday, March 29, 2018, 8:47:54 AM PDT, Nicolas Guyomar 
 wrote:  
 
 Hi guys,

I'm trying to understand the meaning of the following log
in org.apache.cassandra.db.commitlog.CommitLogSegmentManager.java

logger.debug("No segments in reserve; creating a fresh one");

I feel like it could be removed, as it seems to be kind of a continuous task
of providing segments

Any thought on removing this log ? (my debug.log is quite full of it)

Thank you

Nicolas
  

Re: RE: how to fix constantly getting out of memory (3.11)

2018-03-19 Thread Jay Zhuang
Hi,

For CASSANDRA-13929, the patch is available for review. Anyone interested in
reviewing it?

Thanks,
Jay
On Tuesday, December 12, 2017, 5:02:14 AM PST, Steinmaurer, Thomas 
 wrote:  
 
 Hi,

if you are talking about on-heap troubles, then the following might be related 
in 3.11.x:
https://issues.apache.org/jira/browse/CASSANDRA-13929

Thomas

-Original Message-
From: Micha [mailto:mich...@fantasymail.de]
Sent: Dienstag, 12. Dezember 2017 09:24
To: dev@cassandra.apache.org
Subject: how to fix constantly getting out of memory (3.11)

Hi,

I have seven nodes, debian stretch with c*3.11, each with 2TB disk (500G free), 
32G Ram.
I have a keyspace with seven tables. At the moment the cluster doesn't work at 
all reliably. Every morning at least 2 nodes are shut down due to out of 
memory. Repair afterwards fails with "some repair failed".
I use G1 with a 16G heap on six nodes and CMS with an 8G heap on one node to see
a difference. In munin it's easy to see a constantly rising memory consumption.
There are no other services running. I cannot understand what is not releasing
the memory.
Some tables have some big rows (as mentioned in my last mail to the list). Can 
this be a source of the memory consumption?

How do you track this down? Is there memory which doesn't get released and
accumulates over time? I have not yet debugged such gc/memory issues.

cheers
 Michael


-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org

  

Re: penn state academic paper - "scalable" bloom filters

2018-02-22 Thread Jay Zhuang
I think there's a similar idea here about dynamically resizing the BF:
https://issues.apache.org/jira/browse/CASSANDRA-6633, but I don't quite
understand the idea there.
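For anyone skimming the thread, the paper's core trick (as I read it; the sketch below is a toy of my own, not the paper's exact construction and not Cassandra's bloom filter code) is to avoid resizing a filter in place: keep a series of fixed-size filters, and start a new, larger and tighter layer whenever the current one reaches the capacity it was sized for.

```python
import hashlib
import math

class BloomLayer:
    """One fixed-size Bloom filter, sized for `capacity` items at `error_rate`."""

    def __init__(self, capacity, error_rate):
        self.capacity = capacity
        self.error_rate = error_rate
        self.count = 0
        # Standard sizing: m = -n*ln(p)/ln(2)^2 bits, k = (m/n)*ln(2) hashes
        self.m = max(1, int(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.k = max(1, int(self.m / capacity * math.log(2)))
        self.bits = 0  # a big int used as a bit set

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of SHA-256
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:16], "big")
        h2 = int.from_bytes(digest[16:], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos
        self.count += 1

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

class ScalableBloom:
    """Layered filters: grow by adding a layer instead of resizing in place."""

    def __init__(self, initial_capacity=64, error_rate=0.01):
        self.layers = [BloomLayer(initial_capacity, error_rate)]

    def add(self, item):
        layer = self.layers[-1]
        if layer.count >= layer.capacity:
            # New layer: double the capacity, halve the per-layer error rate,
            # so the compounded false-positive bound is a geometric series
            # p0 + p0/2 + p0/4 + ... < 2 * p0
            layer = BloomLayer(layer.capacity * 2, layer.error_rate / 2)
            self.layers.append(layer)
        layer.add(item)

    def __contains__(self, item):
        # An item may live in any layer; still no false negatives
        return any(item in layer for layer in self.layers)

sbf = ScalableBloom()
for i in range(200):              # enough inserts to force two extra layers
    sbf.add(f"key-{i}")
print("key-10" in sbf, len(sbf.layers))  # True 3
```

The trade-off is that lookups must consult every layer, so query cost grows with the number of layers; the paper bounds this by growing capacity geometrically.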


On Thu, Feb 22, 2018 at 7:45 AM, Carl Mueller 
wrote:

> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.62.7953&rep=rep1&type=pdf
>
> looks to be an adaptive approach where the "initial guess" bloom filters
> are enhanced with more layers of ones generated after usage stats are
> gained.
>
> Disclaimer: I suck at reading academic papers.
>


Re: CDC usability and future development

2018-02-01 Thread Jay Zhuang
We did a POC to improve the CDC feature as an interface (
https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf),
so the user doesn't have to read the commit log directly. We deployed the
change to a test cluster and are doing more tests with production traffic;
we'll send out the design proposal, POC, and test results soon.

We have the same problem of getting the full row value for the CDC downstream
pipeline. We used to do a read-back; right now our CDC downstream stores all
the data (in Hive), so there's no need to read back. For the Cassandra CDC
feature, I don't think it should provide the full row value, as it is supposed
to be Change Data Capture. But it's still a problem for range deletes, as the
node cannot read back deleted data. So we're proposing an option to expand the
range delete in CDC if the user really wants it.
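To make the last point concrete, here is a toy illustration of what "expanding" a range delete into per-row events means; the function name and event shape are made up for illustration, not the actual proposal or Cassandra code.

```python
# Toy illustration only; names and event shapes are hypothetical.
# A range tombstone names a clustering range, not the rows it covers. To hand
# downstream consumers one delete event per covered row, the node would have
# to read back the live rows in that range and emit an event for each --
# which is why this must be an opt-in.

def expand_range_delete(live_rows, start, end):
    """Expand a range tombstone [start, end] into per-row delete events."""
    return [{"op": "delete", "key": key}
            for key in sorted(live_rows)
            if start <= key <= end]

rows = {1: "a", 2: "b", 5: "c", 9: "d"}   # clustering key -> value
print(expand_range_delete(rows, 2, 6))
# [{'op': 'delete', 'key': 2}, {'op': 'delete', 'key': 5}]
```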


On Wed, Jan 31, 2018 at 7:32 AM Josh McKenzie  wrote:

> CDC provides only the mutation as opposed to the full column value, which
>> tends to be of limited use for us. Applications might want to know the full
>> column value, without having to issue a read back. We also see value in
>> being able to publish the full column value both before and after the
>> update. This is especially true when deleting a column since this stream
>> may be joined with others, or consumers may require other fields to
>> properly process the delete.
>
>
> Philosophically, my first pass at the feature prioritized minimizing
> impact to node performance first and usability second, punting a lot of the
> de-duplication and RbW implications of having full column values, or
> materializing stuff off-heap for consumption from a user and flagging as
> persisted to disk etc, for future work on the feature. I don't personally
> have any time to devote to moving the feature forward now but as Jeff
> indicates, Jay and Simon are both active in the space and taking up the
> torch.
>
>
> On Tue, Jan 30, 2018 at 8:35 PM, Jeff Jirsa  wrote:
>
>> Here's a deck of some proposed additions, discussed at one of the NGCC
>> sessions last fall:
>>
>> https://github.com/ngcc/ngcc2017/blob/master/CassandraDataIngestion.pdf
>>
>>
>>
>> On Tue, Jan 30, 2018 at 5:10 PM, Andrew Prudhomme  wrote:
>>
>> > Hi all,
>> >
>> > We are currently designing a system that allows our Cassandra clusters
>> to
>> > produce a stream of data updates. Naturally, we have been evaluating if
>> CDC
>> > can aid in this endeavor. We have found several challenges in using CDC
>> for
>> > this purpose.
>> >
>> > CDC provides only the mutation as opposed to the full column value,
>> which
>> > tends to be of limited use for us. Applications might want to know the
>> full
>> > column value, without having to issue a read back. We also see value in
>> > being able to publish the full column value both before and after the
>> > update. This is especially true when deleting a column since this stream
>> > may be joined with others, or consumers may require other fields to
>> > properly process the delete.
>> >
>> > Additionally, there is some difficulty with processing CDC itself such
>> as:
>> > - Updates not being immediately available (addressed by CASSANDRA-12148)
>> > - Each node providing an independent streams of updates that must be
>> > unified and deduplicated
>> >
>> > Our question is, what is the vision for CDC development? The current
>> > implementation could work for some use cases, but is a ways from a
>> general
>> > streaming solution. I understand that the nature of Cassandra makes this
>> > quite complicated, but are there any thoughts or desires on the future
>> > direction of CDC?
>> >
>> > Thanks
>> >
>> >
>>
>
>


Re: Cassandra Dtests: skip upgrade tests

2017-12-08 Thread Jay Zhuang
Here is how the cassandra-builds Jenkins job does it: $ rm -r upgrade_tests/
https://github.com/apache/cassandra-builds/blob/master/build-scripts/cassandra-dtest.sh#L50

On Friday, December 8, 2017, 1:28:34 AM PST, Sergey 
 wrote:  
 
 Hi!

How to completely skip upgrade tests when running dtests?

Best regards,
Sergey  

Re: Flakey Dtests

2017-11-27 Thread Jay Zhuang
I fixed one CDC uTest, please review:
https://issues.apache.org/jira/browse/CASSANDRA-14066


On Friday, November 17, 2017 6:34 AM, Josh McKenzie  
wrote:
 

 >
> Do we have any volunteers to fix the broken Materialized Views and CDC
> DTests?

I'll try to take a look at the CDC tests next week; looks like one of the
base unit tests is failing as well.

On Fri, Nov 17, 2017 at 12:09 AM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> Quick update re: dtests and off-heap memtables:
>
> I’ve filed CASSANDRA-14056 (Many dtests fail with ConfigurationException:
> offheap_objects are not available in 3.0 when OFFHEAP_MEMTABLES=“true”)
>
> Looks like we’re gonna need to do some work to test this configuration and
> right now it’s pretty broken...
>
> Do we have any volunteers to fix the broken Materialized Views and CDC
> DTests?
>
> best,
> kjellman
>
>
> > On Nov 15, 2017, at 5:59 PM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> >
> > yes - true - some are flaky, but almost all of the ones i filed fail 100%
> > of the time. i look forward to triaging just the remaining flaky ones
> > (hopefully - with our powers combined - by the end of this month!!)
> >
> > appreciate everyone’s help - no matter how small... i already personally
> did a few “fun” random-python-class-is-missing-return-after-method stuff.
> >
> > we’ve wanted this for a while and now is our time to actually execute
> and make good on our previous dev list promises.
> >
> > best,
> > kjellman
> >
> >> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa  wrote:
> >>
> >> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
> >>
> >> If you haven't been paying attention to JIRA, you likely didn't notice
> that
> >> Josh went through and triage/categorized a bunch of issues by adding
> >> components, and Michael took the time to open a bunch of JIRAs for
> failing
> >> tests.
> >>
> >> How many is a bunch? Something like 35 or so just for tests currently
> >> failing on trunk.  If you're a regular contributor, you already know
> that
> >> dtests are flakey - it'd be great if a few of us can go through and fix
> a
> >> few. Even incremental improvements are improvements. Here's an easy
> search
> >> to find them:
> >>
> >> https://issues.apache.org/jira/secure/IssueNavigator.
> jspa?reset=true=project+%3D+CASSANDRA+AND+
> component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+
> DESC%2C+created+ASC=hide
> >>
> >> If you're a new contributor, fixing tests is often a good way to learn a
> >> new part of the codebase. Many of these are dtests, which live in a
> >> different repo ( https://github.com/apache/cassandra-dtest ) and are in
> >> python, but have no fear, the repo has instructions for setting up and
> >> running dtests(
> >> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
> >>
> >> Normal contribution workflow applies: self-assign the ticket if you
> want to
> >> work on it, click on 'start progress' to indicate that you're working on
> >> it, mark it 'patch available' when you've uploaded code to be reviewed
> (in
> >> a github branch, or as a standalone patch file attached to the JIRA). If
> >> you have questions, feel free to email the dev list (that's what it's
> here
> >> for).
> >>
> >> Many thanks will be given,
> >> - Jeff
>
>

   

Re: Do not use Cassandra 3.11.0+ or Cassandra 3.0.12+

2017-08-28 Thread Jay Zhuang
We've been using 3.0.12+ for a few months and haven't seen an issue like
that. Do we know what could trigger the problem? And is 3.0.x really
impacted?

Thanks,
Jay

On 8/28/17 6:02 AM, Hannu Kröger wrote:
> Hello,
> 
> Current latest Cassandra version (3.11.0, possibly also 3.0.12+) has a race
> condition that causes Cassandra to create broken sstables (stats file in
> sstables to be precise).
> 
> Bug described here:
> https://issues.apache.org/jira/browse/CASSANDRA-13752
> 
> This change might be causing it (but not sure):
> https://issues.apache.org/jira/browse/CASSANDRA-13038
> 
> Other related issues:
> https://issues.apache.org/jira/browse/CASSANDRA-13718
> https://issues.apache.org/jira/browse/CASSANDRA-13756
> 
> I would not recommend using 3.11.0 nor upgrading to 3.0.12 or higher before
> this is fixed.
> 
> Cheers,
> Hannu
> 




Re: CASSANDRA-9472 Reintroduce off heap memtables - patch to 3.0

2017-07-29 Thread Jay Zhuang
Should we consider back-porting it to 3.0 for the community? I think this
is a performance regression rather than a new feature, and we had the
feature in 2.1 and 2.2.

On 7/27/17 11:52 PM, Andrew Whang wrote:
> Yes, seeing latency improvement after backporting 9472 to 3.0.13. We are
> measuring p99 latency, thus moving objects off heap improved gc stalls,
> which directly affects our read/write p99 latency.
> 
> On Thu, Jul 27, 2017 at 10:54 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> 
>> This is after you backported 9472 to 3.0?
>>
>> --
>> Jeff Jirsa
>>
>>
>>> On Jul 27, 2017, at 10:33 PM, Andrew Whang <andrewgwh...@gmail.com>
>> wrote:
>>>
>>> Jay,
>>>
>>> We see ~20% write latency improvement on 3.0.13 in a write-heavy
>> workload,
>>> using offheap_objects. offheap_buffers only offered minimal improvement.
>>>
>>> On Thu, Jul 27, 2017 at 10:06 PM, Jay Zhuang
>> <jay.zhu...@yahoo.com.invalid>
>>> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> Do you see performance gain from reintroducing off-heap memtables for
>>>> 3.0.x? When we were on 2.2.x we saw big improvements from enabling
>>>> off-heap memtables.
>>>>
>>>> Thanks,
>>>> Jay
>>>>
>>>>> On 7/27/17 9:37 PM, Andrew Whang wrote:
>>>>> I'm wondering if anyone has been able to patch CASSANDRA-9472 to 3.0,
>>>>> without breaking unit tests. The patch was introduced in 3.4, but 3.0.x
>>>>> contains unit tests and code from later 3.x releases, which makes
>>>> debugging
>>>>> unit test failures difficult - i.e. SSTableCorruptionDetectionTest,
>>>> which
>>>>> was introduced in 3.7 and is found in 3.0.14, but not in 3.4.
>>>>>
>>>>
>>>> -
>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>>>
>>>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>
>>
> 




Re: CASSANDRA-9472 Reintroduce off heap memtables - patch to 3.0

2017-07-27 Thread Jay Zhuang
Hi Andrew,

Do you see performance gain from reintroducing off-heap memtables for
3.0.x? When we were on 2.2.x we saw big improvements from enabling
off-heap memtables.

Thanks,
Jay

On 7/27/17 9:37 PM, Andrew Whang wrote:
> I'm wondering if anyone has been able to patch CASSANDRA-9472 to 3.0,
> without breaking unit tests. The patch was introduced in 3.4, but 3.0.x
> contains unit tests and code from later 3.x releases, which makes debugging
> unit test failures difficult - i.e. SSTableCorruptionDetectionTest, which
> was introduced in 3.7 and is found in 3.0.14, but not in 3.4.
> 




commitlog_total_space_in_mb tuning

2017-07-05 Thread Jay Zhuang
Hi,

commitlog_total_space_in_mb was increased from 1G to 8G in
CASSANDRA-7031. Sometimes we see the number of dropped mutations spike.
Not sure if it's a sign that we should increase
commitlog_total_space_in_mb?

For bean:
org.apache.cassandra.metrics:name=WaitingOnSegmentAllocation,type=CommitLog
Mean is 48684 microseconds
Max is 1386179 microseconds

I think it's relatively high compared to our other clusters. Does anyone
tune that parameter? Any suggestions?
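
As a back-of-the-envelope reading of those numbers (the 10 ms threshold
below is an assumption for illustration, not an official tuning rule),
writers blocking tens of milliseconds on segment allocation does suggest
the allocator cannot keep up:

```python
# WaitingOnSegmentAllocation values reported above, in microseconds
mean_us = 48684
max_us = 1386179

mean_ms = mean_us / 1000.0
max_ms = max_us / 1000.0
print("mean wait: %.1f ms, max wait: %.1f ms" % (mean_ms, max_ms))
# Hypothetical rule of thumb: sustained mean waits above ~10 ms mean
# writers are frequently blocked waiting for a new commit log segment
print(mean_ms > 10)
```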

Thanks,
Jay




Re: NGCC Proposal (Was Re: NGCC?)

2017-06-20 Thread Jay Zhuang
Just one day this year?

On 6/13/17 12:34 PM, Jonathan Haddad wrote:
> Agreed with Jeff & Jason.
> 
> On Tue, Jun 13, 2017 at 11:45 AM Jeff Jirsa  wrote:
> 
>> Looks great to me - especially the venue.  Date wise, Tuesday (19th) lets
>> people fly in on Monday instead of costing a weekend, so selfishly that
>> seems better to me.
>>
>>
>>
>> On Mon, Jun 12, 2017 at 1:30 PM, Gary Dusbabek 
>> wrote:
>>
>>> ## Cassandra list email
>>>
>>> Hi everybody,
>>>
>>> Here are current thoughts about structure and timing. Your feedback is
>>> welcome.
>>>
>>> Date: One of 18, 19, or 22 September 2017. We are aware the Tokyo user
>>> group is planning an event the first week in October. We’re doing our
>> best
>>> to give a buffer there.
>>>
>>> Venue: After evaluating a few options in San Antonio, the event space at
>>> Geekdom seems like a good fit and the right balance of cost and services
>> (
>>> goo.gl/tViC72).
>>>
>>> Format: This will be a one day event, starting in the morning and running
>>> through the evening. Here is a proposed agenda we can iterate on:
>>>
>>> * 8-9am - catered breakfast, conversation
>>> * 9-12 - presentations, including a coffee/snack break
>>> * 12-1:30pm - catered lunch
>>> * 1:30-4:00pm - presentations, including a coffee/snack break
>>> * 4-5pm - lightning talks
>>> * 5-6:30pm - happy hour and a half
>>> * 6:30-9:30pm - dinner at a local restaurant
>>>
>>> We hope to film the presentations and make them available on Youtube.
>>>
>>> A few have already reached out offering various resources or assistance.
>>> Thank you! Eric or I will be reaching out to coordinate as soon as we are
>>> ready to finalize plans.
>>>
>>> Please chime in if you have suggestions, especially those relating to
>>> format.
>>>
>>> Cheers,
>>>
>>> Gary
>>>
>>
> 




Re: Is concurrent_batchlog_writes option used/implemented?

2017-06-13 Thread Jay Zhuang
Looks like it's a documentation issue. The option was removed in 3.0.0 by
CASSANDRA-9673:
https://github.com/apache/cassandra/commit/53a177a9150586e56408f25c959f75110a2997e7


Thanks,
Jay

On 6/13/17 5:33 AM, Tomas Repik wrote:

Hi,

while browsing the options for setting up Cassandra at [1] I found an option 
`concurrent_batchlog_writes`. This is mentioned only in this documentation but 
I could not find it in the config file nor in the source code. Any comments 
regarding this are welcome.

Thanks

Tomas

[1] 
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__commonProps







Re: Status on new nodes for builds.apache.org

2017-06-05 Thread Jay Zhuang
CircleCI is pretty good, thanks for adding that. It would be better to
have DTests too.

One problem with CircleCI is that it takes much longer than a typical host
(70 minutes vs. 40 minutes on a MacBook Pro). With more parallel
test.runners, a local host could run even faster:

https://issues.apache.org/jira/browse/CASSANDRA-13078

I'll add CircleCI test results to patches in the future. But it seems we
still need a committer to run the DTests for us.


Thanks,
Jay

On 6/5/17 2:02 PM, Jeff Jirsa wrote:

We haven't really talked about it - if we can do it without giving out full 
admin that seems reasonable but resources are sorta limited and the asf 
hardware runs pretty slow.

Have you tried circleci ? Free and requires no privileges - works reasonably 
well.








Re: Status on new nodes for builds.apache.org

2017-06-05 Thread Jay Zhuang
Is there any plan to give CI permissions to non-committers? It would be
great if we could use it too.


Thanks,
Jay

On 6/2/17 10:24 AM, Stefan Podkowinski wrote:

Just a quick heads up for everyone interested in the jobs history at
builds.apache.org or who wants to run devbranch jobs there. A couple of
Jenkins nodes are not working correctly, which is causing jobs to abort
abnormally during start. You'd either have to rebuild until you hit a
working node, or wait until this issue has been resolved (follow
INFRA-14153 for that).

Btw, thanks to Instaclustr for donating much needed resources! Those
nodes will be much appreciated once they are working :)







Fwd: Potential block issue for 3.0.13: schema version id mismatch while upgrading

2017-05-30 Thread Jay Zhuang
It seems the mail was marked as spam, so I'm trying to forward it from
another email account.

Thanks,
Jay

-- Forwarded message --
From: Jay Zhuang <jay.zhu...@yahoo.com.invalid>
Date: Tue, May 30, 2017 at 2:22 PM
Subject: Potential block issue for 3.0.13: schema version id mismatch while
upgrading
To: dev@cassandra.apache.org


Hi,

While upgrading to 3.0.13 we found that the schema id changed for the
same schema, which could leave Cassandra unable to start, among other
issues related to UnknownColumnFamilyException. Ticket: CASSANDRA-13559

The problem is that the order of the SchemaKeyspace tables changed, so the
digest for the same schema also changed:
https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L311
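
The mechanism is easy to demonstrate (table names below are illustrative,
not the exact SchemaKeyspace list): a digest accumulated over an ordered
table list changes whenever only the iteration order changes, even though
the schema content is identical.

```python
import hashlib

def schema_version(table_names):
    """Fold table names into one digest, the way a schema version id is
    derived from serialized schema tables (simplified illustration)."""
    digest = hashlib.md5()
    for name in table_names:
        digest.update(name.encode())
    return digest.hexdigest()

old_order = ["keyspaces", "tables", "columns", "triggers", "views"]
new_order = ["keyspaces", "tables", "views", "columns", "triggers"]

# Same tables, different iteration order -> different "schema version"
print(schema_version(old_order) == schema_version(new_order))
```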

I would suggest restoring the old list for the digest calculation. But that
also means a 3.0.13 -> 3.0.14 upgrade will have the same problem. Any
suggestions?

Thanks,
Jay



Potential block issue for 3.0.13: schema version id mismatch while upgrading

2017-05-30 Thread Jay Zhuang

Hi,

While upgrading to 3.0.13 we found that the schema id changed for the
same schema, which could leave Cassandra unable to start, among other
issues related to UnknownColumnFamilyException. Ticket: CASSANDRA-13559


The problem is that the order of the SchemaKeyspace tables changed, so the
digest for the same schema also changed:
https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L311


I would suggest restoring the old list for the digest calculation. But that
also means a 3.0.13 -> 3.0.14 upgrade will have the same problem. Any
suggestions?


Thanks,
Jay




Re: How to read CDC from Cassandra?

2017-02-15 Thread Jay Zhuang

I tried CASSANDRA-11575 on 3.8. Works great.

Thanks,
Jay

On 2/15/17 3:08 PM, S G wrote:

Hi,

I have gone through several resources mentioned in
http://cassandra.apache.org/doc/latest/operating/cdc.html

The only thing mentioned about reading the CDC is that it is fairly
straightforward with a link to
https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140

This is way too high level.

Can someone please explain or provide me the code to read CDC data after
enabling this feature in Cassandra?


Thanks

SG



Re: Have a CDC commitLog process option in Cassandra

2017-02-09 Thread Jay Zhuang
No. The idea is to have Cassandra manage the CDC logs instead of having
another daemon process handle that.

Here is the CDC design JIRA: CASSANDRA-8844. The pain point is developing
and managing the daemon. If they're integrated, it will be easier to
manage and monitor.


Thanks,
Jay

On 2/9/17 3:57 PM, Dikang Gu wrote:

Is it for testing purpose?

On Thu, Feb 9, 2017 at 3:54 PM, Jay Zhuang <jay.zhu...@yahoo.com.invalid>
wrote:


Hi,

Processing the CDC commitLogs requires a separate daemon process; Carl
has a daemon example here: CASSANDRA-11575.

Does it make sense to integrate it into Cassandra, so the user doesn't
have to manage another JVM on the same box? Then provide an ITrigger-like
interface (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/triggers/ITrigger.java#L49)
to process the data.

Or maybe provide an interface option to handle the CDC commitLog in
SegmentManager (https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L68).

Any comments? If it makes sense, I could create a JIRA for that.

Thanks,
Jay







Have a CDC commitLog process option in Cassandra

2017-02-09 Thread Jay Zhuang

Hi,

Processing the CDC commitLogs requires a separate daemon process; Carl
has a daemon example here: CASSANDRA-11575.


Does it make sense to integrate it into Cassandra, so the user doesn't
have to manage another JVM on the same box? Then provide an ITrigger-like
interface
(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/triggers/ITrigger.java#L49)
to process the data.
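
A minimal sketch of what such an in-process hook could look like (every
name here is hypothetical, loosely modeled on ITrigger, not a proposed
API):

```python
from abc import ABC, abstractmethod

class CdcHandler(ABC):
    """Hypothetical in-process hook: Cassandra would invoke this for each
    mutation in a CDC segment, so no separate daemon JVM is needed."""
    @abstractmethod
    def handle(self, keyspace, table, mutation):
        ...

class CountingHandler(CdcHandler):
    def __init__(self):
        self.count = 0
    def handle(self, keyspace, table, mutation):
        self.count += 1

def drain_segment(mutations, handlers):
    """Stand-in for the server side draining one CDC segment and fanning
    each mutation out to the registered handlers."""
    for ks, tbl, m in mutations:
        for h in handlers:
            h.handle(ks, tbl, m)

h = CountingHandler()
drain_segment([("ks1", "t1", "m1"), ("ks1", "t1", "m2")], [h])
print(h.count)
```

The trade-off versus a separate daemon is isolation: a slow or faulty
handler would run inside the Cassandra process.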


Or maybe provide an interface option to handle the CDC commitLog in 
SegmentManager(https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java#L68).


Any comments? If it makes sense, I could create a JIRA for that.

Thanks,
Jay


Re: Current Branch Merge Path - PLEASE READ!

2016-12-12 Thread Jay Zhuang

Thanks Jeff. And I assume that new features should only go to 3.x.

What about the backport process? For example, CASSANDRA-12941 asks to
backport a fix; should that be accepted? I could argue that it's a bug fix
for Materialized Views rather than a new feature.


Thanks,
Jay


On 12/11/16 4:09 AM, Jeff Jirsa wrote:

It depends on severity, but generally… If you find a bug in 3.0, you should 
work back to 2.1 to see if it exists in older versions. We don’t put minor 
fixes into 2.1 (or really 2.2 at this point) – 2.1 is critical fixes only, and 
2.2 is getting to that point as well.

If it’s a minor minor bug, fix it in 3.0 and generate patches for versions 
newer than that. If it’s a critical bug, go back to 2.1 and see if it exists 
there as well.



On 12/10/16, 6:03 PM, "Jay Zhuang" <jay.zhu...@yahoo.com.INVALID> wrote:


I'm new to the community, so sorry if it's an obvious question. Is there
any general guidance on choosing which branch to start with? For example,
if I find a bug in 3.0, should I try to reproduce it in the lowest version
(2.1) and work from there?

Thanks,
Jay

On 12/8/16 10:29 AM, Michael Shuler wrote:

The current branch merge path is, in full:

cassandra-2.1
 |
 cassandra-2.2
  |
  cassandra-3.0
   |
   cassandra-3.11
|
cassandra-3.X
 |
 trunk

Wherever you start, please follow through the complete path to trunk.

I reopened JIRAs #12768, #12817, and #12694 for skipping cassandra-3.11.
Owners of those tickets, please commit to the cassandra-3.11 branch and
merge up. There were too many conflicts for me to comfortably try to
resolve on a straight merge from 3.0.



Re: Current Branch Merge Path - PLEASE READ!

2016-12-10 Thread Jay Zhuang
I'm new to the community, so sorry if it's an obvious question. Is there
any general guidance on choosing which branch to start with? For example,
if I find a bug in 3.0, should I try to reproduce it in the lowest version
(2.1) and work from there?


Thanks,
Jay

On 12/8/16 10:29 AM, Michael Shuler wrote:

The current branch merge path is, in full:

cassandra-2.1
 |
 cassandra-2.2
  |
  cassandra-3.0
   |
   cassandra-3.11
|
cassandra-3.X
 |
 trunk

Wherever you start, please follow through the complete path to trunk.

I reopened JIRAs #12768, #12817, and #12694 for skipping cassandra-3.11.
Owners of those tickets, please commit to the cassandra-3.11 branch and
merge up. There were too many conflicts for me to comfortably try to
resolve on a straight merge from 3.0.