Re: Check Cluster Health

2018-06-28 Thread Rahul Singh


When you run the tpstats or tablestats subcommands in nodetool, you are
actually accessing data inside Cassandra via JMX.

You can start there.
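For example (plain shell on any node; the keyspace/table names are placeholders):

    nodetool status                            # up/down (UN/DN) state of every node, per DC
    nodetool tpstats                           # thread pool and dropped-message counters
    nodetool tablestats my_keyspace.my_table   # per-table latencies, partition sizes, tombstones

Polling nodetool status on a schedule and recording each node's UN/DN state over time would give you the failure history you're after.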

Rahul
On Jun 28, 2018, 10:55 AM -0500, Thouraya TH , wrote:
> Hi,
>
> Please, how can I check the health of my cluster / data center using Cassandra?
> In fact I'd like to generate a history of the state of each node: a history
> of the failures in my cluster (20% failure in a day, 40% failure in a day,
> etc.).
>
> Thank you so much.
> Kind regards.


Re: Cassandra read/sec and write/sec

2018-06-28 Thread Randy Lynn
Abdul - I use DataDog
I track the "latency one minute rate" for both reads and writes.
But I'm interested to see what others say and whether I got that right.
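For reference, the numbers DataDog is reading come from Cassandra's standard ClientRequest MBeans (how they get scraped depends on your agent):

    org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
    org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency

Each of these is a timer whose OneMinuteRate attribute is requests per second over a one-minute decaying average — that rate, rather than the latency percentiles on the same MBean, is the throughput number.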

On Thu, Jun 28, 2018 at 6:19 PM, Abdul Patel  wrote:

> Hi all
>
> We use Prometheus to monitor Cassandra and then put it on Grafana for
> dashboards.
> What's the parameter to measure the throughput of Cassandra?
>



-- 
Randy Lynn
rl...@getavail.com

office:
859.963.1616 <+1-859-963-1616> ext 202
163 East Main Street - Lexington, KY 40507 - USA

 getavail.com 


Re: C* in multiple AWS AZ's

2018-06-28 Thread Randy Lynn
So we have two data centers already running..

AP-SYDNEY, and US-EAST.. I'm using Ec2Snitch over a site-to-site tunnel..
I'm wanting to move the current US-EAST from AZ 1a to 1e..
I know all docs say use ec2multiregion for multi-DC.

I like the GPFS idea. Would that work with multi-DC too?
What's the downside? Would status report a rack of 1a, even though the node is in 1e?

Thanks in advance for the help/thoughts!!


On Thu, Jun 28, 2018 at 6:20 PM, kurt greaves  wrote:

> There is a need for a repair with both DCs as rebuild will not stream all
> replicas, so unless you can guarantee you were perfectly consistent at time
> of rebuild you'll want to do a repair after rebuild.
>
> On another note you could just replace the nodes but use GPFS instead of
> EC2 snitch, using the same rack name.
>
> On Fri., 29 Jun. 2018, 00:19 Rahul Singh, 
> wrote:
>
>> Parallel load is the best approach and then switch your Data access code
>> to only access the new hardware. After you verify that there are no local
>> read / writes on the OLD dc and that the updates are only via Gossip, then
>> go ahead and change the replication factor on the keyspace to have zero
>> replicas in the old DC. Then you can decommission.
>>
>> This way you are one hundred percent sure that you aren’t missing any new
>> data. No need for a DC to DC repair but a repair is always healthy.
>>
>> Rahul
>> On Jun 28, 2018, 9:15 AM -0500, Randy Lynn , wrote:
>>
>> Already running with Ec2.
>>
>> My original thought was a new DC parallel to the current, and then
>> decommission the other DC.
>>
>> Also my data load is small right now.. I know small is a relative term..
>> each node is carrying about 6GB..
>>
>> So given the data size, would you go with parallel DC or let the new AZ
>> carry a heavy load until the others are migrated over?
>> and then I think "repair" to clean up the replication?
>>
>>
>> On Thu, Jun 28, 2018 at 10:09 AM, Rahul Singh <
>> rahul.xavier.si...@gmail.com> wrote:
>>
>>> You don’t have to use EC2 snitch on AWS but if you have already started
>>> with it, it may put a node in a different DC.
>>>
>>> If your data density won’t be ridiculous, you could add 3 nodes to a
>>> different DC/region and then sync up. After the new DC is operational you can remove
>>> one at a time on the old DC and at the same time add to the new one.
>>>
>>> Rahul
>>> On Jun 28, 2018, 9:03 AM -0500, Randy Lynn , wrote:
>>>
>>> I have a 6-node cluster I'm migrating to the new i3 types.
>>> But at the same time I want to migrate to a different AZ.
>>>
>>> What happens if I do the "running node replace method" with 1 node at a
>>> time moving to the new AZ. Meaning, I'll have temporarily:
>>>
>>> 5 nodes in AZ 1c
>>> 1 new node in AZ 1e.
>>>
>>> I'll wash-rinse-repeat till all 6 are on the new machine type and in the
>>> new AZ.
>>>
>>> Any thoughts about whether this gets weird with the Ec2Snitch and a RF 3?
>>>
>>> --
>>> Randy Lynn
>>> rl...@getavail.com
>>>
>>> office:
>>> 859.963.1616 <+1-859-963-1616> ext 202
>>> 163 East Main Street - Lexington, KY 40507 - USA
>>> 
>>>
>>>  getavail.com 
>>>
>>>
>>
>>
>> --
>> Randy Lynn
>> rl...@getavail.com
>>
>> office:
>> 859.963.1616 <+1-859-963-1616> ext 202
>> 163 East Main Street - Lexington, KY 40507 - USA
>> 
>>
>>  getavail.com 
>>
>>


-- 
Randy Lynn
rl...@getavail.com

office:
859.963.1616 <+1-859-963-1616> ext 202
163 East Main Street - Lexington, KY 40507 - USA

 getavail.com 


Re: C* in multiple AWS AZ's

2018-06-28 Thread kurt greaves
There is a need for a repair with both DCs as rebuild will not stream all
replicas, so unless you can guarantee you were perfectly consistent at time
of rebuild you'll want to do a repair after rebuild.

On another note you could just replace the nodes but use GPFS instead of
EC2 snitch, using the same rack name.
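For the GPFS option, a minimal sketch of the per-node config (the DC/rack values are chosen to match what Ec2Snitch would have reported — us-east/1a here is just the example from this thread):

    # cassandra.yaml
    endpoint_snitch: GossipingPropertyFileSnitch

    # cassandra-rackdc.properties
    dc=us-east
    rack=1a

Since GPFS reads these files instead of the EC2 API, a replacement node that is physically in 1e can keep advertising rack 1a, which is what makes the in-place replacement safe for the existing replica placement.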

On Fri., 29 Jun. 2018, 00:19 Rahul Singh, 
wrote:

> Parallel load is the best approach and then switch your Data access code
> to only access the new hardware. After you verify that there are no local
> read / writes on the OLD dc and that the updates are only via Gossip, then
> go ahead and change the replication factor on the keyspace to have zero
> replicas in the old DC. Then you can decommission.
>
> This way you are one hundred percent sure that you aren’t missing any new
> data. No need for a DC to DC repair but a repair is always healthy.
>
> Rahul
> On Jun 28, 2018, 9:15 AM -0500, Randy Lynn , wrote:
>
> Already running with Ec2.
>
> My original thought was a new DC parallel to the current, and then
> decommission the other DC.
>
> Also my data load is small right now.. I know small is a relative term..
> each node is carrying about 6GB..
>
> So given the data size, would you go with parallel DC or let the new AZ
> carry a heavy load until the others are migrated over?
> and then I think "repair" to clean up the replication?
>
>
> On Thu, Jun 28, 2018 at 10:09 AM, Rahul Singh <
> rahul.xavier.si...@gmail.com> wrote:
>
>> You don’t have to use EC2 snitch on AWS but if you have already started
>> with it, it may put a node in a different DC.
>>
>> If your data density won’t be ridiculous, you could add 3 nodes to a
>> different DC/region and then sync up. After the new DC is operational you can remove one
>> at a time on the old DC and at the same time add to the new one.
>>
>> Rahul
>> On Jun 28, 2018, 9:03 AM -0500, Randy Lynn , wrote:
>>
>> I have a 6-node cluster I'm migrating to the new i3 types.
>> But at the same time I want to migrate to a different AZ.
>>
>> What happens if I do the "running node replace method" with 1 node at a
>> time moving to the new AZ. Meaning, I'll have temporarily:
>>
>> 5 nodes in AZ 1c
>> 1 new node in AZ 1e.
>>
>> I'll wash-rinse-repeat till all 6 are on the new machine type and in the
>> new AZ.
>>
>> Any thoughts about whether this gets weird with the Ec2Snitch and a RF 3?
>>
>> --
>> Randy Lynn
>> rl...@getavail.com
>>
>> office:
>> 859.963.1616 <+1-859-963-1616> ext 202
>> 163 East Main Street - Lexington, KY 40507 - USA
>> 
>>
>>  getavail.com 
>>
>>
>
>
> --
> Randy Lynn
> rl...@getavail.com
>
> office:
> 859.963.1616 <+1-859-963-1616> ext 202
> 163 East Main Street - Lexington, KY 40507 - USA
>
>  getavail.com 
>
>


Re: JVM Heap erratic

2018-06-28 Thread Elliott Sims
Odd.  Your "post-GC" heap level seems a lot lower than your max, which
implies that you should be OK with ~10GB.  I'm guessing either you're
genuinely getting a huge surge in needed heap and running out, or it's
falling behind and garbage is building up.  If the latter, there might be
some tweaking you can do.  Probably worth turning on GC logging and digging
through exactly what's happening.
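On Cassandra 3.11 / Java 8 that's mostly a matter of enabling the flags that ship (commented out in some setups) in conf/jvm.options — a sketch, with the log path as an assumption:

    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintHeapAtGC
    -XX:+PrintTenuringDistribution
    -XX:+PrintGCApplicationStoppedTime
    -Xloggc:/var/log/cassandra/gc.log
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=10
    -XX:GCLogFileSize=10M

Long "Total time for which application threads were stopped" entries, or CMS "concurrent mode failure" messages, would distinguish genuine allocation surges from the collector falling behind.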

CMS is kind of hard to tune and can have problems with heap fragmentation
since it doesn't compact, but if it's working for you I'd say stick with it.

On Thu, Jun 28, 2018 at 3:14 PM, Randy Lynn  wrote:

> Thanks for the feedback..
>
> Getting tons of OOM lately..
>
> You mentioned overprovisioned heap size... well...
> tried 8GB = OOM
> tried 12GB = OOM
> tried 20GB w/ G1 = OOM (and long GC pauses usually over 2 secs)
> tried 20GB w/ CMS = running
>
> We're on Java 8 update 151, Cassandra 3.11.1.
>
> We've got one table that's got a 400MB partition.. that's the max.. the
> 99th is < 100MB, and 95th < 30MB..
> So I'm not sure that I'm overprovisioned; I just haven't yet found the right
> heap size for our partition sizes.
> All queries use the clustering key, so I'm not accidentally reading a whole
> partition.
> The last place I'm looking - which maybe should be the first - is
> tombstones.
>
> sorry for the afternoon rant! thanks for your eyes!
>
> On Thu, Jun 28, 2018 at 5:54 PM, Elliott Sims 
> wrote:
>
>> It depends a bit on which collector you're using, but fairly normal.
>> Heap grows for a while, then the JVM decides via a variety of metrics that
>> it's time to run a collection.  G1GC is usually a bit steadier and less
>> sawtooth than the Parallel Mark Sweep, but if your heap's a lot bigger
>> than needed I could see it producing that pattern.
>>
>> On Thu, Jun 28, 2018 at 9:23 AM, Randy Lynn  wrote:
>>
>>> I have datadog monitoring JVM heap.
>>>
>>> Running 3.11.1.
>>> 20GB heap
>>> G1 for GC.. all the G1GC settings are out-of-the-box
>>>
>>> Does this look normal?
>>>
>>> https://drive.google.com/file/d/1hLMbG53DWv5zNKSY88BmI3Wd0ic_KQ07/view?usp=sharing
>>>
>>> I'm a C# .NET guy, so I have no idea if this is normal Java behavior.
>>>
>>>
>>>
>>> --
>>> Randy Lynn
>>> rl...@getavail.com
>>>
>>> office:
>>> 859.963.1616 <+1-859-963-1616> ext 202
>>> 163 East Main Street - Lexington, KY 40507 - USA
>>> 
>>>
>>>  getavail.com 
>>>
>>
>>
>
>
> --
> Randy Lynn
> rl...@getavail.com
>
> office:
> 859.963.1616 <+1-859-963-1616> ext 202
> 163 East Main Street - Lexington, KY 40507 - USA
> 
>
>  getavail.com 
>


Cassandra read/sec and write/sec

2018-06-28 Thread Abdul Patel
Hi all

We use Prometheus to monitor Cassandra and then put it on Grafana for
dashboards.
What's the parameter to measure the throughput of Cassandra?


Re: JVM Heap erratic

2018-06-28 Thread Randy Lynn
Thanks for the feedback..

Getting tons of OOM lately..

You mentioned overprovisioned heap size... well...
tried 8GB = OOM
tried 12GB = OOM
tried 20GB w/ G1 = OOM (and long GC pauses usually over 2 secs)
tried 20GB w/ CMS = running

We're on Java 8 update 151, Cassandra 3.11.1.

We've got one table that's got a 400MB partition.. that's the max.. the
99th is < 100MB, and 95th < 30MB..
So I'm not sure that I'm overprovisioned; I just haven't yet found the right
heap size for our partition sizes.
All queries use the clustering key, so I'm not accidentally reading a whole
partition.
The last place I'm looking - which maybe should be the first - is
tombstones.
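A few read-only checks that would confirm or rule tombstones out (keyspace/table names are placeholders):

    nodetool tablestats my_ks.my_table         # "tombstones per slice" averages/maxima per table
    nodetool tablehistograms my_ks my_table    # partition size and cell count percentiles
    sstablemetadata /path/to/mc-1-big-Data.db  # "Estimated droppable tombstones" per SSTable

If the tombstones-per-slice numbers are high, reads are materializing lots of dead cells on the heap, which would line up with the OOMs.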

sorry for the afternoon rant! thanks for your eyes!

On Thu, Jun 28, 2018 at 5:54 PM, Elliott Sims  wrote:

> It depends a bit on which collector you're using, but fairly normal.  Heap
> grows for a while, then the JVM decides via a variety of metrics that it's
> time to run a collection.  G1GC is usually a bit steadier and less sawtooth
> than the Parallel Mark Sweep, but if your heap's a lot bigger than needed
> I could see it producing that pattern.
>
> On Thu, Jun 28, 2018 at 9:23 AM, Randy Lynn  wrote:
>
>> I have datadog monitoring JVM heap.
>>
>> Running 3.11.1.
>> 20GB heap
>> G1 for GC.. all the G1GC settings are out-of-the-box
>>
>> Does this look normal?
>>
>> https://drive.google.com/file/d/1hLMbG53DWv5zNKSY88BmI3Wd0ic_KQ07/view?usp=sharing
>>
>> I'm a C# .NET guy, so I have no idea if this is normal Java behavior.
>>
>>
>>
>> --
>> Randy Lynn
>> rl...@getavail.com
>>
>> office:
>> 859.963.1616 <+1-859-963-1616> ext 202
>> 163 East Main Street - Lexington, KY 40507 - USA
>> 
>>
>>  getavail.com 
>>
>
>


-- 
Randy Lynn
rl...@getavail.com

office:
859.963.1616 <+1-859-963-1616> ext 202
163 East Main Street - Lexington, KY 40507 - USA

 getavail.com 


Re: JVM Heap erratic

2018-06-28 Thread Elliott Sims
It depends a bit on which collector you're using, but fairly normal.  Heap
grows for a while, then the JVM decides via a variety of metrics that it's
time to run a collection.  G1GC is usually a bit steadier and less sawtooth
than the Parallel Mark Sweep, but if your heap's a lot bigger than needed
I could see it producing that pattern.

On Thu, Jun 28, 2018 at 9:23 AM, Randy Lynn  wrote:

> I have datadog monitoring JVM heap.
>
> Running 3.11.1.
> 20GB heap
> G1 for GC.. all the G1GC settings are out-of-the-box
>
> Does this look normal?
>
> https://drive.google.com/file/d/1hLMbG53DWv5zNKSY88BmI3Wd0ic_KQ07/view?usp=sharing
>
> I'm a C# .NET guy, so I have no idea if this is normal Java behavior.
>
>
>
> --
> Randy Lynn
> rl...@getavail.com
>
> office:
> 859.963.1616 <+1-859-963-1616> ext 202
> 163 East Main Street - Lexington, KY 40507 - USA
> 
>
>  getavail.com 
>


Re: Cassandra backup to alternate location

2018-06-28 Thread Jeff Jirsa
No - they'll hardlink into the snapshot folder on each data directory. They
are true hardlinks, so even if you could move it, it'd still be on the same
filesystem.

Typical behavior is to issue a snapshot, and then copy the data out as
needed (using something like https://github.com/JeremyGrosser/tablesnap ).
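A sketch of that flow (tag, bucket, and paths invented for the example):

    nodetool snapshot -t backup-2018-06-28 my_keyspace
    # hardlinks appear under each table's own directory:
    #   <data_dir>/my_keyspace/<table>-<uuid>/snapshots/backup-2018-06-28/

    cd /var/lib/cassandra/data
    for d in */*/snapshots/backup-2018-06-28; do
        aws s3 sync "$d" "s3://my-bucket/backups/$(hostname)/$d"
    done

    nodetool clearsnapshot -t backup-2018-06-28 my_keyspace   # release the hardlinks when done

Since the snapshot itself is just hardlinks, only the copy step costs real disk space and I/O.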

On Thu, Jun 28, 2018 at 10:00 AM, Lohchab, Sanjeev  wrote:

> Hi All,
>
>
>
> I am trying to back up a Cassandra DB, but by default it is saving the
> snapshots in the default location.
>
> Is there any way we can specify the location where we want to store the
> snapshots?
>
>
>
> Regards
>
> Sanjeev
>


Cassandra backup to alternate location

2018-06-28 Thread Lohchab, Sanjeev
Hi All,

I am trying to back up a Cassandra DB, but by default it is saving the snapshots
in the default location.
Is there any way we can specify the location where we want to store the
snapshots?

Regards
Sanjeev


JVM Heap erratic

2018-06-28 Thread Randy Lynn
I have datadog monitoring JVM heap.

Running 3.11.1.
20GB heap
G1 for GC.. all the G1GC settings are out-of-the-box

Does this look normal?

https://drive.google.com/file/d/1hLMbG53DWv5zNKSY88BmI3Wd0ic_KQ07/view?usp=sharing

I'm a C# .NET guy, so I have no idea if this is normal Java behavior.



-- 
Randy Lynn
rl...@getavail.com

office:
859.963.1616 <+1-859-963-1616> ext 202
163 East Main Street - Lexington, KY 40507 - USA

 getavail.com 


Check Cluster Health

2018-06-28 Thread Thouraya TH
Hi,

Please, how can I check the health of my cluster / data center using
Cassandra?
In fact I'd like to generate a history of the state of each node: a history
of the failures in my cluster (20% failure in a day, 40% failure in a day,
etc.).

Thank you so much.
Kind regards.


Re: Problem to activate mode DEBUG to see the slow queries

2018-06-28 Thread Nitan Kainth
You can also enable probabilistic tracing: /opt/cassandra/bin/nodetool
settraceprobability 1
It will populate the system_traces keyspace, where you can see details on queries.
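Once traces are flowing, a couple of CQL queries (the LIMIT and the session_id placeholder are arbitrary) will pull them back out:

    SELECT session_id, duration, request, started_at
    FROM system_traces.sessions
    LIMIT 20;

    -- drill into one session's step-by-step events
    SELECT activity, source, source_elapsed
    FROM system_traces.events
    WHERE session_id = <session_id from above>;

One caveat: settraceprobability 1 traces every request, which is expensive on a busy cluster; a small value like 0.01 is usually safer outside of debugging windows.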

On Thu, Jun 28, 2018 at 5:32 AM, Jean Carlo 
wrote:

> Thank you ahmed!
>
>
> Saludos
>
> Jean Carlo
>
> "The best way to predict the future is to invent it" Alan Kay
>
> On Thu, Jun 28, 2018 at 11:21 AM, Ahmed Eljami 
> wrote:
>
>> Hello Jean Carlo,
>>
>> To activate DEBUG mode, you should edit "logback.xml", not
>> "log4j-server.properties".
>>
>>
>> Ahmed.
>>
>
>


Re: C* in multiple AWS AZ's

2018-06-28 Thread Rahul Singh
Parallel load is the best approach and then switch your Data access code to 
only access the new hardware. After you verify that there are no local read / 
writes on the OLD dc and that the updates are only via Gossip, then go ahead 
and change the replication factor on the keyspace to have zero replicas in the
old DC. Then you can decommission.

This way you are one hundred percent sure that you aren’t missing any new data. No
need for a DC to DC repair but a repair is always healthy.
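In CQL/nodetool terms that last step is roughly (keyspace and DC names invented for the sketch):

    ALTER KEYSPACE my_ks WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'us-east-new': 3    -- the old DC is simply omitted, i.e. zero replicas there
    };

    -- then, on each node in the old DC, one at a time:
    nodetool decommission

With the old DC already removed from the replication map, those decommissions have nothing left to stream, so they finish quickly.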

Rahul
On Jun 28, 2018, 9:15 AM -0500, Randy Lynn , wrote:
> Already running with Ec2.
>
> My original thought was a new DC parallel to the current, and then 
> decommission the other DC.
>
> Also my data load is small right now.. I know small is a relative term.. each
> node is carrying about 6GB..
>
> So given the data size, would you go with parallel DC or let the new AZ carry 
> a heavy load until the others are migrated over?
> and then I think "repair" to clean up the replication?
>
>
> > On Thu, Jun 28, 2018 at 10:09 AM, Rahul Singh 
> >  wrote:
> > > You don’t have to use EC2 snitch on AWS but if you have already started 
> > > with it, it may put a node in a different DC.
> > >
> > > If your data density won’t be ridiculous, you could add 3 nodes to a
> > > different DC/region and then sync up. After the new DC is operational you can remove
> > > one at a time on the old DC and at the same time add to the new one.
> > >
> > > Rahul
> > > On Jun 28, 2018, 9:03 AM -0500, Randy Lynn , wrote:
> > > > I have a 6-node cluster I'm migrating to the new i3 types.
> > > > But at the same time I want to migrate to a different AZ.
> > > >
> > > > What happens if I do the "running node replace method" with 1 node at a 
> > > > time moving to the new AZ. Meaning, I'll have temporarily:
> > > >
> > > > 5 nodes in AZ 1c
> > > > 1 new node in AZ 1e.
> > > >
> > > > I'll wash-rinse-repeat till all 6 are on the new machine type and in 
> > > > the new AZ.
> > > >
> > > > Any thoughts about whether this gets weird with the Ec2Snitch and a RF 
> > > > 3?
> > > >
> > > > --
> > > > Randy Lynn
> > > > rl...@getavail.com
> > > >
> > > > office:
> > > > 859.963.1616 ext 202
> > > > 163 East Main Street - Lexington, KY 40507 - USA
> > > >
> > > > getavail.com
>
>
>
> --
> Randy Lynn
> rl...@getavail.com
>
> office:
> 859.963.1616 ext 202
> 163 East Main Street - Lexington, KY 40507 - USA
>
> getavail.com


Re: C* in multiple AWS AZ's

2018-06-28 Thread Randy Lynn
Already running with Ec2.

My original thought was a new DC parallel to the current, and then
decommission the other DC.

Also my data load is small right now.. I know small is a relative term.. each
node is carrying about 6GB..

So given the data size, would you go with parallel DC or let the new AZ
carry a heavy load until the others are migrated over?
and then I think "repair" to clean up the replication?


On Thu, Jun 28, 2018 at 10:09 AM, Rahul Singh 
wrote:

> You don’t have to use EC2 snitch on AWS but if you have already started
> with it, it may put a node in a different DC.
>
> If your data density won’t be ridiculous, you could add 3 nodes to a
> different DC/region and then sync up. After the new DC is operational you can remove one
> at a time on the old DC and at the same time add to the new one.
>
> Rahul
> On Jun 28, 2018, 9:03 AM -0500, Randy Lynn , wrote:
>
> I have a 6-node cluster I'm migrating to the new i3 types.
> But at the same time I want to migrate to a different AZ.
>
> What happens if I do the "running node replace method" with 1 node at a
> time moving to the new AZ. Meaning, I'll have temporarily:
>
> 5 nodes in AZ 1c
> 1 new node in AZ 1e.
>
> I'll wash-rinse-repeat till all 6 are on the new machine type and in the
> new AZ.
>
> Any thoughts about whether this gets weird with the Ec2Snitch and a RF 3?
>
> --
> Randy Lynn
> rl...@getavail.com
>
> office:
> 859.963.1616 <+1-859-963-1616> ext 202
> 163 East Main Street - Lexington, KY 40507 - USA
> 
>
>  getavail.com 
>
>


-- 
Randy Lynn
rl...@getavail.com

office:
859.963.1616 <+1-859-963-1616> ext 202
163 East Main Street - Lexington, KY 40507 - USA

 getavail.com 


Re: C* in multiple AWS AZ's

2018-06-28 Thread Rahul Singh
You don’t have to use EC2 snitch on AWS, but if you have already started with
it, it may put a node in a different DC.

If your data density won’t be ridiculous, you could add 3 nodes to a different
DC/region and then sync up. After the new DC is operational you can remove one at
a time on the old DC and at the same time add to the new one.

Rahul
On Jun 28, 2018, 9:03 AM -0500, Randy Lynn , wrote:
> I have a 6-node cluster I'm migrating to the new i3 types.
> But at the same time I want to migrate to a different AZ.
>
> What happens if I do the "running node replace method" with 1 node at a time 
> moving to the new AZ. Meaning, I'll have temporarily:
>
> 5 nodes in AZ 1c
> 1 new node in AZ 1e.
>
> I'll wash-rinse-repeat till all 6 are on the new machine type and in the new 
> AZ.
>
> Any thoughts about whether this gets weird with the Ec2Snitch and a RF 3?
>
> --
> Randy Lynn
> rl...@getavail.com
>
> office:
> 859.963.1616 ext 202
> 163 East Main Street - Lexington, KY 40507 - USA
>
> getavail.com


Re: C* in multiple AWS AZ's

2018-06-28 Thread Jeff Jirsa
The single node in 1e will be a replica for every range (and you won’t be able 
to tolerate an outage in 1c), potentially putting it under significant load.

-- 
Jeff Jirsa


> On Jun 28, 2018, at 7:02 AM, Randy Lynn  wrote:
> 
> I have a 6-node cluster I'm migrating to the new i3 types.
> But at the same time I want to migrate to a different AZ.
> 
> What happens if I do the "running node replace method" with 1 node at a time 
> moving to the new AZ. Meaning, I'll have temporarily:
> 
> 5 nodes in AZ 1c
> 1 new node in AZ 1e.
> 
> I'll wash-rinse-repeat till all 6 are on the new machine type and in the new 
> AZ.
> 
> Any thoughts about whether this gets weird with the Ec2Snitch and a RF 3?
> 
> -- 
> Randy Lynn 
> rl...@getavail.com 
> 
> office: 
> 859.963.1616 ext 202 
> 163 East Main Street - Lexington, KY 40507 - USA 
> 
>   getavail.com


C* in multiple AWS AZ's

2018-06-28 Thread Randy Lynn
I have a 6-node cluster I'm migrating to the new i3 types.
But at the same time I want to migrate to a different AZ.

What happens if I do the "running node replace method" with 1 node at a
time moving to the new AZ. Meaning, I'll have temporarily:

5 nodes in AZ 1c
1 new node in AZ 1e.

I'll wash-rinse-repeat till all 6 are on the new machine type and in the
new AZ.

Any thoughts about whether this gets weird with the Ec2Snitch and a RF 3?

-- 
Randy Lynn
rl...@getavail.com

office:
859.963.1616 <+1-859-963-1616> ext 202
163 East Main Street - Lexington, KY 40507 - USA

 getavail.com 


Re: Problem to activate mode DEBUG to see the slow queries

2018-06-28 Thread Jean Carlo
Thank you ahmed!


Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay

On Thu, Jun 28, 2018 at 11:21 AM, Ahmed Eljami 
wrote:

> Hello Jean Carlo,
>
> To activate DEBUG mode, you should edit "logback.xml", not
> "log4j-server.properties".
>
>
> Ahmed.
>


Re: Problem to activate mode DEBUG to see the slow queries

2018-06-28 Thread Ahmed Eljami
Hello Jean Carlo,

To activate DEBUG mode, you should edit "logback.xml", not
"log4j-server.properties".


Ahmed.


Problem to activate mode DEBUG to see the slow queries

2018-06-28 Thread Jean Carlo
Hello,

Concerning the slow queries:
https://issues.apache.org/jira/browse/CASSANDRA-12403

How do I activate debug logging for the slow queries?

I tried with nodetool setlogging org.apache.cassandra.db.monitoring DEBUG
and also with log4j-server.properties, and I did not get the expected result.

Thx

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


Re: Re: Re: Re: stream failed when bootstrap

2018-06-28 Thread dayu
A rolling restart alone didn't solve the problem, but I think I found a way.


There are two folders with the same name but different ID suffixes in the data
directory, e.g. dayu_123 and dayu_234; the dayu_123 folder is empty and the
dayu_234 folder is not. I then used CQL to query system_schema.tables, and the
ID of the table named 'dayu' is 123. Because the data in table 'dayu' can be
re-generated, I simply removed the dayu_234 folder during the rolling restart.
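For anyone wanting to run the same check, the lookup is just (keyspace name is a placeholder, table name from this thread):

    SELECT keyspace_name, table_name, id
    FROM system_schema.tables
    WHERE keyspace_name = 'my_ks' AND table_name = 'dayu';

The id column is the UUID used as the directory suffix, so any on-disk <table>-<uuid> folder whose UUID doesn't match is left over from a dropped or recreated table.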


Now when running nodetool resetlocalschema there is no error, and I am
bootstrapping the new node again. I will see if there are any other problems left.


I hope my experience can help others.


Thanks for all your help!
Dayu






At 2018-06-28 13:53:16, "kurt greaves"  wrote:

Yeah, but you only really need to drain, restart Cassandra one by one. Not that 
the others will hurt, but they aren't strictly necessary. 


On 28 June 2018 at 05:38, dayu  wrote:

Hi kurt, a rolling restart means running the disablebinary, disablethrift,
disablegossip, drain, stop cassandra and start cassandra commands one by one,
right?
Only one node at a time.


Dayu




At 2018-06-28 11:37:43, "kurt greaves"  wrote:

Best off trying a rolling restart.


On 28 June 2018 at 03:18, dayu  wrote:

the output of nodetool describecluster
Cluster Information:
Name: online-xxx
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
c3f00d61-1ad7-3702-8703-af2a29e401c1: [10.136.71.43]


0568e8c1-48ba-3fb0-bb3c-462438978d7b: [10.136.71.33, ]


after I ran nodetool resetlocalschema, this error was logged:


ERROR [InternalResponseStage:209417] 2018-06-28 11:14:12,904 MigrationTask.java:96 - Configuration exception merging remote schema
org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 5552bba0-2dc6-11e8-9b5c-254242d97235; expected 53f6d520-2dc6-11e8-948d-ab7caa3c8c36)
    at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:790) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:750) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.config.Schema.updateTable(Schema.java:661) ~[apache-cassandra-3.0.10.jar:3.0.10]
    at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1348) ~[apache-cassandra-3.0.10.jar:3.0.10]






At 2018-06-28 10:01:52, "Jeff Jirsa"  wrote:
You can sometimes bounce your way through it (or use nodetool resetlocalschema 
if it’s a single node that’s wrong), but there are some edge cases from which 
it’s very hard to recover


What’s the output of nodetool describecluster?


If you select from the schema tables, do you see that CFID on any real tables? 


-- 
Jeff Jirsa



On Jun 27, 2018, at 7:58 PM, dayu  wrote:


That sounds reasonable; I have seen schema mismatch errors before.
So any advice on how to deal with schema mismatches?
Dayu


At 2018-06-28 09:50:37, "Jeff Jirsa"  wrote:
>That log message says you did:
>
> CF 53f6d520-2dc6-11e8-948d-ab7caa3c8c36 was dropped during streaming
>
>If you’re absolutely sure you didn’t, you should look for schema mismatches in 
>your cluster
>
>
>-- 
>Jeff Jirsa
>
>
>> On Jun 27, 2018, at 7:49 PM, dayu  wrote:
>> 
>> CF 53f6d520-2dc6-11e8-948d-ab7caa3c8c36 was dropped during streaming
>
>-
>To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>For additional commands, e-mail: user-h...@cassandra.apache.org


Re: Re: Re: stream failed when bootstrap

2018-06-28 Thread kurt greaves
Yeah, but you only really need to drain, restart Cassandra one by one. Not
that the others will hurt, but they aren't strictly necessary.
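A minimal per-node sketch, assuming systemd manages the service (the unit name and the readiness check are assumptions — adapt to your setup):

    nodetool drain                      # flush memtables and stop accepting new writes
    sudo systemctl restart cassandra    # hypothetical unit name
    # wait until this node is back to UN in 'nodetool status' before moving on
    until nodetool status 2>/dev/null | grep -q "^UN  $(hostname -i)"; do sleep 10; done

Waiting for UN between nodes avoids restarting a second replica while the first is still coming back up.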

On 28 June 2018 at 05:38, dayu  wrote:

> Hi kurt, a rolling restart means running the disablebinary, disablethrift,
> disablegossip, drain,
> stop cassandra and start cassandra commands one by one, right?
> Only one node at a time.
>
> Dayu
>
>
>
> At 2018-06-28 11:37:43, "kurt greaves"  wrote:
>
> Best off trying a rolling restart.
>
> On 28 June 2018 at 03:18, dayu  wrote:
>
>> the output of nodetool describecluster
>> Cluster Information:
>> Name: online-xxx
>> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> Schema versions:
>> c3f00d61-1ad7-3702-8703-af2a29e401c1: [10.136.71.43]
>>
>> 0568e8c1-48ba-3fb0-bb3c-462438978d7b: [10.136.71.33, ]
>>
>> after I ran nodetool resetlocalschema, this error was logged:
>>
>> ERROR [InternalResponseStage:209417] 2018-06-28 11:14:12,904 MigrationTask.java:96 - Configuration exception merging remote schema
>> org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 5552bba0-2dc6-11e8-9b5c-254242d97235; expected 53f6d520-2dc6-11e8-948d-ab7caa3c8c36)
>>     at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:790) ~[apache-cassandra-3.0.10.jar:3.0.10]
>>     at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:750) ~[apache-cassandra-3.0.10.jar:3.0.10]
>>     at org.apache.cassandra.config.Schema.updateTable(Schema.java:661) ~[apache-cassandra-3.0.10.jar:3.0.10]
>>     at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1348) ~[apache-cassandra-3.0.10.jar:3.0.10]
>>
>>
>>
>>
>>
>> At 2018-06-28 10:01:52, "Jeff Jirsa"  wrote:
>>
>> You can sometimes bounce your way through it (or use nodetool
>> resetlocalschema if it’s a single node that’s wrong), but there are some
>> edge cases from which it’s very hard to recover
>>
>> What’s the output of nodetool describecluster?
>>
>> If you select from the schema tables, do you see that CFID on any real
>> tables?
>>
>> --
>> Jeff Jirsa
>>
>>
>> On Jun 27, 2018, at 7:58 PM, dayu  wrote:
>>
>> That sounds reasonable; I have seen schema mismatch errors before.
>> So any advice on how to deal with schema mismatches?
>>
>> Dayu
>>
>> At 2018-06-28 09:50:37, "Jeff Jirsa"  wrote:
>> >That log message says you did:
>> >
>> > CF 53f6d520-2dc6-11e8-948d-ab7caa3c8c36 was dropped during streaming
>> >
>> >If you’re absolutely sure you didn’t, you should look for schema mismatches 
>> >in your cluster
>> >
>> >
>> >--
>> >Jeff Jirsa
>> >
>> >
>> >> On Jun 27, 2018, at 7:49 PM, dayu  wrote:
>> >>
>> >> CF 53f6d520-2dc6-11e8-948d-ab7caa3c8c36 was dropped during streaming
>> >
>> >-
>> >To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> >For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
>
>