Re: Dropped messages on random nodes.

2017-01-23 Thread Brandon Williams
The lion's share of your drops is from cross-node timeouts, which rely on
synchronized clocks to be meaningful, so check that first.  If your clocks
are in sync, that means not only is eager, time-based dropping kicking in,
but you are still overloaded despite the eager dropping.
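
Since the receiving node judges a message's age against the sender's
embedded timestamp, clock skew alone can make healthy traffic look expired.
A minimal sketch of that check (the names and the 2000 ms default are
illustrative, not Cassandra's actual internals):

```python
# Sketch of the cross-node drop decision (illustrative, not Cassandra's
# code): the receiver compares the sender's creation timestamp against
# its own clock, so clock skew directly shifts a message's apparent age.
def should_drop_cross_node(sent_at_ms, receiver_now_ms, timeout_ms=2000):
    """Drop the message if it already looks older than the timeout
    when the receiver dequeues it."""
    apparent_age_ms = receiver_now_ms - sent_at_ms
    return apparent_age_ms > timeout_ms

# With clocks in sync, a 500 ms old mutation is processed:
assert should_drop_cross_node(1_000, 1_500) is False
# A receiver running 3 s ahead drops that very same message:
assert should_drop_cross_node(1_000, 4_500) is True
```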

That local, non-GC pause is also troubling. (I assume non-GC since nothing
was logged by the GC inspector.)
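
For what it's worth, a quick tally over the StatusLogger lines quoted below
makes the internal vs. cross-node imbalance obvious. A throwaway sketch
(the regex targets the 2.2-era log format shown in this thread):

```python
import re

# Tally the "messages were dropped" StatusLogger lines to see whether
# cross-node or internal timeouts dominate.
DROP_LINE = re.compile(
    r"(\w+) messages were dropped in last \d+ ms: "
    r"(\d+) for internal timeout and (\d+) for cross node timeout")

def tally_drops(log_text):
    """Return {verb: (internal_total, cross_node_total)}."""
    totals = {}
    for verb, internal, cross in DROP_LINE.findall(log_text):
        i, c = totals.get(verb, (0, 0))
        totals[verb] = (i + int(internal), c + int(cross))
    return totals

sample = """\
MUTATION messages were dropped in last 5000 ms: 65 for internal timeout and 10895 for cross node timeout
READ messages were dropped in last 5000 ms: 33 for internal timeout and 7867 for cross node timeout
"""
# Cross-node drops outnumber internal ones by two orders of magnitude.
print(tally_drops(sample))
```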

On Mon, Jan 23, 2017 at 12:36 AM, Dikang Gu  wrote:

> Hello there,
>
> We have a cluster of roughly 100 nodes, and I am seeing dropped messages
> on random nodes in the cluster, which cause error spikes and P99 latency
> spikes as well.
>
> I tried to figure out the cause. I do not see any obvious bottleneck in
> the cluster; the C* nodes still have plenty of idle CPU and disk I/O
> headroom. But I do see some suspicious gossip events around that time,
> and I am not sure whether they are related.
>
> 2017-01-21_16:43:56.71033 WARN  16:43:56 [GossipTasks:1]: Not marking
> nodes down due to local pause of 13079498815 > 50
> 2017-01-21_16:43:56.85532 INFO  16:43:56 [ScheduledTasks:1]: MUTATION
> messages were dropped in last 5000 ms: 65 for internal timeout and 10895
> for cross node timeout
> 2017-01-21_16:43:56.85533 INFO  16:43:56 [ScheduledTasks:1]: READ messages
> were dropped in last 5000 ms: 33 for internal timeout and 7867 for cross
> node timeout
> 2017-01-21_16:43:56.85534 INFO  16:43:56 [ScheduledTasks:1]: Pool Name
>                    Active   Pending   Completed   Blocked  All Time Blocked
> 2017-01-21_16:43:56.85534 INFO  16:43:56 [ScheduledTasks:1]: MutationStage
>                       128     47794  1015525068         0                 0
> 2017-01-21_16:43:56.85535 INFO  16:43:56 [ScheduledTasks:1]: ReadStage
>                        64     20202   450508940         0                 0
>
> Any suggestions?
>
> Thanks!
>
> --
> Dikang
>
>


Re: Dropped messages on random nodes.

2017-01-23 Thread Roopa Tangirala
Dikang,

Did you take a look at the heap health on those nodes? A quick heap
histogram or dump would help you figure out whether it is related to a
data issue (wide rows, or a bad model) where a few nodes come under heap
pressure and drop messages.
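
If taking a live histogram, the text that `jmap -histo <pid>` emits is easy
to skim programmatically. A small sketch that picks out the heaviest
classes (the sample rows, including the Cassandra class name, are made up
for illustration):

```python
import re

# One histogram row looks like: "   1:      500000   480000000  [B"
# (rank, instance count, byte count, class name).
HISTO_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S.*)$")

def top_classes(histo_text, n=3):
    """Return the n heaviest classes as (bytes, instances, name) tuples,
    parsed from `jmap -histo` output."""
    rows = []
    for line in histo_text.splitlines():
        m = HISTO_ROW.match(line)
        if m:
            instances, nbytes, name = m.groups()
            rows.append((int(nbytes), int(instances), name))
    return sorted(rows, reverse=True)[:n]

# Fabricated sample output for illustration:
sample = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:        500000      480000000  [B
   2:       1200000       57600000  java.lang.String
   3:         80000       33554432  org.apache.cassandra.db.Cell
"""
print(top_classes(sample, 2))
```

A heap dominated by byte arrays or a single table's cell class is the kind
of signature that points at wide rows or a bad model.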

Thanks,
Roopa



*Regards,*

*Roopa Tangirala*

Engineering Manager CDE

*(408) 438-3156 - mobile*





On Mon, Jan 23, 2017 at 4:55 PM, Blake Eggleston wrote:

> Hi Dikang,
>
> Do you have any GC logging or metrics you can correlate with the dropped
> messages? A 13 second pause sounds like a bad GC pause.
>
> Thanks,
>
> Blake
>
>
> On January 22, 2017 at 10:37:22 PM, Dikang Gu (dikan...@gmail.com) wrote:
>
> Btw, the C* version is 2.2.5, with several backported patches.
>
> On Sun, Jan 22, 2017 at 10:36 PM, Dikang Gu  wrote:
>
> > Hello there,
> >
> > We have a 100 nodes ish cluster, I find that there are dropped messages
> on
> > random nodes in the cluster, which caused error spikes and P99 latency
> > spikes as well.
> >
> > I tried to figure out the cause. I do not see any obvious bottleneck in
> > the cluster, the C* nodes still have plenty of cpu idle/disk io. But I do
> > see some suspicious gossip events around that time, not sure if it's
> > related.
> >
> > 2017-01-21_16:43:56.71033 WARN 16:43:56 [GossipTasks:1]: Not marking
> > nodes down due to local pause of 13079498815 > 50
> > 2017-01-21_16:43:56.85532 INFO 16:43:56 [ScheduledTasks:1]: MUTATION
> > messages were dropped in last 5000 ms: 65 for internal timeout and 10895
> > for cross node timeout
> > 2017-01-21_16:43:56.85533 INFO 16:43:56 [ScheduledTasks:1]: READ messages
> > were dropped in last 5000 ms: 33 for internal timeout and 7867 for cross
> > node timeout
> > 2017-01-21_16:43:56.85534 INFO 16:43:56 [ScheduledTasks:1]: Pool Name
> > Active Pending Completed Blocked All Time Blocked
> > 2017-01-21_16:43:56.85534 INFO 16:43:56 [ScheduledTasks:1]: MutationStage
> > 128 47794 1015525068 0 0
> > 2017-01-21_16:43:56.85535
> > 2017-01-21_16:43:56.85535 INFO 16:43:56 [ScheduledTasks:1]: ReadStage
> > 64 20202 450508940 0 0
> >
> > Any suggestions?
> >
> > Thanks!
> >
> > --
> > Dikang
> >
> >
>
>
> --
> Dikang
>


Re: Dropped messages on random nodes.

2017-01-23 Thread Blake Eggleston
Hi Dikang,

Do you have any GC logging or metrics you can correlate with the dropped 
messages? A 13 second pause sounds like a bad GC pause.
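
The "local pause of 13079498815" warning, for reference, comes from a
watchdog that simply notices when a periodic task oversleeps (the figure
is nanoseconds, so about 13 s). A toy version of that idea, handy for
spotting pauses even when GC logs are missing; this is an illustration,
not Cassandra's code:

```python
import time

def check_local_pause(interval_s=0.005, threshold_s=0.05):
    """Sleep for a short, fixed interval and measure the overshoot.
    A large overshoot means the whole process was stopped in the
    meantime (GC, VM freeze, swapping); return it, else None."""
    start = time.monotonic()
    time.sleep(interval_s)
    overshoot = (time.monotonic() - start) - interval_s
    return overshoot if overshoot > threshold_s else None

# On a healthy process the overshoot stays far below the threshold:
assert check_local_pause(0.001, threshold_s=1.0) is None
```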

Thanks,

Blake


On January 22, 2017 at 10:37:22 PM, Dikang Gu (dikan...@gmail.com) wrote:

Btw, the C* version is 2.2.5, with several backported patches. 

On Sun, Jan 22, 2017 at 10:36 PM, Dikang Gu  wrote: 

> Hello there, 
> 
> We have a 100 nodes ish cluster, I find that there are dropped messages on 
> random nodes in the cluster, which caused error spikes and P99 latency 
> spikes as well. 
> 
> I tried to figure out the cause. I do not see any obvious bottleneck in 
> the cluster, the C* nodes still have plenty of cpu idle/disk io. But I do 
> see some suspicious gossip events around that time, not sure if it's 
> related. 
> 
> 2017-01-21_16:43:56.71033 WARN 16:43:56 [GossipTasks:1]: Not marking 
> nodes down due to local pause of 13079498815 > 50 
> 2017-01-21_16:43:56.85532 INFO 16:43:56 [ScheduledTasks:1]: MUTATION 
> messages were dropped in last 5000 ms: 65 for internal timeout and 10895 
> for cross node timeout 
> 2017-01-21_16:43:56.85533 INFO 16:43:56 [ScheduledTasks:1]: READ messages 
> were dropped in last 5000 ms: 33 for internal timeout and 7867 for cross 
> node timeout 
> 2017-01-21_16:43:56.85534 INFO 16:43:56 [ScheduledTasks:1]: Pool Name 
> Active Pending Completed Blocked All Time Blocked 
> 2017-01-21_16:43:56.85534 INFO 16:43:56 [ScheduledTasks:1]: MutationStage 
> 128 47794 1015525068 0 0 
> 2017-01-21_16:43:56.85535 
> 2017-01-21_16:43:56.85535 INFO 16:43:56 [ScheduledTasks:1]: ReadStage 
> 64 20202 450508940 0 0 
> 
> Any suggestions? 
> 
> Thanks! 
> 
> -- 
> Dikang 
> 
> 


-- 
Dikang 


Re: WriteTimeoutException when doing paralel DELETE IF EXISTS

2017-01-23 Thread Blake Eggleston
Hi Jaroslav,

That's pretty much expected behavior for the current LWT implementation,
which struggles under key contention (the usage pattern you're describing
here). Typically, you want to avoid having multiple clients perform LWT
operations on the same partition key at the same time.
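
One common mitigation, short of redesigning the schema, is to have each
client back off with jitter after a lost CAS round so that contenders stop
arriving in lockstep. A sketch of that loop; the read/delete callables are
stand-ins for the CQL round trips, not real driver calls:

```python
import random
import time

def claim_ticket(read_first, cas_delete, max_attempts=8, base_delay=0.01):
    """Claim one ticket: read the head of the partition, attempt a
    conditional delete, and on a lost race back off with jitter before
    retrying. read_first/cas_delete stand in for the
    SELECT ... LIMIT 1 and DELETE ... IF EXISTS round trips."""
    for attempt in range(max_attempts):
        ticket = read_first()
        if ticket is None:
            return None                # pool exhausted
        if cas_delete(ticket):
            return ticket              # our delete was applied: we own it
        # Lost the race (or timed out): jittered exponential backoff
        # spreads competing clients out instead of re-colliding.
        time.sleep(random.uniform(0, base_delay * (2 ** attempt)))
    return None

# Single-process stand-in for the ticket partition:
pool = ["t1", "t2", "t3"]
won = claim_ticket(lambda: pool[0] if pool else None,
                   lambda t: (t in pool) and (pool.remove(t) or True))
assert won == "t1" and pool == ["t2", "t3"]
```

Backoff reduces the collision rate but does not remove the fundamental
per-key serialization of Paxos rounds; spreading tickets across several
partitions attacks the root cause.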

Thanks,

Blake
On January 20, 2017 at 4:25:05 AM, Jaroslav KamenĂ­k (jaros...@kamenik.cz) wrote:

Hi,

I would like to ask here before posting a new bug. I am trying to build a
simple system for distributing preallocated tickets among concurrent
clients using C* LWTs. It is simply one partition containing the tickets
for one domain; a client reads the first ticket and tries to delete it
conditionally. Success means it owns it; failure means try again.

It works well, but it starts to fail with WriteTimeoutExceptions under
load. So I made a simple test with 16 concurrent threads competing for one
row with 1000 columns, running on a cluster of five C* 3.0.9 nodes with
the default configuration and replication factor 3.

Surprisingly, it failed after just a few requests. It takes longer with
fewer threads, but even 2 clients are enough to make it fail.

I am wondering: is this a problem in Cassandra, normal behaviour, or bad
use of LWT?

Thanks,

Jaroslav


Re: [VOTE] Release Apache Cassandra 3.10 (Take 4)

2017-01-23 Thread Michael Shuler
This vote is being failed because of CASSANDRA-13058 (committed after the
tentative tag) and CASSANDRA-13025 (patch available).

The vote count was 5 binding +1, 1 binding -1, and one non-binding -1.

I'll re-roll a "Take 5" once CASSANDRA-13025 is committed and the tests
look stable, and we'll try again.

-- 
Kind regards,
Michael

On 01/13/2017 06:46 PM, Michael Shuler wrote:
> I propose the following artifacts for release as 3.10.
> 
> sha1: 9c2ab25556fad06a6a4d58f4bb652719a8a1bc27
> Git:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.10-tentative
> Artifacts:
> https://repository.apache.org/content/repositories/orgapachecassandra-1136/org/apache/cassandra/apache-cassandra/3.10/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1136/
> 
> The Debian packages are available here: http://people.apache.org/~mshuler
> 
> The vote will be open for 72 hours (longer if needed).
> 
> [1]: (CHANGES.txt) https://goo.gl/WaAEVn
> [2]: (NEWS.txt) https://goo.gl/7deAsG
> 
> All of the unit tests passed and the main dtest job passed.
> 
> https://cassci.datastax.com/job/cassandra-3.11_testall/47/
> https://cassci.datastax.com/job/cassandra-3.11_utest/55/
> https://cassci.datastax.com/job/cassandra-3.11_utest_cdc/25/
> https://cassci.datastax.com/job/cassandra-3.11_utest_compression/23/
> https://cassci.datastax.com/job/cassandra-3.11_dtest/31/
> 






Re: [VOTE] Release Apache Cassandra 3.10 (Take 4)

2017-01-23 Thread Nate McCall
Indeed I conflated the two - thanks Sylvain.

On Mon, Jan 23, 2017 at 11:19 PM, Sylvain Lebresne  wrote:
> On Mon, Jan 23, 2017 at 2:31 AM, Nate McCall  wrote:
>
>> What was the resolution on this?
>>
>> Looks like we resolved/Fixed CASSANDRA-13058. Can we re-roll and go again?
>>
>
> As I mentioned, CASSANDRA-13025 is also a regression and should be fixed
> before we re-roll. It's ready for review if someone's interested.
>
>
>>
>> On Tue, Jan 17, 2017 at 4:26 AM, Sylvain Lebresne 
>> wrote:
>> > I'm a bit sorry about it, but I'm kind of -1 on account of
>> > https://issues.apache.org/jira/browse/CASSANDRA-13025. It's a genuine
>> > regression during upgrade that we should really fix before it's
>> > released in the wild. I apologize for not having bumped the priority
>> > on this ticket sooner, but I think we need the fix in.
>> >
>> > On Mon, Jan 16, 2017 at 2:25 AM, Paulo Motta 
>> > wrote:
>> >
>> >> -1 since CASSANDRA-13058
>> >>  introduces a
>> >> regression that prevents successful decommission when the
>> decommissioning
>> >> node has hints to transfer. While this is relatively minor and there is
>> a
>> >> workaround (force hint replay before decommission), there is already a
>> >> patch available so I committed this to cassandra-3.11 and upper
>> branches so
>> >> we will also have a green testboard for cassandra-3.11_novnode_dtest
>> >> <https://cassci.datastax.com/job/cassandra-3.11_novnode_dtest/>.
>> >>
>> >> If there are no objections on getting this in, can you re-roll this once
>> >> again Michael? Sorry for the late update on this, I had other things on
>> my
>> >> plate and could only get to this now.
>> >>
>> >> 2017-01-15 10:48 GMT-02:00 Aleksey Yeschenko :
>> >>
>> >> > +1
>> >> >
>> >> > --
>> >> > AY
>> >> >
>> >> > On 14 January 2017 at 00:47:08, Michael Shuler (
>> mich...@pbandjelly.org)
>> >> > wrote:
>> >> >
>> >> > I propose the following artifacts for release as 3.10.
>> >> >
>> >> > sha1: 9c2ab25556fad06a6a4d58f4bb652719a8a1bc27
>> >> > Git:
>> >> > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
>> >> > shortlog;h=refs/tags/3.10-tentative
>> >> > Artifacts:
>> >> > https://repository.apache.org/content/repositories/
>> >> > orgapachecassandra-1136/org/apache/cassandra/apache-cassandra/3.10/
>> >> > Staging repository:
>> >> > https://repository.apache.org/content/repositories/
>> >> > orgapachecassandra-1136/
>> >> >
>> >> > The Debian packages are available here: http://people.apache.org/~
>> >> mshuler
>> >> >
>> >> > The vote will be open for 72 hours (longer if needed).
>> >> >
>> >> > [1]: (CHANGES.txt) https://goo.gl/WaAEVn
>> >> > [2]: (NEWS.txt) https://goo.gl/7deAsG
>> >> >
>> >> > All of the unit tests passed and the main dtest job passed.
>> >> >
>> >> > https://cassci.datastax.com/job/cassandra-3.11_testall/47/
>> >> > https://cassci.datastax.com/job/cassandra-3.11_utest/55/
>> >> > https://cassci.datastax.com/job/cassandra-3.11_utest_cdc/25/
>> >> > https://cassci.datastax.com/job/cassandra-3.11_utest_compression/23/
>> >> > https://cassci.datastax.com/job/cassandra-3.11_dtest/31/
>> >> >
>> >> > --
>> >> > Kind regards,
>> >> > Michael Shuler
>> >> >
>> >> >
>> >>
>>


Re: [VOTE] Release Apache Cassandra 3.10 (Take 4)

2017-01-23 Thread Sylvain Lebresne
On Mon, Jan 23, 2017 at 2:31 AM, Nate McCall  wrote:

> What was the resolution on this?
>
> Looks like we resolved/Fixed CASSANDRA-13058. Can we re-roll and go again?
>

As I mentioned, CASSANDRA-13025 is also a regression and should be fixed
before we re-roll. It's ready for review if someone's interested.


>
> On Tue, Jan 17, 2017 at 4:26 AM, Sylvain Lebresne 
> wrote:
> > I'm a bit sorry about it, but I'm kind of -1 on account of
> > https://issues.apache.org/jira/browse/CASSANDRA-13025. It's a genuine
> > regression during upgrade that we should really fix before it's
> > released in the wild. I apologize for not having bumped the priority
> > on this ticket sooner, but I think we need the fix in.
> >
> > On Mon, Jan 16, 2017 at 2:25 AM, Paulo Motta 
> > wrote:
> >
> >> -1 since CASSANDRA-13058
> >>  introduces a
> >> regression that prevents successful decommission when the
> decommissioning
> >> node has hints to transfer. While this is relatively minor and there is
> a
> >> workaround (force hint replay before decommission), there is already a
> >> patch available so I committed this to cassandra-3.11 and upper
> branches so
> >> we will also have a green testboard for cassandra-3.11_novnode_dtest
> >> <https://cassci.datastax.com/job/cassandra-3.11_novnode_dtest/>.
> >>
> >> If there are no objections on getting this in, can you re-roll this once
> >> again Michael? Sorry for the late update on this, I had other things on
> my
> >> plate and could only get to this now.
> >>
> >> 2017-01-15 10:48 GMT-02:00 Aleksey Yeschenko :
> >>
> >> > +1
> >> >
> >> > --
> >> > AY
> >> >
> >> > On 14 January 2017 at 00:47:08, Michael Shuler (
> mich...@pbandjelly.org)
> >> > wrote:
> >> >
> >> > I propose the following artifacts for release as 3.10.
> >> >
> >> > sha1: 9c2ab25556fad06a6a4d58f4bb652719a8a1bc27
> >> > Git:
> >> > http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=
> >> > shortlog;h=refs/tags/3.10-tentative
> >> > Artifacts:
> >> > https://repository.apache.org/content/repositories/
> >> > orgapachecassandra-1136/org/apache/cassandra/apache-cassandra/3.10/
> >> > Staging repository:
> >> > https://repository.apache.org/content/repositories/
> >> > orgapachecassandra-1136/
> >> >
> >> > The Debian packages are available here: http://people.apache.org/~
> >> mshuler
> >> >
> >> > The vote will be open for 72 hours (longer if needed).
> >> >
> >> > [1]: (CHANGES.txt) https://goo.gl/WaAEVn
> >> > [2]: (NEWS.txt) https://goo.gl/7deAsG
> >> >
> >> > All of the unit tests passed and the main dtest job passed.
> >> >
> >> > https://cassci.datastax.com/job/cassandra-3.11_testall/47/
> >> > https://cassci.datastax.com/job/cassandra-3.11_utest/55/
> >> > https://cassci.datastax.com/job/cassandra-3.11_utest_cdc/25/
> >> > https://cassci.datastax.com/job/cassandra-3.11_utest_compression/23/
> >> > https://cassci.datastax.com/job/cassandra-3.11_dtest/31/
> >> >
> >> > --
> >> > Kind regards,
> >> > Michael Shuler
> >> >
> >> >
> >>
>