[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control

2016-01-27 Thread fujian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120485#comment-15120485
 ] 

fujian commented on CASSANDRA-11070:


thanks for your replay and total agree with you. after all, as you talked "it 
has been fairly thoroughly tested" and no any research can conclude remove 
donework condition is benefit for performance.

> Dispatcher.Flusher's control has duplicated/conflict control
> 
>
> Key: CASSANDRA-11070
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11070
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: fujian
>  Labels: performance
> Attachments: 0001-fix-CASSANDRA-11070.patch
>
>
> org.apache.cassandra.transport.Message.Dispatcher.Flusher
> remove duplicated control for flush message control …
> Motivation:
> the !doneWork's control is duplicated and confused with runsSinceFlush > 2
> if on the first run:the queue size is 20
> donework will be set to true and not do flush due to the size<50 and 
> runsSinceFlush<2.
> if on the second run. the queue size is 0,
> donework will be reset to false and not set to true due to no new items in 
> queue, but the flush will be triggered due to:
>   if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50)
> now the runsSinceFlush is 2. so in actual, its function is similar with 
> runsSinceFlush>1.
> so it is no need to keep it so that the code is confused and duplicated.
>   
> Modifications:
> remove it
> Result:
> after remove it, it will more clear and no confused.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control

2016-01-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118992#comment-15118992
 ] 

Benedict commented on CASSANDRA-11070:
--

It is not >10us vs >20us; it is n ~>= 10us, and choosing between n and 2n; on 
some systems under high load n could be significantly greater than 10us, 
perhaps 100us, perhaps 1ms; making it twice n is potentially worse than just 
adding 10us.

In general, this code is pretty critical to the behaviour of the system.  The 
current behaviour was chosen somewhat arbitrarily by me, but it has been fairly 
thoroughly tested.  With sufficient testing a different arbitrary behaviour 
would be fine.

I would personally tend towards simply always flushing either on the second 
run, if we were to aim for simplicity; possibly with a zero delay (so that it 
just runs at the end of a natural batch), as this is less work, more 
responsive, and in all likelihood the main benefit is from batching the natural 
groupings of messages.

I also, however, don't think this guard is a meaningful contributor to 
complexity in the codebase, and that we've already spent more time discussing 
the _guard_ than it deserves.  However, analysis of the client batching and 
improvements to it are central to system behaviour, so further research is 
obviously welcomed.  Since this ticket is discussing duplicated behaviour which 
is demonstrably absent, I'm closing as Won't Fix.

> Dispatcher.Flusher's control has duplicated/conflict control
> 
>
> Key: CASSANDRA-11070
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11070
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: fujian
>  Labels: performance
> Attachments: 0001-fix-CASSANDRA-11070.patch
>
>
> org.apache.cassandra.transport.Message.Dispatcher.Flusher
> remove duplicated control for flush message control …
> Motivation:
> the !doneWork's control is duplicated and confused with runsSinceFlush > 2
> if on the first run:the queue size is 20
> donework will be set to true and not do flush due to the size<50 and 
> runsSinceFlush<2.
> if on the second run. the queue size is 0,
> donework will be reset to false and not set to true due to no new items in 
> queue, but the flush will be triggered due to:
>   if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50)
> now the runsSinceFlush is 2. so in actual, its function is similar with 
> runsSinceFlush>1.
> so it is no need to keep it so that the code is confused and duplicated.
>   
> Modifications:
> remove it
> Result:
> after remove it, it will more clear and no confused.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control

2016-01-26 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117032#comment-15117032
 ] 

Benedict commented on CASSANDRA-11070:
--

I think you're confusing {{>}} with {{>=}}; this guard flushes if it's the 
third flush in a row that produced work, or the second which did _not_ produce 
work; i.e. it flushes if:

# there have been three batches appear together (each could be singleton items, 
with arbitrary temporal separation)
# there has been one batch followed by a >10us pause
# there are more than fifty messages waiting

Your guard changes this to only 1 and 3.

> Dispatcher.Flusher's control has duplicated/conflict control
> 
>
> Key: CASSANDRA-11070
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11070
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: fujian
>  Labels: performance
> Attachments: 0001-fix-CASSANDRA-11070.patch
>
>
> org.apache.cassandra.transport.Message.Dispatcher.Flusher
> remove duplicated control for flush message control …
> Motivation:
> the !doneWork's control is duplicated and confused with runsSinceFlush > 2
> if on the first run:the queue size is 20
> donework will be set to true and not do flush due to the size<50 and 
> runsSinceFlush<2.
> if on the second run. the queue size is 0,
> donework will be reset to false and not set to true due to no new items in 
> queue, but the flush will be triggered due to:
>   if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50)
> now the runsSinceFlush is 2. so in actual, its function is similar with 
> runsSinceFlush>1.
> so it is no need to keep it so that the code is confused and duplicated.
>   
> Modifications:
> remove it
> Result:
> after remove it, it will more clear and no confused.
> check the link: 
> https://github.com/jiafu1115/cassandra/commit/5279884e6a392d36b4adc5e29f9ca5d0666cb275



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control

2016-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116919#comment-15116919
 ] 

ASF GitHub Bot commented on CASSANDRA-11070:


Github user jiafu1115 commented on the pull request:

https://github.com/apache/cassandra/pull/62#issuecomment-174888659
  
@blerer  I had request in 
https://issues.apache.org/jira/browse/CASSANDRA-11070, can you take a look? 3ks


> Dispatcher.Flusher's control has duplicated/conflict control
> 
>
> Key: CASSANDRA-11070
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11070
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: fujian
>  Labels: performance
> Attachments: 0001-fix-CASSANDRA-11070.patch
>
>
> org.apache.cassandra.transport.Message.Dispatcher.Flusher
> remove duplicated control for flush message control …
> Motivation:
> the !doneWork's control is duplicated and confused with runsSinceFlush > 2
> if on the first run:the queue size is 20
> donework will be set to true and not do flush due to the size<50 and 
> runsSinceFlush<2.
> if on the second run. the queue size is 0,
> donework will be reset to false and not set to true due to no new items in 
> queue, but the flush will be triggered due to:
>   if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50)
> now the runsSinceFlush is 2. so in actual, its function is similar with 
> runsSinceFlush>1.
> so it is no need to keep it so that the code is confused and duplicated.
>   
> Modifications:
> remove it
> Result:
> after remove it, it will more clear and no confused.
> check the link: 
> https://github.com/jiafu1115/cassandra/commit/5279884e6a392d36b4adc5e29f9ca5d0666cb275



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control

2016-01-26 Thread fujian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118347#comment-15118347
 ] 

fujian commented on CASSANDRA-11070:


[~benedict]

3ks for your reply. and check my compare between done and without done.

https://cloud.githubusercontent.com/assets/5654180/12599960/ede28bcc-c4d0-11e5-9a2a-5ebd68ced78b.jpg

so if we change > to >=. we can't handle the 2:there has been one batch 
followed by a >10us pause

but we can handle: 2:there has been one batch followed by a >20us pause

what's more, we will have two benefits:

(1) code more clear with less condition judge.
(2) we won't have some no need loop when had no data when 1 or 2

so what's the difference:


there has been one batch followed by a >10us pause
there has been one batch followed by a >20us pause

I think it isn't important compare with more clear code and reduce one no need 
loop.what's your view?

3ks again.








> Dispatcher.Flusher's control has duplicated/conflict control
> 
>
> Key: CASSANDRA-11070
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11070
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: fujian
>  Labels: performance
> Attachments: 0001-fix-CASSANDRA-11070.patch
>
>
> org.apache.cassandra.transport.Message.Dispatcher.Flusher
> remove duplicated control for flush message control …
> Motivation:
> the !doneWork's control is duplicated and confused with runsSinceFlush > 2
> if on the first run:the queue size is 20
> donework will be set to true and not do flush due to the size<50 and 
> runsSinceFlush<2.
> if on the second run. the queue size is 0,
> donework will be reset to false and not set to true due to no new items in 
> queue, but the flush will be triggered due to:
>   if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50)
> now the runsSinceFlush is 2. so in actual, its function is similar with 
> runsSinceFlush>1.
> so it is no need to keep it so that the code is confused and duplicated.
>   
> Modifications:
> remove it
> Result:
> after remove it, it will more clear and no confused.
> check the link: 
> https://github.com/jiafu1115/cassandra/commit/5279884e6a392d36b4adc5e29f9ca5d0666cb275



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control

2016-01-26 Thread fujian (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118348#comment-15118348
 ] 

fujian commented on CASSANDRA-11070:


3ks for your reply. and check my compare between done and without done.
https://cloud.githubusercontent.com/assets/5654180/12599960/ede28bcc-c4d0-11e5-9a2a-5ebd68ced78b.jpg
so if we change > to >=. we can't handle the 2:there has been one batch 
followed by a >10us pause
but we can handle: 2:there has been one batch followed by a >20us pause
what's more, we will have two benefits:
(1) code more clear with less condition judge.
(2) we won't have some no need loop when had no data when 1 or 2
so what's the difference:
there has been one batch followed by a >10us pause
there has been one batch followed by a >20us pause
I think it isn't important compare with more clear code and reduce one no need 
loop.what's your view?
3ks again.

> Dispatcher.Flusher's control has duplicated/conflict control
> 
>
> Key: CASSANDRA-11070
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11070
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: fujian
>  Labels: performance
> Attachments: 0001-fix-CASSANDRA-11070.patch
>
>
> org.apache.cassandra.transport.Message.Dispatcher.Flusher
> remove duplicated control for flush message control …
> Motivation:
> the !doneWork's control is duplicated and confused with runsSinceFlush > 2
> if on the first run:the queue size is 20
> donework will be set to true and not do flush due to the size<50 and 
> runsSinceFlush<2.
> if on the second run. the queue size is 0,
> donework will be reset to false and not set to true due to no new items in 
> queue, but the flush will be triggered due to:
>   if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50)
> now the runsSinceFlush is 2. so in actual, its function is similar with 
> runsSinceFlush>1.
> so it is no need to keep it so that the code is confused and duplicated.
>   
> Modifications:
> remove it
> Result:
> after remove it, it will more clear and no confused.
> check the link: 
> https://github.com/jiafu1115/cassandra/commit/5279884e6a392d36b4adc5e29f9ca5d0666cb275



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)