[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control
[ https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120485#comment-15120485 ] fujian commented on CASSANDRA-11070: thanks for your replay and total agree with you. after all, as you talked "it has been fairly thoroughly tested" and no any research can conclude remove donework condition is benefit for performance. > Dispatcher.Flusher's control has duplicated/conflict control > > > Key: CASSANDRA-11070 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11070 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: fujian > Labels: performance > Attachments: 0001-fix-CASSANDRA-11070.patch > > > org.apache.cassandra.transport.Message.Dispatcher.Flusher > remove duplicated control for flush message control … > Motivation: > the !doneWork's control is duplicated and confused with runsSinceFlush > 2 > if on the first run:the queue size is 20 > donework will be set to true and not do flush due to the size<50 and > runsSinceFlush<2. > if on the second run. the queue size is 0, > donework will be reset to false and not set to true due to no new items in > queue, but the flush will be triggered due to: > if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50) > now the runsSinceFlush is 2. so in actual, its function is similar with > runsSinceFlush>1. > so it is no need to keep it so that the code is confused and duplicated. > > Modifications: > remove it > Result: > after remove it, it will more clear and no confused. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control
[ https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118992#comment-15118992 ] Benedict commented on CASSANDRA-11070: -- It is not >10us vs >20us; it is n ~>= 10us, and choosing between n and 2n; on some systems under high load n could be significantly greater than 10us, perhaps 100us, perhaps 1ms; making it twice n is potentially worse than just adding 10us. In general, this code is pretty critical to the behaviour of the system. The current behaviour was chosen somewhat arbitrarily by me, but it has been fairly thoroughly tested. With sufficient testing a different arbitrary behaviour would be fine. I would personally tend towards simply always flushing either on the second run, if we were to aim for simplicity; possibly with a zero delay (so that it just runs at the end of a natural batch), as this is less work, more responsive, and in all likelihood the main benefit is from batching the natural groupings of messages. I also, however, don't think this guard is a meaningful contributor to complexity in the codebase, and that we've already spent more time discussing the _guard_ than it deserves. However, analysis of the client batching and improvements to it are central to system behaviour, so further research is obviously welcomed. Since this ticket is discussing duplicated behaviour which is demonstrably absent, I'm closing as Won't Fix. > Dispatcher.Flusher's control has duplicated/conflict control > > > Key: CASSANDRA-11070 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11070 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: fujian > Labels: performance > Attachments: 0001-fix-CASSANDRA-11070.patch > > > org.apache.cassandra.transport.Message.Dispatcher.Flusher > remove duplicated control for flush message control … > Motivation: > the !doneWork's control is duplicated and confused with runsSinceFlush > 2 > if on the first run:the queue size is 20 > donework will be set to true and not do flush due to the size<50 and > runsSinceFlush<2. > if on the second run. the queue size is 0, > donework will be reset to false and not set to true due to no new items in > queue, but the flush will be triggered due to: > if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50) > now the runsSinceFlush is 2. so in actual, its function is similar with > runsSinceFlush>1. > so it is no need to keep it so that the code is confused and duplicated. > > Modifications: > remove it > Result: > after remove it, it will more clear and no confused. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control
[ https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117032#comment-15117032 ] Benedict commented on CASSANDRA-11070: -- I think you're confusing {{>}} with {{>=}}; this guard flushes if it's the third flush in a row that produced work, or the second which did _not_ produce work; i.e. it flushes if: # there have been three batches appear together (each could be singleton items, with arbitrary temporal separation) # there has been one batch followed by a >10us pause # there are more than fifty messages waiting Your guard changes this to only 1 and 3. > Dispatcher.Flusher's control has duplicated/conflict control > > > Key: CASSANDRA-11070 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11070 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: fujian > Labels: performance > Attachments: 0001-fix-CASSANDRA-11070.patch > > > org.apache.cassandra.transport.Message.Dispatcher.Flusher > remove duplicated control for flush message control … > Motivation: > the !doneWork's control is duplicated and confused with runsSinceFlush > 2 > if on the first run:the queue size is 20 > donework will be set to true and not do flush due to the size<50 and > runsSinceFlush<2. > if on the second run. the queue size is 0, > donework will be reset to false and not set to true due to no new items in > queue, but the flush will be triggered due to: > if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50) > now the runsSinceFlush is 2. so in actual, its function is similar with > runsSinceFlush>1. > so it is no need to keep it so that the code is confused and duplicated. > > Modifications: > remove it > Result: > after remove it, it will more clear and no confused. > check the link: > https://github.com/jiafu1115/cassandra/commit/5279884e6a392d36b4adc5e29f9ca5d0666cb275 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control
[ https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116919#comment-15116919 ] ASF GitHub Bot commented on CASSANDRA-11070: Github user jiafu1115 commented on the pull request: https://github.com/apache/cassandra/pull/62#issuecomment-174888659 @blerer I had request in https://issues.apache.org/jira/browse/CASSANDRA-11070, can you take a look? 3ks > Dispatcher.Flusher's control has duplicated/conflict control > > > Key: CASSANDRA-11070 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11070 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: fujian > Labels: performance > Attachments: 0001-fix-CASSANDRA-11070.patch > > > org.apache.cassandra.transport.Message.Dispatcher.Flusher > remove duplicated control for flush message control … > Motivation: > the !doneWork's control is duplicated and confused with runsSinceFlush > 2 > if on the first run:the queue size is 20 > donework will be set to true and not do flush due to the size<50 and > runsSinceFlush<2. > if on the second run. the queue size is 0, > donework will be reset to false and not set to true due to no new items in > queue, but the flush will be triggered due to: > if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50) > now the runsSinceFlush is 2. so in actual, its function is similar with > runsSinceFlush>1. > so it is no need to keep it so that the code is confused and duplicated. > > Modifications: > remove it > Result: > after remove it, it will more clear and no confused. > check the link: > https://github.com/jiafu1115/cassandra/commit/5279884e6a392d36b4adc5e29f9ca5d0666cb275 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control
[ https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118347#comment-15118347 ] fujian commented on CASSANDRA-11070: [~benedict] 3ks for your reply. and check my compare between done and without done. https://cloud.githubusercontent.com/assets/5654180/12599960/ede28bcc-c4d0-11e5-9a2a-5ebd68ced78b.jpg so if we change > to >=. we can't handle the 2:there has been one batch followed by a >10us pause but we can handle: 2:there has been one batch followed by a >20us pause what's more, we will have two benefits: (1) code more clear with less condition judge. (2) we won't have some no need loop when had no data when 1 or 2 so what's the difference: there has been one batch followed by a >10us pause there has been one batch followed by a >20us pause I think it isn't important compare with more clear code and reduce one no need loop.what's your view? 3ks again. > Dispatcher.Flusher's control has duplicated/conflict control > > > Key: CASSANDRA-11070 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11070 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: fujian > Labels: performance > Attachments: 0001-fix-CASSANDRA-11070.patch > > > org.apache.cassandra.transport.Message.Dispatcher.Flusher > remove duplicated control for flush message control … > Motivation: > the !doneWork's control is duplicated and confused with runsSinceFlush > 2 > if on the first run:the queue size is 20 > donework will be set to true and not do flush due to the size<50 and > runsSinceFlush<2. > if on the second run. the queue size is 0, > donework will be reset to false and not set to true due to no new items in > queue, but the flush will be triggered due to: > if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50) > now the runsSinceFlush is 2. so in actual, its function is similar with > runsSinceFlush>1. > so it is no need to keep it so that the code is confused and duplicated. > > Modifications: > remove it > Result: > after remove it, it will more clear and no confused. > check the link: > https://github.com/jiafu1115/cassandra/commit/5279884e6a392d36b4adc5e29f9ca5d0666cb275 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11070) Dispatcher.Flusher's control has duplicated/conflict control
[ https://issues.apache.org/jira/browse/CASSANDRA-11070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118348#comment-15118348 ] fujian commented on CASSANDRA-11070: 3ks for your reply. and check my compare between done and without done. https://cloud.githubusercontent.com/assets/5654180/12599960/ede28bcc-c4d0-11e5-9a2a-5ebd68ced78b.jpg so if we change > to >=. we can't handle the 2:there has been one batch followed by a >10us pause but we can handle: 2:there has been one batch followed by a >20us pause what's more, we will have two benefits: (1) code more clear with less condition judge. (2) we won't have some no need loop when had no data when 1 or 2 so what's the difference: there has been one batch followed by a >10us pause there has been one batch followed by a >20us pause I think it isn't important compare with more clear code and reduce one no need loop.what's your view? 3ks again. > Dispatcher.Flusher's control has duplicated/conflict control > > > Key: CASSANDRA-11070 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11070 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: fujian > Labels: performance > Attachments: 0001-fix-CASSANDRA-11070.patch > > > org.apache.cassandra.transport.Message.Dispatcher.Flusher > remove duplicated control for flush message control … > Motivation: > the !doneWork's control is duplicated and confused with runsSinceFlush > 2 > if on the first run:the queue size is 20 > donework will be set to true and not do flush due to the size<50 and > runsSinceFlush<2. > if on the second run. the queue size is 0, > donework will be reset to false and not set to true due to no new items in > queue, but the flush will be triggered due to: > if (!doneWork || runsSinceFlush > 2 || flushed.size() > 50) > now the runsSinceFlush is 2. so in actual, its function is similar with > runsSinceFlush>1. > so it is no need to keep it so that the code is confused and duplicated. > > Modifications: > remove it > Result: > after remove it, it will more clear and no confused. > check the link: > https://github.com/jiafu1115/cassandra/commit/5279884e6a392d36b4adc5e29f9ca5d0666cb275 -- This message was sent by Atlassian JIRA (v6.3.4#6332)