On Tue, 22 Oct 2013, Rainer Gerhards wrote:

On Tue, Oct 22, 2013 at 12:39 PM, David Lang <[email protected]> wrote:

On Tue, 22 Oct 2013, Pavel Levshin wrote:

even the load balancing hack works 'well enough' once you accept that you
are balancing per batch rather than per message (even if you did balance
per message, you really have no idea how expensive a particular message is
going to be, so you are not really balancing the work precisely, you are
only doing so statistically, and balancing per batch rather than per
message is just as valid statistically)


No, it will not work in general. Suppose you have a batch of 256 and 8
actions to select from. 256 is divisible by 8, so you will put all your
load on just one server. Selecting the "right" numbers is hard.


actually it will, you see 256 is the max batch size, rsyslog will grab
_up_to_ 256 messages for the batch, so if you increment per message you
will seldom be incrementing by 256 (unless your system is always running
behind)

but if I am understanding things correctly, putting set $/counter =
$/counter +1; in your config will end up incrementing counter by 1 per
batch, not by the number of messages in the batch, with the race condition
that two different threads may update it at the same time, so that
$/counter only goes up by 1 when it should go up by 2.


No! See why I say it's very unintuitive? ;)

It counts *every* message, but does so

a) in one loop
b) before advancing to the next action

so if you have (pseudo-code)

1. set $/glob = $/glob +1;
2. action write $/glob

and 4 messages a,b,c,d, this will happen. All loops are unrolled in the
pseudo-execution:

set $/glob = 1
set $/glob = 2
set $/glob = 3
set $/glob = 4
write 4
write 4
write 4
write 4
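
To make the ordering above concrete, here is a small Python toy model of it. This is only a sketch of the behaviour described in this mail, not rsyslog code: run_batch and its arguments are hypothetical names, and the model assumes a single thread and exactly the two statements from the pseudo-code.

```python
def run_batch(messages, counter=0):
    """Toy model of one batch: run 'set' for all messages, then the action."""
    # Statement 1: the 'set' runs once per message, in one loop,
    # before the engine advances to the next statement.
    for _ in messages:
        counter += 1

    # Statement 2: the action only runs now, so every message in the
    # batch is written with the final counter value.
    written = [counter for _ in messages]
    return counter, written


counter, written = run_batch(["a", "b", "c", "d"])
print(written)  # [4, 4, 4, 4]
```

Every message is counted, but none of the intermediate values 1, 2, 3 is ever seen by the action.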

Again, IMHO this is *extremely far* from what you expect. I really doubt
that load-balancing based on this counter will work really well, even
statistically over a longer period.

as long as you are not always at max batch size, this will end up randomizing over time.

if you have more than one thread working (which I would assume you do if you have a very high load), then if one thread starts getting most of the traffic, another thread will get less than a batch and it will break the cycle.

yes, it would be better if the spreading factor and batchsize are not multiples of each other
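
A rough way to check both sides of this argument is a small Python simulation. It is a sketch under stated assumptions, not a measurement of rsyslog: simulate, the modulo-based target selection, and the uniformly random batch sizes are all hypothetical, and it models one thread in which the counter advances by the batch size before the action sees it, so a whole batch goes to one target.

```python
import random
from collections import Counter


def simulate(batch_sizes, n_actions=8):
    """Return the number of messages sent to each target action."""
    counter = 0
    load = Counter()
    for size in batch_sizes:
        counter += size               # counter is bumped once per message
        target = counter % n_actions  # assumed selection rule
        load[target] += size          # the whole batch sees the same value
    return load


# Always-full batches: 256 is a multiple of 8, so one target gets everything.
full = simulate([256] * 1000)
print(len(full))  # 1

# Batches of varying size (here: uniform between 1 and 256, seeded).
rng = random.Random(0)
varied = simulate([rng.randint(1, 256) for _ in range(1000)])
print(sorted(varied.items()))
```

In this model the always-full case degenerates to a single target, which is the problem described above, while varying batch sizes spread the batches across all targets over time.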

David Lang