On Wed, Jan 31, 2024 at 2:23 PM Melih Mutlu <m.melihmu...@gmail.com> wrote:
>> That seems like it might be a useful refinement of Melih Mutlu's
>> original proposal, but consider a message stream that consists of
>> messages exactly 8kB in size. If that message stream begins when the
>> buffer is empty, all messages are sent directly. If it begins when
>> there are any number of bytes in the buffer, we buffer every message
>> forever. That's kind of an odd artifact, but maybe it's fine in
>> practice. I say again that it's good to test out a bunch of scenarios
>> and see what shakes out.
>
> Isn't this already the case? Imagine sending exactly 8kB messages: the
> first pq_putmessage() call will buffer 8kB. Any call after this point
> simply sends an 8kB message already buffered from the previous call and
> buffers a new 8kB message. The only difference here is that we keep the
> message in the buffer for a while instead of sending it directly. In
> theory, the proposed idea should not change the number of flushes or the
> amount of data we send each time, but it can remove unnecessary copies
> to the buffer in this case. I guess the behaviour is also the same with
> or without the patch in case the buffer already has some bytes in it.
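To make the scenario concrete, here is a rough sketch of the kind of
send-path policy being debated. The names and the exact bypass condition
are guesses for illustration only, not the actual patch; it just shows
why the behavior flips depending on whether the buffer happens to be
empty when the stream of 8kB messages starts:

/*
 * Hypothetical sketch, not the actual patch: large writes skip the copy
 * into the output buffer only when nothing is already buffered.
 */
#include <string.h>
#include <unistd.h>

#define SEND_BUF_SIZE 8192

static char send_buf[SEND_BUF_SIZE];
static int  send_len = 0;           /* bytes currently buffered */
static int  sock_fd = 1;            /* stand-in for the client socket */

/* Write out everything currently sitting in the buffer. */
static int
flush_pending(void)
{
    int     sent = 0;

    while (sent < send_len)
    {
        ssize_t n = write(sock_fd, send_buf + sent, send_len - sent);

        if (n <= 0)
            return -1;              /* sketch: no partial-failure recovery */
        sent += (int) n;
    }
    send_len = 0;
    return 0;
}

/* Append a message, bypassing the copy only when the buffer is empty. */
static int
put_bytes(const char *s, size_t len)
{
    if (send_len == 0 && len >= SEND_BUF_SIZE)
    {
        /* Buffer is empty and the message fills it anyway: send directly. */
        while (len > 0)
        {
            ssize_t n = write(sock_fd, s, len);

            if (n <= 0)
                return -1;
            s += n;
            len -= (size_t) n;
        }
        return 0;
    }

    /*
     * Otherwise copy into the buffer, flushing whenever it fills.  With a
     * stream of exactly-8kB messages that starts with anything already
     * buffered, every message takes this path, which is the quirk
     * discussed above.
     */
    while (len > 0)
    {
        size_t  room = SEND_BUF_SIZE - (size_t) send_len;
        size_t  chunk = (len < room) ? len : room;

        memcpy(send_buf + send_len, s, chunk);
        send_len += (int) chunk;
        s += chunk;
        len -= chunk;

        if (send_len == SEND_BUF_SIZE && flush_pending() < 0)
            return -1;
    }
    return 0;
}

With send_len == 0 at the start of the stream, every 8kB message takes
the direct-send branch and the buffer stays empty; with even one byte
pending, every message goes through the copy path and the leftover bytes
never drain away between messages.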
Yes, it's never worse than today in terms of the number of buffer
flushes, but it doesn't feel like great behavior, either. Users tend not
to like it when the behavior of an algorithm depends heavily on
incidental factors that shouldn't really be relevant, like whether the
buffer starts with 1 byte in it or 0 at the beginning of a long sequence
of messages. They see the performance varying "for no reason" and they
dislike it. They don't say "even the bad performance is no worse than
earlier versions, so it's fine."

> You're right and I'm open to doing more legwork. I'd also appreciate
> any suggestions about how to test this properly and/or useful scenarios
> to test. That would be really helpful.

I think experimenting to see whether the long-short-long-short behavior
that Heikki postulated emerges in practice would be a really good start.
Another experiment that I think would be interesting is this: suppose
you create a patch that sends EVERY message without buffering and
compare that to master. My naive expectation would be that this will
lose if you pump short messages through the connection and win if you
pump long messages through it. Is that true? If yes, at what point do we
break even on performance? Does it depend on whether the connection is
local or over a network? Does it depend on whether it's with or without
SSL? Does it depend on Linux vs. Windows vs. whateverBSD? What happens
if you twiddle the 8kB buffer size up or, say, down to just below the
Ethernet frame size?

I think what we really want to understand here is under what
circumstances the extra layer of buffering is a win vs. a loss. If all
the stuff I just mentioned doesn't really matter and the answer is, say,
that an 8kB buffer is great and the breakpoint where extra buffering
makes sense is also 8kB, and that's consistent regardless of other
variables, then your algorithm or Jelte's variant or something of that
nature is probably just right. But if it turns out, say, that the extra
buffering is only a win for sub-1kB messages, that would be rather nice
to know before we finalize the approach. Also, if it turns out that the
answer differs dramatically based on whether you're using a UNIX socket
or TCP, that would also be nice to know before finalizing an algorithm.

> I understand that I should provide more/better analysis around this
> change to prove that it doesn't hurt (hopefully) but improves some
> cases, even if not all of them. That may even help us find a better
> approach than what's already proposed. Just to clarify, I don't think
> anyone here suggests that the bar should be at "if it can't lose
> relative to today, it's good enough". IMHO "a change that improves some
> cases, but regresses nowhere" does not translate to that.

Well, I thought those were fairly similar sentiments, so maybe I'm not
quite understanding the statement in the way it was meant.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
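For the message-size sweep suggested above, something along these lines
could serve as a crude starting point. The repeat()/generate_series()
query is just one convenient way to produce a stream of DataRow messages
of a chosen size; the default sizes, the timing scheme, and the lack of
warmup are placeholders rather than anything agreed in this thread.
Running it over a UNIX socket vs. TCP, with and without SSL, and against
a server built with different buffer sizes would cover the other
variables mentioned.

/*
 * Hypothetical libpq micro-benchmark: pump result rows of a chosen width
 * through a connection and time the query.  Connection settings come
 * from the usual PG* environment variables.  Compile with: cc bench.c -lpq
 */
#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
main(int argc, char **argv)
{
    int         width = (argc > 1) ? atoi(argv[1]) : 100;    /* bytes per row */
    int         rows = (argc > 2) ? atoi(argv[2]) : 100000;  /* rows per run */
    char        query[256];
    struct timespec start, stop;
    PGconn     *conn = PQconnectdb("");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /*
     * Each result row produces a DataRow message of roughly 'width' bytes,
     * so sweeping 'width' from tens of bytes up past 8kB should show where
     * the break-even point for the extra buffering is.  For large widths,
     * lower 'rows': the whole result set is held in client memory.
     */
    snprintf(query, sizeof(query),
             "SELECT repeat('a', %d) FROM generate_series(1, %d)",
             width, rows);

    clock_gettime(CLOCK_MONOTONIC, &start);
    res = PQexec(conn, query);
    clock_gettime(CLOCK_MONOTONIC, &stop);

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
    else
        printf("width=%d rows=%d elapsed=%.3f s\n", width, rows,
               (stop.tv_sec - start.tv_sec) +
               (stop.tv_nsec - start.tv_nsec) / 1e9);

    PQclear(res);
    PQfinish(conn);
    return 0;
}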