On Thu, Apr 9, 2020 at 7:49 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
> I agree that if the leader switches the role, then it is possible that
> sometimes the leader might not produce the work before the queue is
> empty. OTOH, the problem with the approach you are suggesting is that
> the work will be generated on-demand, i.e. there is no specific
> process who is generating the data while workers are busy inserting
> the data.
I think you have a point. The way things could go wrong without a leader is if everyone tends to want new work at the same time. In that case, everyone waits at once, whereas a designated process that aggressively queues up work could perhaps avoid that. Note that this really requires everyone wanting new work at the exact same moment, because otherwise they just take turns finding work for themselves, and everything is fine: nobody is waiting for anybody else to do any work, so everyone is always making forward progress.

On the other hand, if we do have a leader, and for some reason it's slow to respond, everyone has to wait. That could happen either because the leader also has other responsibilities, like reading data or helping with the main work when the queue is full, or just because the system is really busy and the leader doesn't get scheduled on-CPU for a while. I am inclined to think that's likely to be the more serious problem.

The thing is, the problem of everyone needing new work at the same time can't really keep repeating. Say everyone finishes processing their first chunk at the same time. Now everyone needs a second chunk, and in a leaderless system they must take turns getting it, so they will go in some order. The ones who go later will presumably also finish later, so the end times for the second and following chunks will be scattered. You shouldn't get repeated pile-ups with everyone finishing at the same time, because each time it happens, it forces a little bit of waiting that spreads things out. If they clump up again, that will happen again, but it shouldn't happen every time.

In the case where there is a leader, though, I don't think there's any similar protection. Suppose we go with the design Vignesh proposes, where the leader switches to processing chunks when the queue is more than 75% full.
If the leader has a "hiccup", where it gets swapped out or is busy processing a chunk for a longer-than-normal time, all of the other processes have to wait for it. We can probably tune this to some degree by adjusting the queue size and fullness thresholds, but the optimal values for those parameters might be quite different on different systems, depending on load, I/O performance, CPU architecture, and so on. If there's a system or configuration where the leader tends not to respond fast enough, it will probably just keep happening, because nothing in the algorithm will tend to shake it out of that bad pattern.

I'm not 100% certain that my analysis here is right, so it will be interesting to hear from other people. However, as a general rule, I think we want to minimize the amount of work that can only be done by one process (the leader) and maximize the amount that can be done by any process, with whichever one is available taking on the job. In the case of COPY FROM STDIN, the reads from the network socket can only be done by the one process connected to it. In the case of COPY from a file, even that could be rotated around, if all processes open the file individually and seek to the appropriate offset.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company