Hello, At Tue, 11 Jul 2017 10:28:51 +0200, Antonin Houska <a...@cybertec.at> wrote in <6448.1499761731@localhost> > Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote: > > Effectively it is a waiting-queue followed by a > > completed-list. The point of the compaction is keeping the order > > of waiting or not-yet-completed requests, which is crucial to > > avoid kind-a precedence inversion. We cannot keep the order by > > using bitmapset in such way. > > > The current code waits all waiters at once and processes all > > fired events at once. The order in the waiting-queue is > > inessential in the case. On the other hand I suppoese waiting on > > several-tens to near-hundred remote hosts is in a realistic > > target range. Keeping the order could be crucial if we process a > > part of the queue at once in the case. > > > > Putting siginificance on the deviation of response time of > > remotes, process-all-at-once is effective. In turn we should > > consider the effectiveness of the lifecycle of the larger wait > > event set. > > ok, I missed the fact that the order of es_pending_async entries is > important. I think this is worth adding a comment.
I'll put an upper limit to the number of waiters processed at once. Then add a comment like that. > Actually the reason I thought of simplification was that I noticed small > inefficiency in the way you do the compaction. In particular, I think it's not > always necessary to swap the tail and head entries. Would something like this > make sense? I'm not sure, but I suppose that it is rare that all of the first many elements in the array are not COMPLETE. In most cases the first element gets a response first. > > /* If any node completed, compact the array. */ > if (any_node_done) > { ... > for (tidx = 0; tidx < estate->es_num_pending_async; > ++tidx) > { ... > if (tail->state == ASYNCREQ_COMPLETE) > continue; > > /* > * If the array starts with one or more > incomplete requests, > * both head and tail point at the same item, > so there's no > * point in swapping. > */ > if (tidx > hidx) > { This works to skip the first several elements when all of them are ASYNCREQ_COMPLETE. I think it makes sense as long as it doesn't harm the loop. The optimization is more effective by putting out of the loop like this. | for (tidx = 0; tidx < estate->es_num_pending_async && estate->es_pending_async[tidx] == ASYNCREQ_COMPLETE; ++tidx); | for (; tidx < estate->es_num_pending_async; ++tidx) ... > And besides that, I think it'd be more intuitive if the meaning of "head" and > "tail" was reversed: if the array is iterated from lower to higher positions, > then I'd consider head to be at higher position, not tail. Yeah, but maybe the "head" is still confusing even if reversed because it is still not a head of something. It might be less confusing by rewriting it in more verbose-but-straightforwad way. | int npending = 0; | | /* Skip over not-completed items at the beginning */ | while (npending < estate->es_num_pending_async && | estate->es_pending_async[npending] != ASYNCREQ_COMPLETE) | npending++; | | /* Scan over the rest for not-completed items */ | for (i = npending + 1 ; i < estate->es_num_pending_async; ++i) | { | PendingAsyncRequest *tmp; | PendingAsyncRequest *curr = estate->es_pending_async[i]; | | if (curr->state == ASYNCREQ_COMPLETE) | continue; | | /* Move the not-completed item to the tail of the first chunk */ | tmp = estate->es_pending_async[i]; | estate->es_pending_async[nepending] = tmp; | estate->es_pending_async[i] = tmp; | ++npending; | } regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers