On 2/8/17 5:11 PM, Kyle Gearhart wrote:
> Overall, wall clock improves 24%.  User time elapsed is a 430% improvement.
> About half the time is spent waiting on the IO with the callback.  With the
> regular pqRowProcessor only about 16% of the time is spent waiting on IO.
>
> To wit...
>
>                 real    user    sys
> single row      0.214   0.131   0.048
> callback        0.161   0.030   0.051
>
> Those are averaged over 11 runs.
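
For reference, here is a quick sketch of how the quoted percentages fall out of that table. It assumes that everything in "real" not covered by "user" or "sys" is time spent waiting on IO, which is only an approximation; the helper names are made up for illustration.

```python
# Timings in seconds, copied from the table above (averaged over 11 runs).
timings = {
    "single row": {"real": 0.214, "user": 0.131, "sys": 0.048},
    "callback": {"real": 0.161, "user": 0.030, "sys": 0.051},
}


def wait_fraction(t):
    """Share of wall-clock time not spent on CPU (assumed here to be IO wait)."""
    return (t["real"] - t["user"] - t["sys"]) / t["real"]


def reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


single, callback = timings["single row"], timings["callback"]

print(f"wall clock reduction: {reduction(single['real'], callback['real']):.1%}")
print(f"user time ratio: {single['user'] / callback['user']:.2f}x")
for name, t in timings.items():
    print(f"{name}: {wait_fraction(t):.1%} waiting")
```

That gives about 16% waiting for single row and about 50% for the callback, in line with the figures quoted. User time comes out roughly 4.4x lower, which is presumably what the 430% refers to.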

Can you run a trace to see where all the time is going in the single row case? I don't see an obvious time-suck from a quick look through the code. It'd also be interesting to see how the numbers change if you drop the filler column from the SELECT.

Also, the backend should be buffering ~8kB of data before handing it to the socket. If that's more than the kernel can buffer, I'd expect a serious performance hit.
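
One rough way to check the kernel side is to ask a fresh socket for its default buffer sizes and compare them with the ~8kB figure. This is only a sketch: the 8192 constant is my reading of "~8kB", the function name is hypothetical, and the defaults on a new socket may not match what the actual client/backend connection ends up with.

```python
import socket

BACKEND_CHUNK = 8192  # assumption: the "~8kB" the backend buffers before sending


def default_buffer_sizes(family=socket.AF_INET):
    """Return the default send/receive buffer sizes of a new stream socket."""
    with socket.socket(family, socket.SOCK_STREAM) as s:
        return {
            "send": s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            "recv": s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        }


families = {"tcp": socket.AF_INET}
# Unix-domain sockets are not available on every platform.
if hasattr(socket, "AF_UNIX"):
    families["unix"] = socket.AF_UNIX

for name, family in families.items():
    sizes = default_buffer_sizes(family)
    fits = sizes["recv"] >= BACKEND_CHUNK
    print(f"{name}: send={sizes['send']} recv={sizes['recv']} "
          f"holds one {BACKEND_CHUNK}-byte chunk: {fits}")
```

If the receive buffer on the client side turns out smaller than one backend chunk, that would be worth ruling out before digging further into the row processor.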
