Re: [HACKERS] Speed dblink using alternate libpq tuple storage

Marko Kreen Tue, 03 Apr 2012 15:21:39 -0700

On Tue, Apr 03, 2012 at 05:32:25PM -0400, Tom Lane wrote:
> Marko Kreen <mark...@gmail.com> writes:
> > The fact remains that upper-level code must cooperate with callback.
> > Why is it useful to hijack PQgetResult() to do so?
> 
> Because that's the defined communications channel.  We're not
> "hijacking" it.  If we're going to start using pejorative words here,
> I will assert that your proposal hijacks PQisBusy to make it do
> something substantially different than the traditional understanding
> of it (more about that below).


And I would agree with it - and I already did:

    Seems we both lost sight of actual usage scenario for the early-exit
    logic - that both callback and upper-level code *must* cooperate
    for it to be useful.  Instead, we designed API for non-cooperating case,
    which is wrong.

> > Seems our concept of "async mode" is different.  I associate
> > PQisnonblocking() with it.
> 
> Well, there are really four levels to the API design:
> 
>       * Plain old PQexec.
>       * Break down PQexec into PQsendQuery and PQgetResult.
>       * Avoid waiting in PQgetResult by testing PQisBusy.
>       * Avoid waiting in PQsendQuery (ie, avoid the risk of blocking
>         on socket writes) by using PQisnonblocking.
> 
> Any given app might choose to run at any one of those four levels,
> although the first one probably isn't interesting for an app that would
> care about using a suspending row processor.  But I see no reason that
> we should design the suspension feature such that it doesn't work at the
> second level.  PQisBusy is, and should be, an optional state-testing
> function, not something that changes the set of things you can do with
> the connection.

Thats actually nice overview.  I think our basic disagreement comes
from how we map the early-exit into those modes.

I want to think of the early-exit row-processing as 5th and 6th modes:

* Row-by-row processing on sync connection (PQsendQuery() + ???)
* Row-by-row processing on async connection (PQsendQuery() + ???)

But instead you want work with almost no changes on existing modes.

And I don't like it because as I've said it previously, the upper
level must know about callback and handle it properly, so it does
not make sense keep existing loop structure in stone.

Instead we should design the new mode that is logical for user
and also logical inside libpq.

> > IMHO it's much better to remove the concept of early-exit from public
> > API completely and instead give "get" style API that does the early-exit
> > internally.
> 
> If the row processor has an early-exit option, it hardly seems
> reasonable to me to claim that that's not part of the public API.

Please keep in mind the final goal - nobody is interested in
"early-exit" on it's own, the interesting goal is processing rows
outside of libpq.

And the early-exit return code is clumsy hack to make it possible
with callbacks.

But why should we provide complex way of achieving something,
when we can provide easy and direct way?

> In particular, I flat out will not accept a design in which that option
> doesn't work unless the current call came via PQisBusy, much less some
> entirely new call like PQhasRowOrResult.  It's unusably fragile (ie,
> timing sensitive) if that has to be true.

Agreed for PQisBusy, but why is PQhasRowOrResult() fragile?

It's easy to make PQhasRowOrResult more robust - let it set flag under
PGconn and let getAnotherTuple() do early exit on it's own, thus keeping
callback completely out of loop.  Thus avoiding any chance user callback
can accidentally trigger the behaviour.

> There's another way in which I think your proposal breaks the existing
> API, which is that without an internal PQASYNC_SUSPENDED state that is
> cleared by PQgetResult, it's unsafe to probe PQisBusy multiple times
> before doing something useful.  That shouldn't be the case.  PQisBusy
> is supposed to test whether data is ready for the application to do
> something with --- it should not throw away data until a separate call
> has been made to consume the data.  Changing its behavior like that
> would risk creating subtle bugs in existing event-loop logic.  A
> possibly useful analogy is that select() and poll() don't clear a
> socket's read-ready state, you have to do a separate read() to do that.
> There are good reasons for that separation of actions.

It does look like argument for new "modes" for row-by-row processing,
preferably with new API calls.

Because adding "magic" to existing APIs seems to only cause confusion.

> > Again, note that if we would provide PQrecvRow()-style API, then we *don't*
> > need early-exit in callback API.  Nor exceptions...
> 
> AFAICS, we *do* need exceptions for dblink's usage.  So most of what's
> at stake here is not avoidable, unless you're proposing we put this
> whole set of patches off till 9.3 so we can think it over some more.

My point was that if we have an API for processing rows outside libpq,
the need for exceptions for callbacks is mostly gone.  Converting dblink
to PQhasRowOrResult() should not be hard.

I don't mind keeping exceptions, but only if we don't need to add extra
complexity just for them.

-- 
marko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Speed dblink using alternate libpq tuple storage

Reply via email to