Re: [HACKERS] FDW for PostgreSQL

2013-03-28 Thread Stephen Frost
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Another thing I was wondering about, but did not change, is that if we're
 having the remote transaction inherit the local transaction's isolation
 level, shouldn't it inherit the READ ONLY property as well?

Apologies for bringing this up pretty late, but wrt writable FDW
transaction levels, I was *really* hoping that we'd be able to implement
autonomous transactions on top of writeable FDWs.  It looks like there's
no way to do this using the postgres_fdw due to it COMMIT'ing only when
the client transaction commits.  Would it be possible to have a simple
function which could be called to say "commit the transaction on the
foreign side" for this server/table/connection/whatever?  A nice
addition on top of that would be the ability to define 'auto-commit' for a
given table or server.

I'll try and find time to work on this, but I'd love feedback on whether
this is possible and where the landmines are.

Thanks,

Stephen




Re: [HACKERS] FDW for PostgreSQL

2013-03-28 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 Apologies for bringing this up pretty late, but wrt writable FDW
 transaction levels, I was *really* hoping that we'd be able to implement
 autonomous transactions on top of writeable FDWs.  It looks like there's
 no way to do this using the postgres_fdw due to it COMMIT'ing only when
 the client transaction commits.  Would it be possible to have a simple
 function which could be called to say "commit the transaction on the
 foreign side" for this server/table/connection/whatever?  A nice
 addition on top of that would be the ability to define 'auto-commit' for a
 given table or server.

TBH I think this is a fairly bad idea.  You can get that behavior via
dblink if you need it, but there's no way to do it in an FDW without
ending up with astonishing (and not in a good way) semantics.  A commit
would force committal of everything that'd been done through that
connection, regardless of transaction/subtransaction structure up to
that point; and it would also destroy open cursors.  The only way to
make this sane at all would be to provide user control of which
operations go to which connections; which is inherent in dblink's API
but is simply not a concept in the FDW universe.  And I don't want to
try to plaster it on, either.

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-03-28 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 TBH I think this is a fairly bad idea.  You can get that behavior via
 dblink if you need it, 

While I appreciate that dblink can do it, I simply don't see it as a
good solution to this.

 but there's no way to do it in an FDW without
 ending up with astonishing (and not in a good way) semantics.  A commit
 would force committal of everything that'd been done through that
 connection, regardless of transaction/subtransaction structure up to
 that point; and it would also destroy open cursors.  The only way to
 make this sane at all would be to provide user control of which
 operations go to which connections; which is inherent in dblink's API
 but is simply not a concept in the FDW universe.  And I don't want to
 try to plaster it on, either.

This concern would make a lot more sense to me if we were sharing a
given FDW connection between multiple client backends/sessions; I admit
that I've not looked through the code but the documentation seems to
imply that we create one or more FDW connections per backend session and
there's no sharing going on.

A single backend will be operating in a linear fashion through the
commands sent to it.  As such, I'm not sure that it's quite as bad as it
may seem.

Perhaps a reasonable compromise would be to have a SERVER option which
is along the lines of 'autocommit', where a user could request that any
query to this server is automatically committed independent of the
client transaction.  I'd be happier if we could allow the user to
control it, but this would at least allow for 'log tables', which are
defined using this server definition, where long-running pl/pgsql code
could log progress that other connections could see.
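
To make the idea concrete, here is a minimal sketch of what such a setup
might look like; note that 'autocommit' is a hypothetical option here, not
something postgres_fdw actually accepts, and the object names are made up:

-- hypothetical: writes through foreign tables on this server are committed
-- on the remote side immediately, independent of the local transaction
CREATE SERVER log_server FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'loghost', dbname 'logs', autocommit 'true');  -- 'autocommit' is made up

CREATE USER MAPPING FOR CURRENT_USER SERVER log_server
    OPTIONS (user 'logger', password 'secret');

CREATE FOREIGN TABLE job_progress (
    job_id    integer,
    step      text,
    logged_at timestamptz
) SERVER log_server;

-- inside long-running pl/pgsql code, each insert would become visible to
-- other sessions right away, even though the local transaction is still open
INSERT INTO job_progress VALUES (42, 'finished loading staging data', now());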

Thanks,

Stephen




Re: [HACKERS] FDW for PostgreSQL

2013-03-28 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 ... The only way to
 make this sane at all would be to provide user control of which
 operations go to which connections; which is inherent in dblink's API
 but is simply not a concept in the FDW universe.  And I don't want to
 try to plaster it on, either.

 This concern would make a lot more sense to me if we were sharing a
 given FDW connection between multiple client backends/sessions; I admit
 that I've not looked through the code but the documentation seems to
 imply that we create one or more FDW connections per backend session and
 there's no sharing going on.

Well, ATM postgres_fdw shares connections across tables and queries;
but my point is that that's all supposed to be transparent and invisible
to the user.  I don't want to have API features that make connections
explicit, because I don't think that can be shoehorned into the FDW
model without considerable strain and weird corner cases.

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-03-28 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 I don't want to have API features that make connections
 explicit, because I don't think that can be shoehorned into the FDW
 model without considerable strain and weird corner cases.

It seems we're talking past each other here.  I'm not particularly
interested in exposing what connections have been made to other servers
via some API (though I could see some debugging use there).  What I was
hoping for is a way for a given user to say "I want this specific
change, to this table, to be persisted immediately".  I'd love to have
that ability *without* FDWs too.  It just happens that FDWs provide a
simple way to get there from here and without a lot of muddying of the
waters, imv.

FDWs are no strangers to remote sources which don't have transactions
either; file_fdw will happily return whatever the current contents of
the file are with no concern for PG transactions.  I would expect a
writable file_fdw to act the same and immediately write out any data
written to it.

Hope that clarifies things a bit.

Thanks,

Stephen




Re: [HACKERS] FDW for PostgreSQL

2013-02-22 Thread Thom Brown
On 21 February 2013 10:30, Tom Lane t...@sss.pgh.pa.us wrote:
 Shigeru Hanada shigeru.han...@gmail.com writes:
 [ postgres_fdw.v5.patch ]

 Applied with a lot of revisions.

Bit of an issue with selecting rows:

postgres=# SELECT * FROM animals;
 id | animal_name | animal_type | lifespan
----+-------------+-------------+----------
  1 | cat         | mammal      |       20
  2 | dog         | mammal      |       12
  3 | robin       | bird        |       12
  4 | dolphin     | mammal      |       30
  5 | gecko       | reptile     |       18
  6 | human       | mammal      |       85
  7 | elephant    | mammal      |       70
  8 | tortoise    | reptile     |      150
(8 rows)

postgres=# SELECT animals FROM animals;
 animals
---------
 (,,,)
 (,,,)
 (,,,)
 (,,,)
 (,,,)
 (,,,)
 (,,,)
 (,,,)
(8 rows)

postgres=# SELECT animals, animal_name FROM animals;
    animals    | animal_name
---------------+-------------
 (,cat,,)      | cat
 (,dog,,)      | dog
 (,robin,,)    | robin
 (,dolphin,,)  | dolphin
 (,gecko,,)    | gecko
 (,human,,)    | human
 (,elephant,,) | elephant
 (,tortoise,,) | tortoise
(8 rows)

postgres=# EXPLAIN (ANALYSE, VERBOSE) SELECT animals FROM animals;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Foreign Scan on public.animals  (cost=100.00..100.24 rows=8 width=45) (actual time=0.253..0.255 rows=8 loops=1)
   Output: animals.*
   Remote SQL: SELECT NULL, NULL, NULL, NULL FROM public.animals
 Total runtime: 0.465 ms
(4 rows)

--
Thom




Re: [HACKERS] FDW for PostgreSQL

2013-02-22 Thread Tom Lane
Thom Brown t...@linux.com writes:
 Bit of an issue with selecting rows:

Ooops, looks like I screwed up the logic for whole-row references.
Will fix, thanks for the test case!

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-22 Thread Thom Brown
On 22 February 2013 14:10, Tom Lane t...@sss.pgh.pa.us wrote:
 Thom Brown t...@linux.com writes:
 Bit of an issue with selecting rows:

 Ooops, looks like I screwed up the logic for whole-row references.
 Will fix, thanks for the test case!

Retried after your changes and all is well.  Thanks Tom.

-- 
Thom




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 [ postgres_fdw.v5.patch ]

Applied with a lot of revisions.

There are still a number of loose ends and things that need to be
discussed:

I'm not at all happy with the planner support functions --- as designed,
that code is basically incapable of thinking about more than one remote
access path.  It needs to be restructured and then extended to be able
to generate parameterized remote paths.  The local estimation mode is
pretty bogus as well.  I thought the patch could be committed anyway,
but I'm going to go back and work on that part later.

I doubt that deparseFuncExpr is doing the right thing by printing
casts as their underlying function calls.  The comment claims that
this way is more robust but I rather think it's less so, because it's
effectively assuming that the remote server implements casts exactly
like the local one, which might be incorrect if the remote is a different
Postgres version.  I think we should probably change that, but would like
to know what the argument was for coding it like this.  Also, if this
is to be the approach to printing casts, why is RelabelType handled
differently?

As I mentioned earlier, I think it would be better to force the remote
session's search_path setting to just pg_catalog and then reduce the
amount of explicit schema naming in the queries --- any opinions about
that?
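
To illustrate what that would buy (a sketch, not what the committed code
currently emits):

-- issued once when the remote connection is established
SET search_path = pg_catalog;

-- built-in operators then need no qualification in the deparsed query:
SELECT a, b FROM public.tbl WHERE (a > 0);
-- instead of the current style:
SELECT a, b FROM public.tbl WHERE ((a OPERATOR(pg_catalog.>) 0));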

I took out the checks on collations of operators because I thought they
were thoroughly broken.  In the first place, looking at operator names
to deduce semantics is unsafe (if we were to try to distinguish equality,
looking up btree opclass membership would be the way to do that).  In the
second place, restricting only collation-sensitive operators and not
collation-sensitive functions seems just about useless for guaranteeing
safety.  But we don't have any very good handle on which functions might
be safe to send despite having collatable input types, so taking that
approach would greatly restrict our ability to send function calls at all.

The bigger picture here though is that we're already relying on the user
to make sure that remote tables have column data types matching the local
definition, so why can't we say that they've got to make sure collations
match too?  So I think this is largely a documentation issue and we don't
need any automated enforcement mechanism, or at least it's silly to try
to enforce this when we're not enforcing column type matching (yet?).

What might make sense is to try to determine whether a WHERE clause uses
any collations different from those of the contained foreign-column Vars,
and send it over only if not.  That would prevent us from sending clauses
that explicitly use collations that might not exist on the remote server.
I didn't try to code this though.
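
For example, a check along those lines would presumably refuse to ship a
qual like this one, which names a collation explicitly instead of using the
Var's own collation (illustrative only; the table is assumed to be a
foreign table):

SELECT *
FROM   remote_customers
WHERE  last_name > 'M' COLLATE "de_DE";   -- "de_DE" may not exist remotely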

Another thing that I find fairly suspicious in this connection is that
deparse.c isn't bothering to print collations attached to Const nodes.
That may be a good idea to avoid needing the assumption that the remote
server uses the same collation names we do, but if we're going to do it
like this, those Const collations need to be considered when deciding
if the expression is safe to send at all.

A more general idea that follows on from that is that if we're relying on
the user to be sure the semantics are the same, maybe we don't need to be
quite so tight about what we'll send over.  In particular, if the user has
declared a foreign-table column of a non-built-in type, the current code
will never send any constraints for that column at all, which seems overly
conservative if you believe he's matched the type correctly.  I'm not sure
exactly how to act on that thought, but I think there's room for
improvement there.

A related issue is that as coded, is_builtin() is pretty flaky, because
what's built-in on our server might not exist at all on the remote side,
if it's a different major Postgres version.  So what we've got is that
the code is overly conservative about what it will send and yet still
perfectly capable of sending remote queries that will fail outright,
which is not a happy combination.  I have no very good idea how to deal
with that though.

Another thing I was wondering about, but did not change, is that if we're
having the remote transaction inherit the local transaction's isolation
level, shouldn't it inherit the READ ONLY property as well?
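
Concretely, where the remote transaction is opened with something like the
first command below, it would presumably be opened with the second whenever
the local transaction is read-only (a sketch; the exact command text
postgres_fdw sends may differ):

-- today: only the isolation level is inherited
START TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- with the suggestion: the READ ONLY property is inherited too
START TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;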

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Albe Laurenz
Tom Lane wrote:
 Applied with a lot of revisions.

I am thrilled.

 As I mentioned earlier, I think it would be better to force the remote
 session's search_path setting to just pg_catalog and then reduce the
 amount of explicit schema naming in the queries --- any opinions about
 that?

I think that that would make the remote query much more readable.
That would improve EXPLAIN VERBOSE output, which is a user-visible
improvement.

 I took out the checks on collations of operators because I thought they
 were thoroughly broken.  In the first place, looking at operator names
 to deduce semantics is unsafe (if we were to try to distinguish equality,
 looking up btree opclass membership would be the way to do that).  In the
 second place, restricting only collation-sensitive operators and not
 collation-sensitive functions seems just about useless for guaranteeing
 safety.  But we don't have any very good handle on which functions might
 be safe to send despite having collatable input types, so taking that
 approach would greatly restrict our ability to send function calls at all.
 
 The bigger picture here though is that we're already relying on the user
 to make sure that remote tables have column data types matching the local
 definition, so why can't we say that they've got to make sure collations
 match too?  So I think this is largely a documentation issue and we don't
 need any automated enforcement mechanism, or at least it's silly to try
 to enforce this when we're not enforcing column type matching (yet?).

I think that the question what to push down is a different question
from checking column data types, because there we can rely on the
type input functions to reject bad values.

Being permissive on collation issues would lead to user problems
along the lines of "my query results are different when I
select from a remote table on a different operating system".
In my experience many users are blissfully ignorant of issues like
collation and encoding.

What about the following design principle:
Only push down conditions which are sure to return the correct
result, provided that the PostgreSQL system objects have not
been tampered with.

Would it be reasonable to push down operators and functions
only if
a) they are in the pg_catalog schema
b) they have been around for a couple of releases
c) they are not collation sensitive?

It makes me uncomfortable to think of a FDW that would happily
push down conditions that may lead to wrong query results.

 Another thing I was wondering about, but did not change, is that if we're
 having the remote transaction inherit the local transaction's isolation
 level, shouldn't it inherit the READ ONLY property as well?

That seems to me like it would be the right thing to do.

Yours,
Laurenz Albe




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Andres Freund
On 2013-02-21 14:23:35 +, Albe Laurenz wrote:
 Tom Lane wrote:
  Another thing I was wondering about, but did not change, is that if we're
  having the remote transaction inherit the local transaction's isolation
  level, shouldn't it inherit the READ ONLY property as well?
 
 That seems to me like it would be the right thing to do.

I am not 100% convinced of that. There might be valid use cases where a
standby executes queries on the primary that do DML.  And there would be
no way out of it, I think?

Greetings,

Andres Freund




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Tom Lane
Albe Laurenz laurenz.a...@wien.gv.at writes:
 Tom Lane wrote:
 As I mentioned earlier, I think it would be better to force the remote
 session's search_path setting to just pg_catalog and then reduce the
 amount of explicit schema naming in the queries --- any opinions about
 that?

 I think that that would make the remote query much more readable.
 That would improve EXPLAIN VERBOSE output, which is a user-visible
 improvement.

Yeah, that's really the main point.  OPERATOR() is tremendously ugly...

 The bigger picture here though is that we're already relying on the user
 to make sure that remote tables have column data types matching the local
 definition, so why can't we say that they've got to make sure collations
 match too?  So I think this is largely a documentation issue and we don't
 need any automated enforcement mechanism, or at least it's silly to try
 to enforce this when we're not enforcing column type matching (yet?).

 I think that the question what to push down is a different question
 from checking column data types, because there we can rely on the
 type input functions to reject bad values.

Unfortunately, that's a very myopic view of the situation: there
are many cases where datatype semantics can vary without the I/O
functions having any idea that anything is wrong.  To take one example,
what if the underlying column is type citext but the user wrote text
in the foreign table definition?  postgres_fdw would see no reason not
to push col = 'foo' across, but that clause would behave quite
differently on the remote side.  Another example is that float8 and
numeric will have different opinions about the truth of
1.1 = 1.2, so you're going
to get into trouble if you declare an FT column as one when the
underlying column is the other, even though the I/O functions for these
types will happily take each other's output.
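
A sketch of that citext scenario (assuming the citext extension exists on
the remote server; the server and table names are invented):

-- remote side: the real column is case-insensitive
CREATE EXTENSION citext;
CREATE TABLE people (name citext);
INSERT INTO people VALUES ('Alice');

-- local side: the foreign table (mis)declares the column as plain text
CREATE FOREIGN TABLE people_ft (name text)
    SERVER remote_srv OPTIONS (schema_name 'public', table_name 'people');

-- if the qual is pushed down, citext equality matches case-insensitively:
SELECT * FROM people_ft WHERE name = 'alice';  -- remote evaluation finds 'Alice'
-- if the same qual is evaluated locally against text, it finds nothing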

So I think (and have written in the committed docs) that users had
better be careful to ensure that FT columns are declared as the same
type as the underlying columns, even though we can't readily enforce
that, at least not for non-builtin types.

And there's really no difference between that situation and the
collation situation, though I agree with you that the latter is a lot
more likely to bite careless users.

 What about the following design principle:
 Only push down conditions which are sure to return the correct
 result, provided that the PostgreSQL system objects have not
 been tampered with.

That's a nice, simple, and useless-in-practice design principle,
because it will toss out many situations that users will want to work;
situations that in fact *would* work as long as the users adhere to
safe coding conventions.  I do not believe that when people ask "why
does performance of LIKE suck on my foreign table", they will accept an
answer of "we don't allow that to be pushed across because we think
you're too stupid to make the remote collation match".

If you want something provably correct, the way to get there is
to work out a way to check if the remote types and collations
really match.  But that's a hard problem AFAICT, so if we want
something usable in the meantime, we are going to have to accept
some uncertainty about what will happen if the user messes up.

 Would it be reasonable to push down operators and functions
 only if
 a) they are in the pg_catalog schema
 b) they have been around for a couple of releases
 c) they are not collation sensitive?

We don't track (b) nor (c), so this suggestion is entirely
unimplementable in the 9.3 time frame; nor is it provably safe,
unless by (b) you mean must have been around since 7.4 or so.

On a longer time horizon this might be doable, but it would be a lot
of work to implement a solution that most people will find far too
restrictive.  I'd rather see the long-term focus be on doing
type/collation matching, so that we can expand not restrict the set
of things we can push across.

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-02-21 14:23:35 +, Albe Laurenz wrote:
 Tom Lane wrote:
 Another thing I was wondering about, but did not change, is that if we're
 having the remote transaction inherit the local transaction's isolation
 level, shouldn't it inherit the READ ONLY property as well?

 That seems to me like it would be the right thing to do.

 I am not 100% convinced of that. There might be valid use cases where a
 standby executes queries on the primary that do DML.  And there would be
 no way out of it, I think?

How exactly would it do that via an FDW?  Surely if the user tries to
execute INSERT/UPDATE/DELETE against a foreign table, the command would
get rejected in a read-only transaction, long before we even figure out
that the target is a foreign table?

Even granting that there's some loophole that lets the command get sent
to the foreign server, why's it a good idea to allow that?  I rather
thought the idea of READ ONLY was to prevent the transaction from making
any permanent changes.  It's not clear why changes on a remote database
would be exempted from that.

(Doubtless you could escape the restriction anyway with dblink, but that
doesn't mean that postgres_fdw should be similarly ill-defined.)
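
To spell out the command-level check referred to above (illustrative;
remote_log_ft stands for some writable foreign table):

BEGIN READ ONLY;
INSERT INTO remote_log_ft VALUES (1, 'progress');
-- ERROR:  cannot execute INSERT in a read-only transaction
ROLLBACK;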

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Andres Freund
On 2013-02-21 09:58:57 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-02-21 14:23:35 +, Albe Laurenz wrote:
  Tom Lane wrote:
  Another thing I was wondering about, but did not change, is that if we're
  having the remote transaction inherit the local transaction's isolation
  level, shouldn't it inherit the READ ONLY property as well?
 
  That seems to me like it would be the right thing to do.
 
  I am not 100% convinced of that. There might be valid use cases where a
  standby executes queries on the primary that do DML.  And there would be
  no way out of it, I think?
 
 How exactly would it do that via an FDW?  Surely if the user tries to
 execute INSERT/UPDATE/DELETE against a foreign table, the command would
 get rejected in a read-only transaction, long before we even figure out
 that the target is a foreign table?

I was thinking of querying a remote table that's actually a view, which
might be using a function that does caching into a table or something.
Not a completely unreasonable design.
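
For instance, the remote side might look something like this (a hypothetical
design; the table, column and function names are invented):

-- a function that memoizes an expensive aggregate into a cache table ...
CREATE TABLE report_cache (day date PRIMARY KEY, total numeric);

CREATE FUNCTION cached_total(d date) RETURNS numeric
LANGUAGE plpgsql AS $$
DECLARE t numeric;
BEGIN
    SELECT total INTO t FROM report_cache WHERE day = d;
    IF NOT FOUND THEN
        SELECT sum(amount) INTO t FROM sales WHERE sale_day = d;
        INSERT INTO report_cache VALUES (d, t);  -- DML happening during a "read"
    END IF;
    RETURN t;
END $$;

-- ... and a view over it, which the local side maps a foreign table onto
CREATE VIEW daily_totals AS
    SELECT g.d::date AS day, cached_total(g.d::date) AS total
    FROM generate_series((current_date - 30)::timestamp,
                         current_date::timestamp,
                         interval '1 day') AS g(d);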

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-02-21 09:58:57 -0500, Tom Lane wrote:
 How exactly would it do that via an FDW?  Surely if the user tries to
 execute INSERT/UPDATE/DELETE against a foreign table, the command would
 get rejected in a read-only transaction, long before we even figure out
 that the target is a foreign table?

 I was thinking of querying a remote table that's actually a view, which
 might be using a function that does caching into a table or something.
 Not a completely unreasonable design.

Yeah, referencing a remote view is something that should work fine, but
it's not clear to me why it should work differently than it does on the
remote server.  If you select from that same view in a READ ONLY
transaction on the remote, won't it fail?  If so, why should that work
if it's selected from via a foreign table?

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Andres Freund
On 2013-02-21 10:21:34 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-02-21 09:58:57 -0500, Tom Lane wrote:
  How exactly would it do that via an FDW?  Surely if the user tries to
  execute INSERT/UPDATE/DELETE against a foreign table, the command would
  get rejected in a read-only transaction, long before we even figure out
  that the target is a foreign table?
 
  I was thinking of querying a remote table that's actually a view, which
  might be using a function that does caching into a table or something.
  Not a completely unreasonable design.
 
 Yeah, referencing a remote view is something that should work fine, but
 it's not clear to me why it should work differently than it does on the
 remote server.  If you select from that same view in a READ ONLY
 transaction on the remote, won't it fail?  If so, why should that work
 if it's selected from via a foreign table?

Sure, it might fail if you use READ ONLY explicitly. Or the code might
check it. The point is that one might not have a choice about the READ
ONLY state of the local transaction if it's a HS standby, as all
transactions are READ ONLY there.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Sure, it might fail if you use READ ONLY explicitly. Or the code might
 check it. The point is that one might not have a choice about the READ
 ONLY state of the local transaction if it's a HS standby, as all
 transactions are READ ONLY there.

[ shrug... ]  If you want to use a remote DB to cheat on READ ONLY,
there's always dblink.  It's not apparent to me that the FDW
implementation should try to be complicit in such cheating.
(Not that it would work anyway given the command-level checks.)

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-21 Thread Albe Laurenz
Tom Lane wrote:
 I think that the question what to push down is a different question
 from checking column data types, because there we can rely on the
 type input functions to reject bad values.
 
 Unfortunately, that's a very myopic view of the situation: there
 are many cases where datatype semantics can vary without the I/O
 functions having any idea that anything is wrong.  To take one example,
 what if the underlying column is type citext but the user wrote text
 in the foreign table definition?  postgres_fdw would see no reason not
 to push col = 'foo' across, but that clause would behave quite
 differently on the remote side.  Another example is that float8 and
 numeric will have different opinions about the truth of
 1.1 = 1.2, so you're going
 to get into trouble if you declare an FT column as one when the
 underlying column is the other, even though the I/O functions for these
 types will happily take each other's output.

You are right.

 So I think (and have written in the committed docs) that users had
 better be careful to ensure that FT columns are declared as the same
 type as the underlying columns, even though we can't readily enforce
 that, at least not for non-builtin types.
 
 And there's really no difference between that situation and the
 collation situation, though I agree with you that the latter is a lot
 more likely to bite careless users.

That's what I am worried about.

 What about the following design principle:
 Only push down conditions which are sure to return the correct
 result, provided that the PostgreSQL system objects have not
 been tampered with.
 
 That's a nice, simple, and useless-in-practice design principle,
 because it will toss out many situations that users will want to work;
 situations that in fact *would* work as long as the users adhere to
 safe coding conventions.  I do not believe that when people ask "why
 does performance of LIKE suck on my foreign table", they will accept an
 answer of "we don't allow that to be pushed across because we think
 you're too stupid to make the remote collation match".

I think that it will be pretty hard to get both reliability
and performance to an optimum.

I'd rather hear complaints about bad performance than
about bad results, but that's just my opinion.

 Would it be reasonable to push down operators and functions
 only if
 a) they are in the pg_catalog schema
 b) they have been around for a couple of releases
 c) they are not collation sensitive?
 
 We don't track (b) nor (c), so this suggestion is entirely
 unimplementable in the 9.3 time frame; nor is it provably safe,
 unless by (b) you mean must have been around since 7.4 or so.

My idea was to have a (hand picked) list of functions and operators
that are considered safe, but I understand that that is rather
ugly.  There could of course be no generic method because,
as you say, b) and c) are not tracked.

 On a longer time horizon this might be doable, but it would be a lot
 of work to implement a solution that most people will find far too
 restrictive.  I'd rather see the long-term focus be on doing
 type/collation matching, so that we can expand not restrict the set
 of things we can push across.

I like that vision, and of course my above idea does not
go well with it.

Yours,
Laurenz Albe




Re: [HACKERS] FDW for PostgreSQL

2013-02-20 Thread Shigeru Hanada
On Fri, Feb 15, 2013 at 12:58 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ rereads that... ]  Hm, I did make some good points.  But having seen
 the end result of this way, I'm still not very happy; it still looks
 like a maintenance problem.  Maybe some additional flags in ruleutils.c
 is the least evil way after all.  Needs more thought.

I'm working on revising the deparser so that it uses ruleutils routines to
construct the remote query, and re-found an FDW-specific problem which I
encountered some months ago.

So far the ruleutils routines require a deparse context, which is a list
of namespace information.  Currently deparse_context_for() seems to
fit postgres_fdw's purpose, but it always uses the names stored in the
catalogs (pg_class, pg_attribute and pg_namespace), whereas
postgres_fdw wants to replace column/table/schema names with the names
specified in the relevant FDW options, if any.
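
A concrete illustration of the mismatch (the object names here are
hypothetical):

-- local definition: the remote objects have different names, supplied via options
CREATE FOREIGN TABLE accounts (
    id integer OPTIONS (column_name 'acct_id')
) SERVER remote_srv
  OPTIONS (schema_name 'sales', table_name 'tbl_accounts');

-- deparse_context_for() would print the local catalog names, roughly
--   SELECT id FROM public.accounts WHERE id = 1
-- while the remote query needs the spellings from the FDW options:
--   SELECT acct_id FROM sales.tbl_accounts WHERE acct_id = 1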

A proper remote query would be generated if postgres_fdw could modify the
deparse context, but deparse_context is a hidden detail of ruleutils.c.
IMO exposing it is a bad idea.

Given these, I'm thinking of adding a new deparse context generator which
basically constructs namespaces from the catalogs, but replaces an entry
if the FDW option *_name was specified for that object.  With this context,
the existing ruleutils code would generate expression strings with the
proper names, without any change.

Is this idea acceptable?

-- 
Shigeru HANADA




Re: [HACKERS] FDW for PostgreSQL

2013-02-20 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 On Fri, Feb 15, 2013 at 12:58 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 [ rereads that... ]  Hm, I did make some good points.  But having seen
 the end result of this way, I'm still not very happy; it still looks
 like a maintenance problem.  Maybe some additional flags in ruleutils.c
 is the least evil way after all.  Needs more thought.

 I'm working on revising deparser so that it uses ruleutils routines to
 construct remote query, and re-found an FDW-specific problem which I
 encountered some months ago.

After further review I'm unconvinced that we can really do much better
than what's there now --- the idea of sharing code with ruleutils sounds
attractive, but once you look at all the specific issues that ruleutils
would have to be taught about, it gets much less so.  (In particular
I fear we'll find that we have to do some weird stuff to deal with
cross-server-version issues.)  I've been revising the patch on the
assumption that we'll keep deparse.c more or less as is.

Having said that, I remain pretty unhappy with the namespace handling in
deparse.c.  I don't think it serves much purpose to schema-qualify
everything when we're restricting what we can access to built-in
operators and functions --- the loss of readability outweighs the
benefits IMO.  Also, there is very little point in schema-qualifying
almost everything rather than everything; if you're not 100% then you
have no safety against search_path issues.  But that's what we've got
because the code still relies on format_type to print type names.
Now we could get around that complaint by duplicating format_type as
well as ruleutils, but I don't think that's the right direction to
proceed.  I still think it might be a good idea to set search_path to
pg_catalog on the remote side, and then schema-qualify only what is not
in pg_catalog (which would be nothing, in the current code, so far as
types/functions/operators are concerned).

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-17 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 On Sun, Feb 17, 2013 at 8:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 These don't seem to me like names that we ought to be
 exposing at the SQL command level.  Why not just schema, table,
 column?  Or perhaps schema_name, table_name, column_name if you
 feel it's essential to distinguish that these are names.

 I think not-shortened names (words used in documents or conversations)
 are better now.  I prefer "table_name" to "table", because it would be
 easy to distinguish it as a name, even if we add new options like
 "table_foo".

Yeah.  I doubt that these options will be commonly used anyway ---
surely it's easier and less confusing to choose names that match the
remote table in the first place.  So there's no very good reason to
keep the option names short.

I'll go with schema_name, table_name, column_name unless someone
comes along with a contrary opinion.
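
For illustration, a foreign-table definition using those spellings would
look roughly like this (a sketch, based on the pgbench example later in
this thread):

CREATE FOREIGN TABLE pgbench1_branches (
    bid      integer OPTIONS (column_name 'bid'),
    bbalance integer,
    filler   character(88)
) SERVER pgbench1
  OPTIONS (schema_name 'public', table_name 'pgbench_branches');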

 In the psql \d+ result for postgres_fdw foreign tables, "table" and
 "column" are quoted, but schema is not.  Is this behavior of
 quote_ident() intentional?

That's probably a consequence of these being keywords of different
levels of reserved-ness.  If we go with the longer names it won't
happen.

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-16 Thread Tom Lane
Continuing to look at this patch ... I'm wondering if any particular
discussion went into choosing the FDW option names nspname, relname,
and colname.  These don't seem to me like names that we ought to be
exposing at the SQL command level.  Why not just schema, table,
column?  Or perhaps schema_name, table_name, column_name if you
feel it's essential to distinguish that these are names.

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-16 Thread Shigeru Hanada
On Sun, Feb 17, 2013 at 8:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Continuing to look at this patch ... I'm wondering if any particular
 discussion went into choosing the FDW option names nspname, relname,
 and colname.

IIRC, there was no deep discussion about those option names.  I simply
chose relname and nspname from pg_class and pg_namespace.  At that
time I thought users would understand those options easily if those
names were used in the catalogs.

 These don't seem to me like names that we ought to be
 exposing at the SQL command level.  Why not just schema, table,
 column?  Or perhaps schema_name, table_name, column_name if you
 feel it's essential to distinguish that these are names.

I think not-shortened names (words used in documents or conversations)
are better now.  I prefer "table_name" to "table", because it would be
easy to distinguish it as a name, even if we add new options like
"table_foo".

Besides, I found a strange(?) behavior in the psql \d+ command in the
no-postfix case, though it wouldn't be a serious problem.

In the psql \d+ result for postgres_fdw foreign tables, "table" and
"column" are quoted, but schema is not.  Is this behavior of
quote_ident() intentional?

postgres=# \d+ pgbench1_branches
                      Foreign table "public.pgbench1_branches"
  Column  |     Type      | Modifiers |   FDW Options    | Storage  | Stats target | Description
----------+---------------+-----------+------------------+----------+--------------+-------------
 bid      | integer       | not null  | ("column" 'bid') | plain    |              |
 bbalance | integer       |           |                  | plain    |              |
 filler   | character(88) |           |                  | extended |              |
Server: pgbench1
FDW Options: (schema 'public', "table" 'foo')
Has OIDs: no

We can use the table and column options without quoting (or with quotes,
of course) in CREATE/ALTER FOREIGN TABLE commands, so this is not a
barrier against choosing no-postfix names.

-- 
Shigeru HANADA




Re: [HACKERS] FDW for PostgreSQL

2013-02-14 Thread Shigeru Hanada
On Thu, Feb 14, 2013 at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 * The code seems to always use GetOuterUserId() to select the foreign
 user mapping to use.  This seems entirely wrong.  For instance it will
 do the wrong thing inside a SECURITY DEFINER function, where surely the
 relevant privileges should be those of the function owner, not the
 session user.  I would also argue that if Alice has access to a foreign
 table owned by Bob, and Alice creates a view that selects from that
 table and grants select privilege on the view to Charlie, then when
 Charlie selects from the view the user mapping to use ought to be
 Alice's.  (If anyone thinks differently about that, speak up!)

Agreed that OuterUserId is wrong for selecting the user mapping.  Also
agreed that Charlie doesn't need his own mapping for the server if he is
accessing it via a valid view.
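
For reference, the scenario being agreed to might look like this (a sketch;
the role, server, table and column names are made up):

-- Bob owns the foreign table and grants Alice access; Alice has a mapping.
GRANT SELECT ON foreign_orders TO alice;
CREATE USER MAPPING FOR alice SERVER remote_srv OPTIONS (user 'alice_remote');

-- Alice wraps the foreign table in a view and grants that to Charlie.
SET ROLE alice;
CREATE VIEW recent_orders AS
    SELECT * FROM foreign_orders
    WHERE  placed_at > now() - interval '7 days';
GRANT SELECT ON recent_orders TO charlie;
RESET ROLE;

-- When Charlie selects from the view, the mapping looked up should be
-- Alice's (the view owner's), i.e. rte->checkAsUser, not Charlie's.
SET ROLE charlie;
SELECT count(*) FROM recent_orders;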

 To implement that for queries, we need code similar to what
 ExecCheckRTEPerms does, ie rte->checkAsUser ? rte->checkAsUser :
 GetUserId().  It's a bit of a pain to get hold of the RTE from
 postgresGetForeignRelSize or postgresBeginForeignScan, but it's doable.
 (Should we modify the APIs for these functions to make that easier?)

This issue seems not to be specific to postgres_fdw.  Currently
GetUserMapping takes a userid and the server's OID as parameters, but we
would be able to hide the complex rule by replacing the userid with the
RTE or something.

 I think possibly postgresAcquireSampleRowsFunc should use the foreign
 table's owner regardless of the current user ID - if the user has
 permission to run ANALYZE then we don't really want the command to
 succeed or fail depending on exactly who the user is.  That's perhaps
 debatable, anybody have another theory?

+1.  This allows a non-owner to ANALYZE foreign tables without having a
per-user mapping, though a public mapping also solves this issue.

At the implementation level, postgresAcquireSampleRowsFunc has the
Relation of the target table, so we can get the owner's OID by reading
rd_rel->relowner.

 * AFAICT, the patch expects to use a single connection for all
 operations initiated under one foreign server + user mapping pair.
 I don't think this can possibly be workable.  For instance, we don't
 really want postgresIterateForeignScan executing the entire remote query
 to completion and stashing the results locally -- what if that's many
 megabytes?

It uses libpq's single-row mode and a Tuplestorestate to keep the result
locally, so it uses limited memory at a time.  If the result is larger
than work_mem, overflowed tuples are written to a temp file.  I think
this is similar to materializing query results.

 It ought to be pulling the rows back a few at a time, and
 that's not going to work well if multiple scans are sharing the same
 connection.  (We might be able to dodge that by declaring a cursor
 for each scan, but I'm not convinced that such a solution will scale up
 to writable foreign tables, nested queries, subtransactions, etc.)

Indeed the FDW used a CURSOR in older versions.  Sorry, I have not
looked at the writable foreign table patch closely yet, but would it
require (maybe multiple) remote update query executions during
scanning?

 I think we'd better be prepared to allow multiple similar connections.
 The main reason I'm bringing this up now is that it breaks the
 assumption embodied in postgres_fdw_get_connections() and
 postgres_fdw_disconnect() that foreign server + user mapping can
 constitute a unique key for identifying connections.  However ...

The main reason to use a single connection is to make multiple results
retrieved from the same server in a local query consistent.  A shared
snapshot might be helpful for this consistency issue, but I've not
tried that with an FDW.

 * I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
 to be a bad idea altogether.  These connections ought to be a hidden
 implementation matter, not something that the user has a view of, much
 less control over.  Aside from the previous issue, I believe it's a
 trivial matter to crash the patch as it now stands by applying
 postgres_fdw_disconnect() to a connection that's in active use.  I can
 see the potential value in being able to shut down connections when a
 session has stopped using them, but this is a pretty badly-designed
 approach to that.  I suggest that we just drop these functions for now
 and revisit that problem later.  (One idea is some sort of GUC setting
 to control how many connections can be held open speculatively for
 future use.)

Actually these functions follow dblink's similar functions, but
having them was a bad decision because an FDW can't connect explicitly.
As you mentioned, postgres_fdw_disconnect is provided for clean
shutdown on the remote side (I needed it in my testing).

I agree with separating the issue from the FDW core.

 * deparse.c contains a depressingly large amount of duplication of logic
 from ruleutils.c, and can only need more as we expand the set of
 constructs that can be pushed to the remote end.  This 

Re: [HACKERS] FDW for PostgreSQL

2013-02-14 Thread Albe Laurenz
Shigeru Hanada wrote:
 Tom Lane wrote:
 It ought to be pulling the rows back a few at a time, and
 that's not going to work well if multiple scans are sharing the same
 connection.  (We might be able to dodge that by declaring a cursor
 for each scan, but I'm not convinced that such a solution will scale up
 to writable foreign tables, nested queries, subtransactions, etc.)
 
 Indeed the FDW used a CURSOR in older versions.  Sorry, I have not
 looked at the writable foreign table patch closely yet, but would it
 require (maybe multiple) remote update query executions during
 scanning?

It would for example call ExecForeignUpdate after each call to
IterateForeignScan that produces a row that meets the UPDATE
condition.

Yours,
Laurenz Albe



Re: [HACKERS] FDW for PostgreSQL

2013-02-14 Thread Shigeru Hanada
On Thu, Feb 14, 2013 at 6:45 PM, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 Shigeru Hanada wrote:
 Tom Lane wrote:
 It ought to be pulling the rows back a few at a time, and
 that's not going to work well if multiple scans are sharing the same
 connection.  (We might be able to dodge that by declaring a cursor
 for each scan, but I'm not convinced that such a solution will scale up
 to writable foreign tables, nested queries, subtransactions, etc.)

  Indeed the FDW used a CURSOR in older versions.  Sorry, I have not
  looked at the writable foreign table patch closely yet, but would it
  require (maybe multiple) remote update query executions during
  scanning?

 It would for example call ExecForeignUpdate after each call to
 IterateForeignScan that produces a row that meets the UPDATE
 condition.

Thanks!  It seems that ExecForeignUpdate needs another connection for
the update query, or we need to retrieve all results at the first Iterate
call to prepare for possible subsequent update queries.

-- 
Shigeru HANADA




Re: [HACKERS] FDW for PostgreSQL

2013-02-14 Thread Tom Lane
Kohei KaiGai kai...@kaigai.gr.jp writes:
 2013/2/14 Tom Lane t...@sss.pgh.pa.us:
 * deparse.c contains a depressingly large amount of duplication of logic
 from ruleutils.c, and can only need more as we expand the set of
 constructs that can be pushed to the remote end.  This doesn't seem like
 a maintainable approach.  Was there a specific reason not to try to use
 ruleutils.c for this?

 Previously, you suggested implementing its own logic for query deparsing,
 and Hanada-san rewrote the relevant code.
   http://www.postgresql.org/message-id/12181.1331223...@sss.pgh.pa.us

[ rereads that... ]  Hm, I did make some good points.  But having seen
the end result of this way, I'm still not very happy; it still looks
like a maintenance problem.  Maybe some additional flags in ruleutils.c
is the least evil way after all.  Needs more thought.

 Indeed, most of the logic is duplicated.  However, it is to avoid bugs in
 some corner cases; for instance, a function name is not qualified with a
 schema even if the function is owned by a different schema on the remote side.

That particular reason doesn't seem to hold a lot of water when we're
restricting the code to only push over built-in functions/operators
anyway.

I find it tempting to think about setting search_path explicitly to
pg_catalog (only) on the remote side, whereupon we'd have to
explicitly schema-qualify references to user tables, but built-in
functions/operators would not need that (and it wouldn't really matter
if ruleutils did try to qualify them).

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-14 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 On Thu, Feb 14, 2013 at 1:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 * AFAICT, the patch expects to use a single connection for all
 operations initiated under one foreign server + user mapping pair.
 I don't think this can possibly be workable.  For instance, we don't
 really want postgresIterateForeignScan executing the entire remote query
 to completion and stashing the results locally -- what if that's many
 megabytes?

 It uses libpq's single-row mode and a Tuplestorestate to keep the result
 locally, so it uses limited memory at a time.  If the result is larger
 than work_mem, overflowed tuples are written to a temp file.  I think
 this is similar to materializing query results.

Well, yeah, but that doesn't make it an acceptable solution.  Consider
for instance SELECT * FROM huge_foreign_table LIMIT 10.  People are
not going to be satisfied if that pulls back the entire foreign table
before handing them the 10 rows.  Comparable performance problems can
arise even without LIMIT, for instance in handling of nestloop inner
scans.

 I think we'd better be prepared to allow multiple similar connections.

 The main reason to use a single connection is to make multiple results
 retrieved from the same server in a local query consistent.

Hmm.  That could be a good argument, although the current patch pretty
much destroys any such advantage by being willing to use READ COMMITTED
mode on the far end --- with that, you lose any right to expect
snapshot-consistent data anyway.  I'm inclined to think that maybe we
should always use at least REPEATABLE READ mode, rather than blindly
copying the local transaction mode.  Or maybe this should be driven by a
foreign-server option instead of looking at the local mode at all?

Anyway, it does seem like maybe we need to use cursors so that we can
have several active scans that we are pulling back just a few rows at a
time from.
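
In terms of what would go over the wire, that amounts to something like the
following, all inside the one remote transaction (a sketch of the approach,
not the literal commands the code will send; table names are illustrative):

-- one cursor per active foreign scan, sharing a single connection/transaction
DECLARE c1 CURSOR FOR SELECT a, b FROM public.tbl WHERE a > 0;
DECLARE c2 CURSOR FOR SELECT x FROM public.other_tbl;  -- a second, concurrent scan

FETCH 100 FROM c1;   -- pull a batch for the outer scan
FETCH 100 FROM c2;   -- interleave fetches for the inner scan
FETCH 100 FROM c1;

CLOSE c1;
CLOSE c2;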

I'm not convinced that that gets us out of the woods though WRT needing
only one connection.  Consider a query that is scanning some foreign
table, and it calls a plpgsql function, and that function (inside an
EXCEPTION block) does a query that scans another foreign table on the
same server.  This second query gets an error on the remote side.  If
the error is caught via the exception block, and the outer query
continues, what then?  We could imagine adding enough infrastructure
to establish a remote savepoint for each local subtransaction and clean
things up on failure, but no such logic is in the patch now, and I think
it wouldn't be too simple either.  The least painful way to make this
scenario work, given what's in the patch, is to allow such a
subtransaction to use a separate connection.
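
For what it's worth, the "enough infrastructure" alternative would amount to
mirroring each local subtransaction on the remote connection, roughly like
this (a sketch only; nothing like it is in the patch):

-- on entry to the local subtransaction (e.g. a plpgsql EXCEPTION block)
SAVEPOINT s1;

-- the inner query's remote statements run here; suppose one of them errors out

-- when the local subtransaction is rolled back, the remote side must be
-- cleaned up too before the outer query can keep using the connection
ROLLBACK TO SAVEPOINT s1;
RELEASE SAVEPOINT s1;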

In any case, I'm pretty well convinced that the connection-bookkeeping
logic needs a major rewrite to have any hope of working in
subtransactions.  I'm going to work on that first and see where it leads.

 * I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
 to be a bad idea altogether.

 I agree with separating the issue from the FDW core.

OK, so we'll drop these from the current version of the patch and
revisit the problem of closing connections later.

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-13 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 [ postgres_fdw.v5.patch ]

I started to look at this patch today.  There seems to be quite a bit
left to do to make it committable.  I'm willing to work on it, but
there are some things that need discussion:

* The code seems to always use GetOuterUserId() to select the foreign
user mapping to use.  This seems entirely wrong.  For instance it will
do the wrong thing inside a SECURITY DEFINER function, where surely the
relevant privileges should be those of the function owner, not the
session user.  I would also argue that if Alice has access to a foreign
table owned by Bob, and Alice creates a view that selects from that
table and grants select privilege on the view to Charlie, then when
Charlie selects from the view the user mapping to use ought to be
Alice's.  (If anyone thinks differently about that, speak up!)
To implement that for queries, we need code similar to what
ExecCheckRTEPerms does, ie rte->checkAsUser ? rte->checkAsUser :
GetUserId().  It's a bit of a pain to get hold of the RTE from
postgresGetForeignRelSize or postgresBeginForeignScan, but it's doable.
(Should we modify the APIs for these functions to make that easier?)
I think possibly postgresAcquireSampleRowsFunc should use the foreign
table's owner regardless of the current user ID - if the user has
permission to run ANALYZE then we don't really want the command to
succeed or fail depending on exactly who the user is.  That's perhaps
debatable, anybody have another theory?

* AFAICT, the patch expects to use a single connection for all
operations initiated under one foreign server + user mapping pair.
I don't think this can possibly be workable.  For instance, we don't
really want postgresIterateForeignScan executing the entire remote query
to completion and stashing the results locally -- what if that's many
megabytes?  It ought to be pulling the rows back a few at a time, and
that's not going to work well if multiple scans are sharing the same
connection.  (We might be able to dodge that by declaring a cursor
for each scan, but I'm not convinced that such a solution will scale up
to writable foreign tables, nested queries, subtransactions, etc.)
I think we'd better be prepared to allow multiple similar connections.
The main reason I'm bringing this up now is that it breaks the
assumption embodied in postgres_fdw_get_connections() and 
postgres_fdw_disconnect() that foreign server + user mapping can
constitute a unique key for identifying connections.  However ...

* I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
to be a bad idea altogether.  These connections ought to be a hidden
implementation matter, not something that the user has a view of, much
less control over.  Aside from the previous issue, I believe it's a
trivial matter to crash the patch as it now stands by applying
postgres_fdw_disconnect() to a connection that's in active use.  I can
see the potential value in being able to shut down connections when a
session has stopped using them, but this is a pretty badly-designed
approach to that.  I suggest that we just drop these functions for now
and revisit that problem later.  (One idea is some sort of GUC setting
to control how many connections can be held open speculatively for
future use.)

* deparse.c contains a depressingly large amount of duplication of logic
from ruleutils.c, and can only need more as we expand the set of
constructs that can be pushed to the remote end.  This doesn't seem like
a maintainable approach.  Was there a specific reason not to try to use
ruleutils.c for this?  I'd much rather tweak ruleutils to expose some
additional APIs, if that's what it takes, than have all this redundant
logic.

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2013-02-13 Thread Kohei KaiGai
2013/2/14 Tom Lane t...@sss.pgh.pa.us:
 * deparse.c contains a depressingly large amount of duplication of logic
 from ruleutils.c, and can only need more as we expand the set of
 constructs that can be pushed to the remote end.  This doesn't seem like
 a maintainable approach.  Was there a specific reason not to try to use
 ruleutils.c for this?  I'd much rather tweak ruleutils to expose some
 additional APIs, if that's what it takes, than have all this redundant
 logic.

The original pgsql_fdw design utilized ruleutils.c logic.
Previously, you suggested implementing its own logic for query deparsing,
and Hanada-san rewrote the relevant code.
  http://www.postgresql.org/message-id/12181.1331223...@sss.pgh.pa.us

Indeed, most of the logic is duplicated. However, that was done to avoid bugs
in some corner cases; for instance, a function name is not schema-qualified
even when the function lives in a different schema on the remote side.
Should we add a flag to deparse_expression() to indicate that the call is
intended to construct a remotely executable query? That may be reasonable,
but such case branches tend to complicate the code in general.
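
For what it's worth, here is a rough sketch of what the remote-query path
could do for function calls, always emitting an explicitly schema-qualified
function invocation regardless of how the expression was written locally
(illustrative only; the helper name is invented and this is not the patch's
code):

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "catalog/pg_proc.h"
    #include "lib/stringinfo.h"
    #include "utils/builtins.h"
    #include "utils/lsyscache.h"
    #include "utils/syscache.h"

    /* Hypothetical: append "schema.funcname(" for a remote-executable query. */
    static void
    append_remote_function_name(StringInfo buf, Oid funcid)
    {
        HeapTuple    tuple;
        Form_pg_proc procform;
        char        *nspname;

        tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
        if (!HeapTupleIsValid(tuple))
            elog(ERROR, "cache lookup failed for function %u", funcid);
        procform = (Form_pg_proc) GETSTRUCT(tuple);

        nspname = get_namespace_name(procform->pronamespace);

        /* Always schema-qualify, so the remote search_path does not matter. */
        appendStringInfo(buf, "%s(",
                         quote_qualified_identifier(nspname,
                                                    NameStr(procform->proname)));
        ReleaseSysCache(tuple);
    }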

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] FDW for PostgreSQL

2012-11-28 Thread Kohei KaiGai
2012/11/28 Shigeru Hanada shigeru.han...@gmail.com:

 On Sun, Nov 25, 2012 at 5:24 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 I checked the v4 patch, and I have nothing more to comment on.

 So, could you update the remaining EXPLAIN with VERBOSE option
 stuff?


 Thanks for the review.  Here is updated patch.

I checked the patch. The new handling of the EXPLAIN VERBOSE option seems to
be working fine. I think it is time to hand this patch over to a committer.

The following is not something that must be fixed, just my preference.

postgres=# EXPLAIN(VERBOSE) SELECT * FROM ftbl WHERE a > 0 AND b like '%a%';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Foreign Scan on public.ftbl  (cost=100.00..100.01 rows=1 width=36)
   Output: a, b
   Filter: (ftbl.b ~~ '%a%'::text)
   Remote SQL: SELECT a, b FROM public.tbl WHERE ((a OPERATOR(pg_catalog.>) 0))
(4 rows)

postgres=# EXPLAIN SELECT * FROM ftbl WHERE a > 0 AND b like '%a%';
                          QUERY PLAN
--------------------------------------------------------------
 Foreign Scan on ftbl  (cost=100.00..100.01 rows=1 width=36)
   Filter: (b ~~ '%a%'::text)
(2 rows)

Do you think the pushed-down qualifiers should also be shown when the
VERBOSE option is not given?
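
For context, restricting the remote query display to VERBOSE is typically a
one-line check in the FDW's EXPLAIN callback; a minimal sketch (the scan-state
struct and its field names here are illustrative, not the patch's exact ones):

    /* Sketch: print the remote query only when EXPLAIN VERBOSE is used. */
    static void
    postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
    {
        PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;

        if (es->verbose)
            ExplainPropertyText("Remote SQL", fsstate->query, es);
    }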

 BTW, we have one more issue around naming of new FDW, and it is discussed in
 another thread.
 http://archives.postgresql.org/message-id/9e59e6e7-39c9-4ae9-88d6-bb0098579...@gmail.com

I don't have any strong opinion about this naming discussion.
As long as the name does not conflict with an existing one and is not
misleading, I think it is reasonable. So postgre_fdw is OK for me, and
pgsql_fdw is also welcome. posugure_fdw may make sense only in Japan;
pg_fdw is a bit misleading.

postgresql_fdw might be the best, but does that name have a clear enough
advantage to justify the extra effort of resolving the conflict with the
existing built-in postgresql_fdw_validator() function?
I think postgres_fdw is a reasonable enough choice.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] FDW for PostgreSQL

2012-11-28 Thread Kohei KaiGai
2012/11/28 Kohei KaiGai kai...@kaigai.gr.jp:
 it is reasonable. So, postgre_fdw is OK for me. pgsql_fdw is also welcome.

Sorry, s/postgre_fdw/postgres_fdw/g

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] FDW for PostgreSQL

2012-11-24 Thread Kohei KaiGai
2012/11/21 Shigeru Hanada shigeru.han...@gmail.com:
 Thank for the comment!

 On Tue, Nov 20, 2012 at 10:23 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 I also think the new use_remote_explain option is good. It works fine
 when we try to use this fdw over the network with latency more or less.
 It seems to me its default is false, thus, GetForeignRelSize will return
 the estimated cost according to ANALYZE, instead of remote EXPLAIN.
 Even though you mention the default behavior was not changed, is it
 an expected one? My preference is the current one, as is.


 Oops, I must have focused on only cost factors.
 I too think that using local statistics is better as default behavior,
 because foreign tables would be used for relatively stable tables.
 If target tables are updated often, it would cause problems about
 consistency, unless we provide full-fledged transaction mapping.


 The deparseFuncExpr() still has case handling whether it is explicit case,
 implicit cast or regular functions. If my previous proposition has no
 flaw,
 could you fix up it using regular function invocation manner? In case when
 remote node has incompatible implicit-cast definition, this logic can make
 a problem.


 Sorry, I overlooked this issue. Fixed to use function call notation
 for all of explicit function calls, explicit casts, and implicit casts.

 At InitPostgresFdwOptions(), the source comment says we don't use
 malloc() here for simplification of code. Hmm. I'm not sure why it is more
 simple. It seems to me we have no reason why to avoid malloc here, even
 though libpq options are acquired using malloc().


 I used simple because using palloc avoids null-check and error handling.
 However, many backend code use malloc to allocate memory which lives
 as long as backend process itself, so I fixed.


 Regarding to the regression test.

  [snip]

 I guess this test tries to check a case when remote column has
 incompatible
 data type with local side. Please check timestamp_out(). Its output
 format follows
 datestyle setting of GUC, and it affects OS configuration on initdb
 time.
 (Please grep datestyle at initdb.c !) I'd like to recommend to use
 another data type
 for this regression test to avoid false-positive detection.


 Good catch.  :)
 I fixed the test to use another data type, user defined enum.

Hanada-san,

I checked the v4 patch, and I have nothing more to comment on.

So, could you update the remaining EXPLAIN with VERBOSE option
stuff?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] FDW for PostgreSQL

2012-11-22 Thread Shigeru Hanada
After playing with some big SQLs for testing, I came to feel that
showing every remote query in EXPLAIN output is annoying, especially
when SELECT * is unfolded into a long column list.

AFAIK no plan node shows so much information in a single line, so I'm
inclined to make postgres_fdw show it only when VERBOSE is
specified.  This would make EXPLAIN output easier to read, even if many
foreign tables are used in a query.

Thoughts?

-- 
Shigeru HANADA


Re: [HACKERS] FDW for PostgreSQL

2012-11-22 Thread Kohei KaiGai
2012/11/22 Shigeru Hanada shigeru.han...@gmail.com:
 After playing with some big SQLs for testing, I came to feel that
 showing every remote query in EXPLAIN output is annoying, especially
 when SELECT * is unfolded to long column list.

 AFAIK no plan node shows so many information in a line, so I'm
 inclined to make postgres_fdw to show it only when VERBOSE was
 specified.  This would make EXPLAIN output easy to read, even if many
 foreign tables are used in a query.

 Thoughts?

Indeed, I also think that is a reasonable solution.
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] FDW for PostgreSQL

2012-11-22 Thread Albe Laurenz
Kohei KaiGai wrote:
 2012/11/22 Shigeru Hanada shigeru.han...@gmail.com:
 After playing with some big SQLs for testing, I came to feel that
 showing every remote query in EXPLAIN output is annoying, especially
 when SELECT * is unfolded to long column list.

 AFAIK no plan node shows so many information in a line, so I'm
 inclined to make postgres_fdw to show it only when VERBOSE was
 specified.  This would make EXPLAIN output easy to read, even if many
 foreign tables are used in a query.

 Thoughts?

 Indeed, I also think it is reasonable solution.

+1

That's the way I do it for oracle_fdw.

Yours,
Laurenz Albe




Re: [HACKERS] FDW for PostgreSQL

2012-11-21 Thread Kohei KaiGai
2012/11/21 Shigeru Hanada shigeru.han...@gmail.com:
 Thank for the comment!

 On Tue, Nov 20, 2012 at 10:23 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 I also think the new use_remote_explain option is good. It works fine
 when we try to use this fdw over the network with latency more or less.
 It seems to me its default is false, thus, GetForeignRelSize will return
 the estimated cost according to ANALYZE, instead of remote EXPLAIN.
 Even though you mention the default behavior was not changed, is it
 an expected one? My preference is the current one, as is.


 Oops, I must have focused on only cost factors.
 I too think that using local statistics is better as default behavior,
 because foreign tables would be used for relatively stable tables.
 If target tables are updated often, it would cause problems about
 consistency, unless we provide full-fledged transaction mapping.


 The deparseFuncExpr() still has case handling whether it is explicit case,
 implicit cast or regular functions. If my previous proposition has no
 flaw,
 could you fix up it using regular function invocation manner? In case when
 remote node has incompatible implicit-cast definition, this logic can make
 a problem.


 Sorry, I overlooked this issue. Fixed to use function call notation
 for all of explicit function calls, explicit casts, and implicit casts.

 At InitPostgresFdwOptions(), the source comment says we don't use
 malloc() here for simplification of code. Hmm. I'm not sure why it is more
 simple. It seems to me we have no reason why to avoid malloc here, even
 though libpq options are acquired using malloc().


 I used simple because using palloc avoids null-check and error handling.
 However, many backend code use malloc to allocate memory which lives
 as long as backend process itself, so I fixed.


 Regarding to the regression test.

  [snip]

 I guess this test tries to check a case when remote column has
 incompatible
 data type with local side. Please check timestamp_out(). Its output
 format follows
 datestyle setting of GUC, and it affects OS configuration on initdb
 time.
 (Please grep datestyle at initdb.c !) I'd like to recommend to use
 another data type
 for this regression test to avoid false-positive detection.


 Good catch.  :)
 I fixed the test to use another data type, user defined enum.

One other thing I noticed.

At execute_query(), the code stores all the retrieved rows into the
tuplestore in festate->tuples at once. Doesn't that cause problems when the
remote table has a very large number of rows?

IIRC, the previous code used the cursor feature to fetch a batch of rows at
a time, to avoid over-consumption of local memory. Do we have some
restriction that prevents fetching a certain number of rows with FETCH?
It seems to me we could fetch, say, 1000 rows at a time and tentatively
store them in the tuplestore within one PG_TRY() block (so there is no need
to worry about PQclear() timing), and then fetch more remote rows once
IterateForeignScan reaches the end of the tuplestore.
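
A rough sketch of that batching loop, just to illustrate the idea (the cursor
name, fetch size, and helper shown here are invented, not the patch's code):

    #include "postgres.h"
    #include "libpq-fe.h"
    #include "utils/tuplestore.h"

    /* Hypothetical: refill the local tuplestore 1000 rows at a time. */
    static void
    fetch_more_rows(PGconn *conn, Tuplestorestate *tuples, const char *cursor_name)
    {
        PGresult *volatile res = NULL;
        char      sql[64];
        int       i;

        snprintf(sql, sizeof(sql), "FETCH 1000 FROM %s", cursor_name);

        PG_TRY();
        {
            res = PQexec(conn, sql);
            if (PQresultStatus(res) != PGRES_TUPLES_OK)
                elog(ERROR, "remote FETCH failed: %s", PQerrorMessage(conn));

            for (i = 0; i < PQntuples(res); i++)
            {
                /* convert the PGresult row into a local tuple and stash it,
                 * e.g. with tuplestore_puttuple(tuples, ...) */
            }
            PQclear(res);
        }
        PG_CATCH();
        {
            if (res)
                PQclear(res);
            PG_RE_THROW();
        }
        PG_END_TRY();
    }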

Please point it out if I have missed something.

Anyway, I'll check this v4 patch simultaneously.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] FDW for PostgreSQL

2012-11-21 Thread Shigeru Hanada
On Wed, Nov 21, 2012 at 7:31 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 At execute_query(), it stores the retrieved rows onto tuplestore of
 festate->tuples at once. Doesn't it make problems when remote-
 table has very big number of rows?


No.  postgres_fdw uses libpq's single-row processing mode when
retrieving query results in execute_query, so memory usage stays
stable at a certain level.


 IIRC, the previous code used cursor feature to fetch a set of rows
 to avoid over-consumption of local memory. Do we have some
 restriction if we fetch a certain number of rows with FETCH?
 It seems to me, we can fetch 1000 rows for example, and tentatively
 store them onto the tuplestore within one PG_TRY() block (so, no
 need to worry about PQclear() timing), then we can fetch remote
 rows again when IterateForeignScan reached end of tuplestore.


As you say, postgres_fdw had used a cursor to avoid possible memory
exhaustion on large result sets.  I switched to single-row processing mode
(which could be called a protocol-level cursor), added in 9.2, because it
accomplishes the same task with fewer SQL calls than a cursor.
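
For reference, single-row mode works roughly like this on the libpq side (a
minimal standalone sketch, not the patch's actual code):

    #include "postgres.h"
    #include "libpq-fe.h"

    /* Sketch: stream a result set one row at a time with libpq (9.2+). */
    static void
    run_query_single_row(PGconn *conn, const char *sql)
    {
        PGresult *res;

        if (!PQsendQuery(conn, sql))
            elog(ERROR, "PQsendQuery failed: %s", PQerrorMessage(conn));
        if (!PQsetSingleRowMode(conn))
            elog(ERROR, "could not enable single-row mode");

        /* Each PGRES_SINGLE_TUPLE result carries exactly one row. */
        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            {
                /* process the one row in res here */
            }
            else if (PQresultStatus(res) != PGRES_TUPLES_OK)
                elog(ERROR, "remote query failed: %s", PQerrorMessage(conn));
            PQclear(res);
        }
    }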

Regards,
-- 
Shigeru HANADA


Re: [HACKERS] FDW for PostgreSQL

2012-11-20 Thread Kohei KaiGai
2012/11/15 Shigeru Hanada shigeru.han...@gmail.com:
 Hi Kaigai-san,

 Sorry for delayed response.   I updated the patch, although I didn't change
 any about timing issue you and Fujita-san concern.

 1) add some FDW options for cost estimation.  Default behavior is not
 changed.
 2) get rid of array of libpq option names, similary to recent change of
 dblink
 3) enhance document, especially remote query optimization
 4) rename to postgres_fdw, to avoid naming conflict with the validator which
 exists in core
 5) cope with changes about error context handling

 On Tue, Nov 6, 2012 at 7:36 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 Isn't it possible to pick-up only columns to be used in targetlist or
 local qualifiers,
 without modification of baserestrictinfo?


 IMO, it's possible.  postgres_fdw doesn't modify baserestrictinfo at all; it
 just create two new lists which exclusively point RestrictInfo elements in
 baserestrictinfo.  Pulling vars up from conditions which can't be pushed
 down would gives us list of  necessary columns.  Am I missing something?

Hanada-san,

Let me comment on several points, in no particular order.

I also think the new use_remote_explain option is good. It works well when
we use this FDW over a network with more or less latency.
It seems to me its default is false, so GetForeignRelSize will return the
cost estimated from ANALYZE statistics instead of from a remote EXPLAIN.
Even though you mention that the default behavior was not changed, is that
the intended behavior? My preference is the current one, as is.

deparseFuncExpr() still has case handling for whether the node is an explicit
cast, an implicit cast, or a regular function call. If my previous proposal
has no flaw, could you fix it up to use regular function invocation notation
throughout? When the remote node has an incompatible implicit-cast
definition, the current logic can cause a problem.

At InitPostgresFdwOptions(), the source comment says we don't use malloc()
here for simplification of the code. Hmm. I'm not sure why that is simpler.
It seems to me we have no reason to avoid malloc here, given that the libpq
options themselves are acquired using malloc().

Regarding the regression test:

[kaigai@iwashi sepgsql]$ cat contrib/postgres_fdw/regression.diffs
*** /home/kaigai/repo/sepgsql/contrib/postgres_fdw/expected/postgres_fdw.out
   Sat Nov 17 21:31:19 2012
--- /home/kaigai/repo/sepgsql/contrib/postgres_fdw/results/postgres_fdw.out
Tue Nov 20 13:53:32 2012
***************
*** 621,627 ****
  -- ===
  ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
  SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
! ERROR:  invalid input syntax for integer: "1970-01-02 00:00:00"
  CONTEXT:  column c5 of foreign table ft1
  ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
  -- ===
--- 621,627 ----
  -- ===
  ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE int;
  SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
! ERROR:  invalid input syntax for integer: "Fri Jan 02 00:00:00 1970"
  CONTEXT:  column c5 of foreign table ft1
  ALTER FOREIGN TABLE ft1 ALTER COLUMN c5 TYPE timestamp;
  -- ===

==

I guess this test tries to check the case where a remote column has a data
type incompatible with the local side. Please check timestamp_out(): its
output format follows the datestyle GUC setting, which is affected by the OS
configuration at initdb time. (Please grep for datestyle in initdb.c!)
I'd recommend using another data type for this regression test, to avoid
false-positive failures.

Elsewhere, I could not find problems right now.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] FDW for PostgreSQL

2012-11-06 Thread Shigeru HANADA

Sorry for delayed response.

On Sun, Oct 21, 2012 at 3:16 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

2012/10/11 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:

I've reviewed your patch quickly.  I noticed that the patch has been created in
a slightly different way from the guidelines:
http://www.postgresql.org/docs/devel/static/fdw-planning.html  The guidelines
say the following, but the patch identifies the clauses in
baserel->baserestrictinfo in GetForeignRelSize, not GetForeignPaths.  Also, it
has been implemented so that most sub_expressions are evaluated at the remote
end, not the local end, though I'm missing something.  For postgresql_fdw to be
a good reference for FDW developers, ISTM it would be better that it be
consistent with the guidelines.  I think it would be needed to update the
following document or redesign the function to be consistent with the following
document.


Hmm. It seems to me Fujita-san's comment is right.


Indeed postgresql_fdw touches baserestrictinfo in GetForeignRelSize, but
that's because of an optimization for a better width estimate.  The doc
Fujita-san pointed to says:



The actual identification of such a clause should happen during
GetForeignPaths, since it would affect the cost estimate for the
path.


I understood this description to say that you need to examine
baserestrictinfo *before* GetForeignPlan to estimate the cost of an optimized
path.  I don't feel that this description prohibits an FDW from touching
baserestrictinfo in GetForeignRelSize, but mentioning it clearly might be
better.


Regards,
--
Shigeru HANADA




Re: [HACKERS] FDW for PostgreSQL

2012-11-06 Thread Kohei KaiGai
2012/11/6 Shigeru HANADA shigeru.han...@gmail.com:
 Sorry for delayed response.

 On Sun, Oct 21, 2012 at 3:16 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 2012/10/11 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:

 I've reviewed your patch quickly.  I noticed that the patch has been
 created in
 a slightly different way from the guidelines:
 http://www.postgresql.org/docs/devel/static/fdw-planning.html  The
 guidelines
 say the following, but the patch identifies the clauses in
 baserel->baserestrictinfo in GetForeignRelSize, not GetForeignPaths.
 Also, it
 has been implemented so that most sub_expressions are evaluated at the
 remote
 end, not the local end, though I'm missing something.  For postgresql_fdw
 to be
 a good reference for FDW developers, ISTM it would be better that it be
 consistent with the guidelines.  I think it would be needed to update the
 following document or redesign the function to be consistent with the
 following
 document.

 Hmm. It seems to me Fujita-san's comment is right.


 Indeed postgresql_fdw touches baserestrictinfo in GetForeignRelSize, but
 it's because of optimization for better width estimate.  The doc Fujita-san
 pointed says that:


 The actual identification of such a clause should happen during
 GetForeignPaths, since it would affect the cost estimate for the
 path.


 I understood this description says that you need to touch baserestrict info
 *before* GetForeignPlan to estimate costs of optimized path.  I don't feel
 that this description prohibits FDW to touch baserestrictinfo in
 GetForeignRelSize, but mentioning it clearly might be better.

Hanada-san,

Isn't it possible to pick up only the columns used in the target list or in
local qualifiers, without modifying baserestrictinfo?
Unless we put a WHERE clause on the EXPLAIN statement used for cost
estimation in GetForeignRelSize, all we need to know is the list of columns
to be fetched by the underlying query. Once we construct a SELECT statement
without a WHERE clause at the GetForeignRelSize stage, it is not difficult to
append one later according to the same criteria implemented in
classifyConditions.

I'd like to see a committer's opinion here.
Please give Hanada-san some direction.
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] FDW for PostgreSQL

2012-11-06 Thread Tom Lane
Kohei KaiGai kai...@kaigai.gr.jp writes:
 2012/11/6 Shigeru HANADA shigeru.han...@gmail.com:
 Indeed postgresql_fdw touches baserestrictinfo in GetForeignRelSize, but
 it's because of optimization for better width estimate.  The doc Fujita-san
 pointed says that:

 The actual identification of such a clause should happen during
 GetForeignPaths, since it would affect the cost estimate for the
 path.

 I understood this description says that you need to touch baserestrict info
 *before* GetForeignPlan to estimate costs of optimized path.  I don't feel
 that this description prohibits FDW to touch baserestrictinfo in
 GetForeignRelSize, but mentioning it clearly might be better.

 Isn't it possible to pick-up only columns to be used in targetlist or
 local qualifiers, without modification of baserestrictinfo?

What the doc means to suggest is that you can look through the
baserestrictinfo list and then record information elsewhere about
interesting clauses you find.  If the FDW is actually *modifying* that
list, I agree that seems like a bad idea.  I don't recall anything in
the core system that does that, so it seems fragile.  The closest
parallel I can think of in the core system is indexscans pulling out
restriction clauses to use as index quals.  That code doesn't modify
the baserestrictinfo list, only make new lists with some of the same
entries.
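
To make the pattern concrete, the usual shape is something like this (a
sketch only; is_foreign_expr() stands in for whatever push-down safety test
the FDW uses):

    /*
     * Sketch: split baserestrictinfo into remote and local lists without
     * modifying the original list or copying any RestrictInfo node.
     */
    static void
    classify_conditions(PlannerInfo *root, RelOptInfo *baserel,
                        List **remote_conds, List **local_conds)
    {
        ListCell *lc;

        *remote_conds = NIL;
        *local_conds = NIL;

        foreach(lc, baserel->baserestrictinfo)
        {
            RestrictInfo *ri = (RestrictInfo *) lfirst(lc);

            if (is_foreign_expr(root, baserel, ri->clause))
                *remote_conds = lappend(*remote_conds, ri);
            else
                *local_conds = lappend(*local_conds, ri);
        }
    }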

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2012-11-06 Thread 花田 茂

On 2012/11/07, at 1:35, Tom Lane t...@sss.pgh.pa.us wrote:
 
 Isn't it possible to pick-up only columns to be used in targetlist or
 local qualifiers, without modification of baserestrictinfo?
 
 What the doc means to suggest is that you can look through the
 baserestrictinfo list and then record information elsewhere about
 interesting clauses you find.  If the FDW is actually *modifying* that
 list, I agree that seems like a bad idea.
 
Kaigai-san might have misunderstood that postgresql_fdw changes
baserestrictinfo, since it did so in an older implementation.

ClassifyConditions creates new lists, local_conds and remote_conds, whose
cells point at the RestrictInfo nodes in baserestrictinfo.
It doesn't copy the RestrictInfo nodes for the new lists, but I think that's
OK because the baserestrictinfo list itself and the RestrictInfo nodes it
points to are never modified by postgresql_fdw.

I don't recall anything in
 the core system that does that, so it seems fragile.  The closest
 parallel I can think of in the core system is indexscans pulling out
 restriction clauses to use as index quals.  That code doesn't modify
 the baserestrictinfo list, only make new lists with some of the same
 entries.

Thanks for the advice.  I found that relation_excluded_by_constraints, which
is called by set_rel_size(), creates a new RestrictInfo list from
baserestrictinfo, and this seems similar to what postgresql_fdw does in
GetForeignRelSize from the perspective of relation size estimation.

Regards,
--
Shigeru HANADA
shigeru.han...@gmail.com








Re: [HACKERS] FDW for PostgreSQL

2012-11-06 Thread Tom Lane
花田 茂 shigeru.han...@gmail.com 
writes:
 ClassifyConditions creates new lists, local_conds and remote_conds,
 which have cells which point RestrictInfo(s) in baserestrictinfo.
 It doesn't copy RestrictInfo for new lists, but I think it's ok
 because baserestrictinfo list itself and RestrictInfo(s) pointed by
 it are never modified by postgresql_fdw.

That's good.  I think there are actually some assumptions that
RestrictInfo nodes are not copied once created.  You can link them into
new lists all you want, but don't copy them.

regards, tom lane




Re: [HACKERS] FDW for PostgreSQL

2012-10-23 Thread Alvaro Herrera
Shigeru HANADA wrote:

 Please examine attached v2 patch (note that is should be applied onto
 latest dblink_fdw_validator patch).

Tom committed parts of the dblink_fdw_validator patch, but not the
removal, so it seems this patch needs to be rebased on top of that
somehow.  I am not able to say what's the best resolution for that
conflict, however.  But it seems to me that maybe you will need to
choose a different name for the validator after all, to support binary
upgrades.

There are some other comments downthread that a followup patch probably
needs to address as well.  I am marking this patch Returned with
Feedback.  Please submit an updated version to CF3.

Thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] FDW for PostgreSQL

2012-10-20 Thread Kohei KaiGai
2012/10/11 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
 Hi Hanada-san,

 Please examine attached v2 patch (note that is should be applied onto
 latest dblink_fdw_validator patch).

 I've reviewed your patch quickly.  I noticed that the patch has been created 
 in
 a slightly different way from the guidelines:
 http://www.postgresql.org/docs/devel/static/fdw-planning.html  The guidelines
 say the following, but the patch identifies the clauses in
 baserel->baserestrictinfo in GetForeignRelSize, not GetForeignPaths.  Also, it
 has been implemented so that most sub_expressions are evaluated at the remote
 end, not the local end, though I'm missing something.  For postgresql_fdw to 
 be
 a good reference for FDW developers, ISTM it would be better that it be
 consistent with the guidelines.  I think it would be needed to update the
 following document or redesign the function to be consistent with the 
 following
 document.

Hmm. It seems to me Fujita-san's comment is right.

Even though the latest implementation gets an estimated number of rows using
EXPLAIN with a qualified SELECT statement on the remote side, and then
adjusts it with the selectivity of the local qualifiers, we should be able to
obtain the same cost estimate, because postgresql_fdw assumes all the
pushed-down qualifiers are built-in only.

So, would it be possible to move classifyConditions() to
pgsqlGetForeignPlan() and separate remote qualifiers from local ones there?
If we call get_remote_estimate() without a WHERE clause, the remote side will
just give the whole number of rows of the remote table. That can then be
adjusted with the selectivity of all the RestrictInfos (both remote and
local).

Sorry, I should have suggested this at the beginning.

This change may affect the optimization that determines which remote columns
are in use on the local side. Let me suggest an expression walker that sets a
member of a Bitmapset for each column used on the local side or in the target
list. Then we can list those columns in the target list of the remote query.
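
A minimal sketch of such a walker (illustrative only; the context struct is
invented):

    /* Sketch: collect the attribute numbers referenced by an expression. */
    typedef struct CollectColumnsContext
    {
        Index      relid;       /* range table index of the foreign table */
        Bitmapset *attrs_used;  /* collected attribute numbers */
    } CollectColumnsContext;

    static bool
    collect_columns_walker(Node *node, CollectColumnsContext *context)
    {
        if (node == NULL)
            return false;
        if (IsA(node, Var))
        {
            Var *var = (Var *) node;

            /* ignore system columns for this sketch */
            if (var->varno == context->relid &&
                var->varlevelsup == 0 &&
                var->varattno > 0)
                context->attrs_used = bms_add_member(context->attrs_used,
                                                     var->varattno);
            return false;
        }
        return expression_tree_walker(node, collect_columns_walker,
                                      (void *) context);
    }

Running this over the target list and the local qualifiers would leave
attrs_used holding exactly the columns the remote SELECT list has to provide.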

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] FDW for PostgreSQL

2012-10-11 Thread Etsuro Fujita
Hi Hanada-san,

 Please examine attached v2 patch (note that is should be applied onto
 latest dblink_fdw_validator patch).

I've reviewed your patch quickly.  I noticed that the patch has been created in
a slightly different way from the guidelines:
http://www.postgresql.org/docs/devel/static/fdw-planning.html  The guidelines
say the following, but the patch identifies the clauses in
baserel->baserestrictinfo in GetForeignRelSize, not GetForeignPaths.  Also, it
has been implemented so that most sub_expressions are evaluated at the remote
end, not the local end, though I'm missing something.  For postgresql_fdw to be
a good reference for FDW developers, ISTM it would be better that it be
consistent with the guidelines.  I think it would be needed to update the
following document or redesign the function to be consistent with the following
document.

As an example, the FDW might identify some restriction clauses of the form
foreign_variable = sub_expression, which it determines can be executed on the
remote server given the locally-evaluated value of the sub_expression. The
actual identification of such a clause should happen during GetForeignPaths,
since it would affect the cost estimate for the path.

Thanks,

Best regards,
Etsuro Fujita





Re: [HACKERS] FDW for PostgreSQL

2012-10-03 Thread Kohei KaiGai
Hanada-san,

I tried to check this patch. Because we also had some discussion
on this extension through the last two commit fests, I have no
fundamental design arguments.

So, let me dig into the implementation details of this patch.

At the postgresql_fdw/deparse.c,

* Even though deparseVar() is never invoked with need_prefix=true,
  I wonder why a Var reference would need to be qualified with a relation
  alias.  It seems to me the relation alias is never used in the remote
  query, so isn't this a possible bug?

* deparseFuncExpr() has case handling depending on the funcformat of the
  FuncExpr.  I think all the cases can be deparsed using an explicit function
  call, which would avoid trouble when the remote host has an inconsistent
  cast configuration.

At the postgresql_fdw/connection.c,

* I'm worried about the condition for the invocation of begin_remote_tx().
  + if (use_tx && entry->refs == 1)
  +     begin_remote_tx(entry->conn);
  + entry->use_tx = use_tx;
  My preference is: if (use_tx && !entry->use_tx), instead.
  Even though there is currently no code path that makes the problem obvious,
  it could cause a hard-to-find bug when a caller needs a transaction on a
  connection that was already acquired by someone who did not need one.
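
To spell out the concern, a sketch of the connection-cache logic with the
suggested condition (the struct and helper names only loosely mirror the
patch and are not exact):

    /* Hypothetical connection cache entry, loosely mirroring the patch. */
    typedef struct ConnCacheEntry
    {
        PGconn *conn;     /* open libpq connection */
        int     refs;     /* number of callers currently using it */
        bool    use_tx;   /* has a remote transaction been started? */
    } ConnCacheEntry;

    static PGconn *
    get_connection(ConnCacheEntry *entry, bool use_tx)
    {
        entry->refs++;

        /*
         * Start the remote transaction the first time any caller needs one,
         * rather than only when refs == 1; a later caller may be the first
         * one that actually requires a transaction.
         */
        if (use_tx && !entry->use_tx)
        {
            begin_remote_tx(entry->conn);   /* patch-specific helper */
            entry->use_tx = true;
        }
        return entry->conn;
    }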

At the postgresql_fdw/postgresql_fdw.c,

* When pgsqlGetForeignPaths() adds the SQL statement into
  fdw_private, it is implemented as:
  + /* Construct list of SQL statements and bind it with the path. */
  + fdw_private = lappend(NIL, makeString(fpstate->sql.data));
  Could you use list_make1() instead?

* At the bottom half of query_row_processor(), I found these two
  mysterious lines:
    MemoryContextSwitchTo(festate->scan_cxt);
    MemoryContextSwitchTo(festate->temp_cxt);
  Why not switch to temp_cxt directly?

At the sgml/postgresql-fdw.sgml,

* Please add a note that this version does not support subtransaction
  handling.  In particular, all we can do is abort the top-level transaction
  when an error occurs on the remote side within a subtransaction.

I hope to hand this patch over to a committer soon.

Thanks,

2012/9/14 Shigeru HANADA shigeru.han...@gmail.com:
 Hi all,

 I'd like to propose FDW for PostgreSQL as a contrib module again.
 Attached patch is updated version of the patch proposed in 9.2
 development cycle.

 For ease of review, I summarized what the patch tries to achieve.

 Abstract
 
 This patch provides FDW for PostgreSQL which allows users to access
 external data stored in remote PostgreSQL via foreign tables.  Of course
 external instance can be beyond network.  And I think that this FDW
 could be an example of other RDBMS-based FDW, and it would be useful for
 proof-of-concept of FDW-related features.

 Note that the name has been changed from pgsql_fdw which was used in
 last proposal, since I got a comment which says that most of existing
 FDWs have name ${PRODUCT_NAME}_fdw so postgresql_fdw or
 postgres_fdw would be better.  For this issue, I posted another patch
 which moves existing postgresql_fdw_validator into contrib/dblink with
 renaming in order to reserve the name postgresql_fdw for this FDW.
 Please note that the attached patch requires dblink_fdw_validator.patch
 to be applied first.
 http://archives.postgresql.org/pgsql-hackers/2012-09/msg00454.php

 Query deparser
 ==
 Now postgresql_fdw has its own SQL query deparser inside, so it's free
 from backend's ruleutils module.

 This deparser maps object names when generic options below were set.

   nspname of foreign table: used as namespace (schema) of relation
   relname of foreign table: used as relation name
   colname of foreign column: used as column name

 This mapping allows flexible schema design.

 SELECT optimization
 ===
 postgresql_fdw always retrieves as much columns as foreign table from
 remote to avoid overhead of column mapping.  However, often some of them
 (or sometimes all of them) are not used on local side, so postgresql_fdw
 uses NULL literal as such unused columns in SELECT clause of remote
 query.  For example, let's assume one of pgbench workloads:

 SELECT abalance FROM pgbench_accounts WHERE aid = 1;

 This query generates a remote query below.  In addition to bid and
 filler, aid is replaced with NULL because it's already evaluated on
 remote side.

 SELECT NULL, NULL, abalance, NULL FROM pgbench_accounts
  WHERE (aid OPERATOR(pg_catalog.=) 1);

 This trick would improve performance notably by reducing amount of data
 to be transferred.

 One more example.  Let's assume counting rows.

 SELECT count(*) FROM pgbench_accounts;

 This query requires only existence of row, so no actual column reference
 is in SELECT clause.

 SELECT NULL, NULL, NULL, NULL FROM pgbench_accounts;

 WHERE push down
 ===
 postgresql_fdw pushes down some of restrictions (IOW, top level elements
 in WHERE clause which are connected with AND) which can be evaluated on
 remote side safely.  Currently the criteria safe is declared as
 whether an 

Re: [HACKERS] FDW for PostgreSQL

2012-10-03 Thread Merlin Moncure
On Fri, Sep 14, 2012 at 9:25 AM, Shigeru HANADA
shigeru.han...@gmail.com wrote:
 Hi all,

 I'd like to propose FDW for PostgreSQL as a contrib module again.
 Attached patch is updated version of the patch proposed in 9.2
 development cycle.

very nice.

   - binary transfer (only against servers with same PG major version?)

Unfortunately this is not enough -- at least not without some
additional work. The main problem is user defined types, especially
composites.  Binary wire format sends internal member oids which the
receiving server will have to interpret.

merlin

