Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-16 Thread David Beck
 There is a hook post_parse_analyze_hook but I think it comes too
 late as it comes after the analyze step which is when Postgres looks
 up the schema information for every relation mentioned in the query.
 What you would need is a post_parse_hook which would work on the raw
 parse tree before the analyze step. That doesn't seem terribly
 controversial to add though there may be some technical details. The
 API would of course be completely unstable from major release to major
 release -- the parse tree gets knocked around quite a bit.

Good to know that the parse tree is not stable. My original proposal was exactly 
the one you describe. I called it post_raw_parse_hook and wanted to call it 
from exec_simple_query(), right after pg_parse_query().
Maybe that is not the best place or way to achieve this? 
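PostgreSQL hooks conventionally chain: an extension saves the previous value of the hook pointer at load time and calls it from its own function. A minimal sketch of how the proposed post_raw_parse_hook would slot into pg_parse_query(), written as a Python stand-in for the C patch; every name and body below is illustrative, not actual server code:

```python
# Hook slot the patch would add; extensions chain onto it (save prev, call it).
post_raw_parse_hook = None

def pg_parse_query(query_string):
    """Stand-in for the real C function: raw-parse, then run the hook."""
    parse_tree = ["raw:" + query_string]  # placeholder for raw_parser() output
    if post_raw_parse_hook is not None:
        parse_tree = post_raw_parse_hook(parse_tree)  # extension may rewrite
    return parse_tree

def install_hook(fn):
    """What an extension's _PG_init would do: remember the previous hook
    and install a wrapper that calls it first, then fn."""
    global post_raw_parse_hook
    prev = post_raw_parse_hook
    def chained(tree):
        return fn(prev(tree) if prev is not None else tree)
    post_raw_parse_hook = chained

# Example: a hook that tags every raw tree it sees
install_hook(lambda tree: tree + ["rewritten"])
```

The point of the chaining idiom is that several extensions can install the same hook without clobbering each other.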

It was a few-line patch; I wonder whether it has any chance of getting into Postgres.

 And I have to wonder if you aren't going the long way around to do
 something that can be done more simply some other way. If you have
 150k objects I wonder if your objects aren't all very similar and
 could be handled by a single Postgres schema object. Either a single
 FDW object or a simple function.

I need to do a bit more research on that side. I was hoping to spend less time 
on understanding that legacy system...

 As a side note, you should evaluate carefully what lazily creating
 objects will buy you. Perhaps just creating 150k objects would be
 cheaper than maintaining this code. In particular since the user
 *might* access all 150k you still have to worry about the worst case
 anyway and it might be cheaper to just engineer for it in the first
 place.

I have a similar problem with the legacy system: the set of 150k objects may 
change between releases. My gut feeling is that it is easier to understand 
and keep track of an open source project like Postgres.

Best regards and thanks for all insights and ideas, David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-15 Thread David Beck
On 2014.02.15, at 0:46, Greg Stark st...@mit.edu wrote:

 On Fri, Feb 14, 2014 at 9:16 PM, David Beck db...@starschema.net wrote:
 Another point I liked in mysql is the possibility to write info schema 
 plugins: 
 http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
 Like a virtual catalog. Is there anything similar in Postgres?
 
 The documentation you linked to describes how to provide
 information_schema plugins but not why you would want to do such a
 thing. I'm not seeing why this would be useful. The information_schema
 schema is described by the standard so creating new views in it isn't
 needed very often and the schema for the existing views doesn't change
 very often. I can see why a plugin might want to add rows to the views
 but that doesn't seem to be what this feature is about.

Another reason I was thinking about a dynamic catalog and/or query rewrite is that the 
project I work on is a data integration platform. Right now it is in the 
feasibility study phase, and Postgres plus an extension looks to be the strongest option.

The legacy system we want to interface with has over 150k table-like objects. 
Our platform’s task is to provide a relational view on top of them.

I know it is unlikely that users will use all 150k tables. I would expect maybe 
10-100 to be used in practice, but I don’t want to figure out which 100, 
nor do I want to create all 150k catalog entries in advance.

I was also dreaming about transferring the small enough objects 
to Postgres tables in the background, sparing the communication with the 
legacy system and letting Postgres do the joins on these.

The solution I was thinking about is this:

- when the query arrives, a smart rewriter would know 1) which tables are local, 2) 
which tables need new catalog entries, and 3) what can be joined on the other side
- the rewriter would potentially add SQL statements at the beginning of the 
query to create the missing FDW catalog entries if needed
- the FDW would be handled by the same extension, so the two can easily talk to 
each other about the status of the objects; the rewriter would then know whether the 
background transfer of a small table has completed and do the rewrite 
accordingly
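The three-way split in the steps above amounts to a simple per-table dispatch. A toy sketch under assumed names (the table sets, the fdw_ naming convention, the "legacy" server name and the elided column list are all hypothetical, not part of any actual extension):

```python
# Hypothetical bookkeeping the extension would maintain about legacy objects.
LOCAL = {"orders_cache"}                    # 1) already copied into Postgres
CATALOGED = {"customers"}                   # 3) FDW entry already created
REMOTE = {"legacy_sales", "legacy_stock"}   # 2) needs a new catalog entry

def classify(table):
    if table in LOCAL:
        return "local"          # scan the Postgres copy directly
    if table in CATALOGED:
        return "fdw"            # go through the existing foreign table
    if table in REMOTE:
        return "create-entry"   # must emit DDL for the FDW entry first
    return "unknown"

def ddl_to_prepend(tables):
    """SQL the rewriter would put in front of the query for case 2.
    The column list is elided; it would come from legacy metadata."""
    return ["CREATE FOREIGN TABLE fdw_%s (...) SERVER legacy" % t
            for t in tables if classify(t) == "create-entry"]
```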

I know these are pretty far from the functionality and traditional operation of 
an RDBMS… but if you look at FDW examples like doing a SELECT on a Google IMAP 
mailbox, it is not that far from Postgres.

Best regards, David





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
Thanks for the reply. There are two points on which I think I’ve been misunderstood:

1. the point is to do the rewrite without, and before, any catalog access
2. I do want to push the join down to the source and, equally important, push the 
WHERE conditions there

Best regards, David


On 2014.02.13, at 21:22, Tom Lane t...@sss.pgh.pa.us wrote:

 David Beck db...@starschema.net writes:
 I have table like data structures in the source system for the FDW I work on.
 These tables are sometimes too big and the source system is able to filter 
 and join them with limitations, thus it is not optimal to transfer the data 
 to Postgres.
 At the same time I want the users to think in terms of the original tables.
 
 The idea is to rewrite the SQL queries like this:
 
  “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987”
 
 to:
 
  “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
 
 TBH this sounds like a spectacularly bad idea, especially in the place and
 way you propose to do it.  You can't even do catalog access safely where
 you've put that hook, not to mention that there are many other places
 where queries can be submitted.  But more generally, an FDW should not
 operate in the way you're describing.
 
 We do lack support for pushing joins to the foreign server, and that needs
 to be addressed; but we need to do it in the planner, not by kluging the
 query somewhere upstream of that.
 
   regards, tom lane





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
Let me rephrase this:

Let’s remove my motivations and use cases from this conversation….

Why is it a bad idea to rewrite the query before it reaches 
transform/analyze (without ever accessing the catalog)?

If that flexibility is acceptable to you, where would be the best place to put 
it?

Thanks, David


On 2014.02.14, at 10:30, David Beck db...@starschema.net wrote:

 Thanks for the reply. There are two things I think I’ve been misunderstood:
 
 1, the point is to do the rewrite without and before catalog access
 2, I do want to push the join to the source and equally important pushing the 
 where conditions there
 
 Best regards, David
 
 
 On 2014.02.13, at 21:22, Tom Lane t...@sss.pgh.pa.us wrote:
 
 David Beck db...@starschema.net writes:
 I have table like data structures in the source system for the FDW I work 
 on.
 These tables are sometimes too big and the source system is able to filter 
 and join them with limitations, thus it is not optimal to transfer the data 
 to Postgres.
 At the same time I want the users to think in terms of the original tables.
 
 The idea is to rewrite the SQL queries like this:
 
 “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987”
 
 to:
 
 “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
 
 TBH this sounds like a spectacularly bad idea, especially in the place and
 way you propose to do it.  You can't even do catalog access safely where
 you've put that hook, not to mention that there are many other places
 where queries can be submitted.  But more generally, an FDW should not
 operate in the way you're describing.
 
 We do lack support for pushing joins to the foreign server, and that needs
 to be addressed; but we need to do it in the planner, not by kluging the
 query somewhere upstream of that.
 
  regards, tom lane
 





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
I think I’m going to need to dig into the planner to fully understand your points. 
Thank you for the insights. I was more inclined to put the knowledge of the legacy 
system into an extension and my own codebase; now I see that making better use of the 
planner would help.

What inspired me is the scriptable query rewrite in 
http://dev.mysql.com/downloads/mysql-proxy/ 
The hook I proposed would be a lot nicer in Postgres, because the raw parsing is 
already done at that point, while in mysql-proxy it has to be done manually.

Another point I liked in mysql is the possibility to write info schema plugins: 
http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
Like a virtual catalog. Is there anything similar in Postgres?

Thank you, David


On 2014.02.14, at 18:06, Greg Stark st...@mit.edu wrote:

 On Fri, Feb 14, 2014 at 2:28 PM, David Beck db...@starschema.net wrote:
 Why is that a bad idea of rewriting the query before it reaches 
 transform/analyze (without ever accessing the catalog)?
 
 If that flexibility is acceptable to you, where would be the best place to 
 put it in?
 
 Well if there are two foreign tables and the planner could push the
 join work down to the fdw then the planner should be able to
 accurately represent that plan and cost it without having the user
 have to create any catalog structures. That's what the planner does
 for every other type of plan node.
 
 What you're describing would still be useful for materialized views.
 In that case the user is creating the materialized view and it is a
 real thing in the catalogs that won't disappear on the planner. Even
 then it would be ideal if the planner could decide to use the
 materialized view late enough that it can actually determine if it's
 superior rather than rewriting the query before it gets to that point.
 That would be much more flexible for users too who might not write the
 query in a way that exactly matches the materialized view.
 
 -- 
 greg





[HACKERS] New hook after raw parsing, before analyze

2014-02-13 Thread David Beck
Hello Hackers,

I work on a foreign data wrapper for a legacy system. I generally find the hook 
system a very useful and flexible way to extend Postgres.
The post_parse_analyze_hook almost fits what I need, but I have a few use cases 
where I would need to tap into the parsed queries before any catalog-based 
validation is done.
Please find attached a trivial patch for this new hook.

One of the use cases I have is this:

I have table-like data structures in the source system for the FDW I work on.
These tables are sometimes too big, and the source system can filter and 
join them only with limitations, so it is not optimal to transfer the data to 
Postgres.
At the same time I want the users to think in terms of the original tables.

The idea is to rewrite the SQL queries like this:

  “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
b.col2=987”

to:

  “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”


This rewritten query would be handled by the FDW table that I previously added 
to the catalog.

The reason I want this new hook is that I don’t want tableA and tableB to be in 
the catalog.
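As a string-level toy, the rewrite above could look like the sketch below. This is purely illustrative: the real hook would manipulate the raw parse tree, not SQL text, and the fdw_ naming is just the convention used in this example.

```python
import re

def rewrite_join(sql):
    """Rewrite 'SELECT * FROM t1 a, t2 b WHERE a.id=b.id AND ...' into a
    single scan of a pre-created FDW relation fdw_t1_t2."""
    m = re.fullmatch(
        r"SELECT \* FROM (\w+) (\w+), (\w+) (\w+) "
        r"WHERE \2\.id=\4\.id AND (.+)",   # \2/\4 = the two aliases
        sql)
    if not m:
        return sql  # leave anything else untouched
    t1, a1, t2, a2, rest = m.groups()
    # requalify the remaining predicates with the combined alias
    rest = re.sub(r"\b(%s|%s)\." % (a1, a2), "ab.", rest)
    return "SELECT * FROM fdw_%s_%s ab WHERE %s" % (t1, t2, rest)
```

Feeding it the first query above yields the second one; anything that does not match the pattern passes through unchanged.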

Looking forward to hearing your thoughts, opinions and comments.

Best regards, David





post_raw_parse.diff
Description: Binary data



Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-13 Thread David Beck
Thanks for the link.

I want flexibility. Here is a situation: my hook knows the size of tableA and 
tableB on the legacy side. It should be able to decide whether to offload the 
join/filter onto the legacy side or not. At the same time it can start 
transferring the data to real Postgres tables in the background. Once the data 
is here in the local DB, it can change the rewrite to use the local tables 
instead for new queries.
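The decision described above amounts to a small per-object state machine. A sketch with invented state names and an invented naming convention for the local copies (none of this is from the actual patch):

```python
from enum import Enum, auto

class ObjState(Enum):
    REMOTE = auto()        # lives only in the legacy system
    TRANSFERRING = auto()  # background copy to Postgres in progress
    LOCAL = auto()         # copy finished; a real Postgres table exists

def target_relation(name, state):
    """Relation the rewriter should substitute for a legacy object.
    While the transfer is incomplete, new queries keep using the FDW."""
    if state is ObjState.LOCAL:
        return name + "_local"   # joins/filters now run inside Postgres
    return "fdw_" + name         # offload to the legacy side meanwhile
```

Queries already running against the FDW relation would finish as started; only new queries pick up the local table once the state flips.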

Answering your question:

I don’t know what you mean by pseudo table. The “fdw_tableA_tableB” is a real 
FDW table with a catalog entry and real catalog columns; tableA and tableB 
are non-real tables in my setup. My understanding is that in the 
raw_parse-transform-analyze sequence, the catalog is first checked in the transform 
phase. That’s why I want the hook to be after raw_parse but before transform: 
the transform phase won’t see tableA, tableB and their columns, so I don’t 
expect errors.

Thank you, David


Please find my answers inline.

On 2014.02.13, at 13:28, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 See the discussion of Custom-Scan API.
 https://commitfest.postgresql.org/action/patch_view?id=1282
 
 I believe my third patch is what you really want to do...
 
 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.
 
 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.
 
 I'd like to see why you want the pseudo table fdw_tableA_tableB to be in the 
 catalog instead of tableA and tableB. In addition, the parser will raise an 
 error if referenced columns (as part of tableA or tableB) are not in the 
 catalog, because of name lookup failure.
 
 Thanks,
 
 
 2014-02-13 19:01 GMT+09:00 David Beck db...@starschema.net:
 Hello Hackers,
 
 I work on a foreign data wrapper for a legacy system. I generally find the 
 hook system very useful and flexible way to extend Postgres.
 The post parse analyze hook almost fits what I need, but I have a few use 
 cases where I would need to tap right into the parsed queries but before any 
 catalog based validation is done.
 Please find the attached trivial patch for this new hook.
 
 One of the use cases I have is this:
 
 I have table like data structures in the source system for the FDW I work on.
 These tables are sometimes too big and the source system is able to filter 
 and join them with limitations, thus it is not optimal to transfer the data 
 to Postgres.
 At the same time I want the users to think in terms of the original tables.
 
 The idea is to rewrite the SQL queries like this:
 
  SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987
 
 to:
 
  SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987
 
 
 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.
 
 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.
 
 Looking forward to hear your thoughts, opinions, comments.
 
 Best regards, David
 
 
 
 
 
 
 
 
 
 -- 
 KaiGai Kohei kai...@kaigai.gr.jp


