[HACKERS] WIP patch for LATERAL subqueries

2012-08-05 Thread Tom Lane
I've been idly amusing myself by trying to hack up support for
SQL-standard LATERAL subqueries.  I've got something that turns over,
more or less:

regression=# select * from int4_tbl a, lateral (select unique1,unique2 from 
tenk1 b where a.f1 = unique1) x;
 f1 | unique1 | unique2 
+-+-
  0 |   0 |9998
(1 row)

regression=# explain select * from int4_tbl a, lateral (select unique1,unique2 
from tenk1 b where a.f1 = unique1) x;
QUERY PLAN  
   
---
 Nested Loop  (cost=0.00..42.55 rows=5 width=12)
   -  Seq Scan on int4_tbl a  (cost=0.00..1.05 rows=5 width=4)
   -  Index Scan using tenk1_unique1 on tenk1 b  (cost=0.00..8.28 rows=1 
width=8)
 Index Cond: (a.f1 = unique1)
(4 rows)

but there's a good deal of work left to do, some of which could use some
discussion.

Feature/semantics issues:

Currently the patch only implements the syntax called out in the standard,
namely that you can put LATERAL in front of a derived table, which is
to say a parenthesized sub-SELECT in FROM.  It strikes me that it might be
worth allowing LATERAL with a function-in-FROM as well.  So basically
LATERAL func(args) alias
would be an allowed abbreviation for
LATERAL (SELECT * FROM func(args)) alias
Since the standard doesn't have function-in-FROM, it has nothing to say
about whether this is sane or not.  The argument for this is mainly that
SRFs are one of the main use-cases for LATERAL (replacing SRF-in-the-
SELECT-list usages), so we might as well make it convenient.  Any opinions
pro or con about that?

While fooling around in the planner I realized that I have no idea what
outer-level aggregates mean in a LATERAL subquery, and neither does
Postgres:
regression=# select 1 from tenk1 a, lateral (select * from int4_tbl b where f1 
= max(a.unique1)) x;
ERROR:  plan should not reference subplan's variable
I don't see anything prohibiting this in SQL:2008, but ordinarily this
would be taken to be an outer-level aggregate, and surely that is not
sensible in the LATERAL subquery.  For the moment it seems like a good
idea to disallow it, though I am not sure where is a convenient place
to test for such things.  Has anyone got a clue about whether this is
well-defined, or is it simply an oversight in the spec?

Parser issues:

I'm reasonably happy with the grammar patch, though tempted to refactor
it to reduce the amount of duplication (and would be more tempted if we
add LATERAL function calls).  I'm thinking that an opt_alias production
could be used to eliminate the duplication, and am also strongly tempted
to move the error for no subselect alias out of the grammar and into
transformRangeSubselect.

Note that I made LATERAL be col_name_keyword.  It can no longer be allowed
as a function name because this would be formally ambiguous:
LATERAL ((SELECT x FROM t)) t(x)
Is that a call on a function named LATERAL with a scalar-subquery
argument, or is it a LATERAL subquery with extra parentheses?  However,
there seems no point in making it fully reserved.  The table_ref
productions would still have to be repeated, because even with LATERAL
fully reserved, we can't combine them using an opt_lateral production.
On seeing ( at the start of a FROM item, the parser doesn't know enough
to decide whether it should reduce opt_lateral to empty, which would be
the appropriate thing if the ( starts a sub-select but not if it is,
say, a parenthesized JOIN tree.  We could only avoid that by allowing
opt_lateral before every type of table_ref and then throwing explicit
errors for the disallowed cases, which doesn't end up making the grammar
simpler.

Although lateral cross-references work okay for the successive-FROM-items
case, they don't work at all yet for JOIN cases:

regression=#  select * from int4_tbl a join lateral (select unique1,unique2 
from tenk1 b where f1 = unique1) x on true;
ERROR:  column f1 does not exist
LINE 1: ...ateral (select unique1,unique2 from tenk1 b where f1 = uniqu...
 ^

regression=#  select * from int4_tbl a join lateral (select unique1,unique2 
from tenk1 b where a.f1 = unique1) x on true;
ERROR:  invalid reference to FROM-clause entry for table a
LINE 1: ...ateral (select unique1,unique2 from tenk1 b where a.f1 = uni...
 ^
HINT:  There is an entry for table a, but it cannot be referenced from this 
part of the query.

The reason that the separate-FROM-items case works is that
transformFromClause pushes each FROM-clause item into p_relnamespace and
p_varnamespace immediately after parsing it, making those names visible
during parsing of subsequent FROM items.  However, transformFromClauseItem
doesn't push the left-hand item into the lists before parsing the
right-hand item.

Now, the way this is 

Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-05 Thread Andrew Dunstan


On 08/05/2012 05:58 PM, Tom Lane wrote:

I've been idly amusing myself by trying to hack up support for
SQL-standard LATERAL subqueries.  I've got something that turns over,
more or less:



Awesome!!



Currently the patch only implements the syntax called out in the standard,
namely that you can put LATERAL in front of a derived table, which is
to say a parenthesized sub-SELECT in FROM.  It strikes me that it might be
worth allowing LATERAL with a function-in-FROM as well.  So basically
LATERAL func(args) alias
would be an allowed abbreviation for
LATERAL (SELECT * FROM func(args)) alias
Since the standard doesn't have function-in-FROM, it has nothing to say
about whether this is sane or not.  The argument for this is mainly that
SRFs are one of the main use-cases for LATERAL (replacing SRF-in-the-
SELECT-list usages), so we might as well make it convenient.  Any opinions
pro or con about that?



Pro. As you say this is the main use case, and the longer syntax just 
seems unnecessary fluff.


I'll comment on the rest of you email later, but this is just great 
news. Hardly a month goes by that I don't wish for LATERAL.



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-05 Thread Tom Lane
I wrote:
 While fooling around in the planner I realized that I have no idea what
 outer-level aggregates mean in a LATERAL subquery, and neither does
 Postgres:
 regression=# select 1 from tenk1 a, lateral (select * from int4_tbl b where 
 f1 = max(a.unique1)) x;
 ERROR:  plan should not reference subplan's variable
 I don't see anything prohibiting this in SQL:2008, but ordinarily this
 would be taken to be an outer-level aggregate, and surely that is not
 sensible in the LATERAL subquery.  For the moment it seems like a good
 idea to disallow it, though I am not sure where is a convenient place
 to test for such things.  Has anyone got a clue about whether this is
 well-defined, or is it simply an oversight in the spec?

On further reflection I think this is indeed disallowed by spec.  The
outer query is clearly the aggregation query of the aggregate, and the
aggregate appears inside that query's FROM list, therefore it's no good;
see SQL:2008 6.9 set function specification syntax rules 6 and 7.
(I missed this before because it's not under the aggregate function
heading.)

So the problem here is just that parseCheckAggregates neglects to grovel
through subqueries-in-FROM looking for aggregates of the current level.
Since AFAICS the case cannot arise without LATERAL, this isn't really a
pre-existing bug.

I find it fairly annoying though that parseCheckAggregates (and likewise
parseCheckWindowFuncs) have to dig through previously parsed query trees
to look for misplaced aggregates; so adding even more of that is grating
on me.  It would be a lot cleaner if transformAggregateCall and
transformWindowFuncCall could throw these errors immediately.  The
reason they can't is lack of context about what portion of the query we
are currently parsing.  I'm thinking it'd be worthwhile to add an enum
field to ParseState that shows whether we're currently parsing the
associated query level's target list, WHERE clause, GROUP BY clause,
etc.  The easiest way to ensure this gets set for all cases should be to
add the enum value as another argument to transformExpr(), which
would then save it into the ParseState for access by subsidiary
expression transformation functions.

Thoughts?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-05 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 08/05/2012 05:58 PM, Tom Lane wrote:
 Currently the patch only implements the syntax called out in the standard,
 namely that you can put LATERAL in front of a derived table, which is
 to say a parenthesized sub-SELECT in FROM.  It strikes me that it might be
 worth allowing LATERAL with a function-in-FROM as well.

 Pro. As you say this is the main use case, and the longer syntax just 
 seems unnecessary fluff.

After some experimentation it seems that this only works if we promote
LATERAL to a fully reserved keyword.  Apparently the reason is that
given non-reserved LATERAL followed by an identifier, it's not clear
without additional lookahead whether we have LATERAL func_name ...
or the LATERAL is a table name and the identifier is an alias.  And the
parser has to make a shift/reduce decision before it can look beyond the
identifier.  (Without the LATERAL func_name syntax, there's no ambiguity
because LATERAL in its keyword meaning must be immediately followed by a
left paren.)

Since LATERAL has been a reserved word in every SQL spec since SQL:99,
I don't feel too bad about making it fully reserved for us too, but
nonetheless this is a cost of adding this syntax.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP Patch: Use sortedness of CSV foreign tables for query planning

2012-08-05 Thread Etsuro Fujita
Hi Robert,

 From: Robert Haas [mailto:robertmh...@gmail.com]

 On Thu, Aug 2, 2012 at 7:01 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:
  The following is a comment at fileGetForeignPaths() in contrib/file_fdw.c:
 
  /*
   * If data file was sorted, and we knew it somehow, we could insert
   * appropriate pathkeys into the ForeignPath node to tell the planner
   * that.
   */
 
  To do this, I would like to propose new generic options for a file_fdw
foreign
  table to specify the sortedness of a data file.  While it is best to allow
 to
  specify the sortedness on multiple columns, the current interface for the
  generic options dose not seems to be suitable for doing it.  As a
compromise,
 I
  would like to propose single-column sortedness options and insert
appropriate
  pathkeys into the ForeignPath node based on these information:
 
 I am not sure it is a good idea to complicate file_fdw with frammishes
 of marginal utility.  I guess I tend to view things like file_fdw as a
 mechanism for getting the data into the database, not necessarily
 something that you actually want to keep your data in permanently and
 run complex queries against.

I think file_fdw is useful for managing log files such as PG CSV logs.  Since
often, such files are sorted by timestamp, I think the patch can improve the
performance of log analysis, though I have to admit my demonstration was not
realistic.

 It seems like that's the direction we're
 headed in here - statistics, indexing, etc.  I am all in favor of
 having some kind of pluggable storage engine as an alternative to our
 heap, but I'm not sure a flat-file is a good choice.

As you pointed out, I would like to allow indexing to be done for CSV foreign
tables, but that is another problem.  The submitted patch or the above comment
is not something toward indexing, so to say, an optimization of the current
file_fdw module.

Thanks,

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for LATERAL subqueries

2012-08-05 Thread Pavel Stehule
2012/8/6 Tom Lane t...@sss.pgh.pa.us:
 Andrew Dunstan and...@dunslane.net writes:
 On 08/05/2012 05:58 PM, Tom Lane wrote:
 Currently the patch only implements the syntax called out in the standard,
 namely that you can put LATERAL in front of a derived table, which is
 to say a parenthesized sub-SELECT in FROM.  It strikes me that it might be
 worth allowing LATERAL with a function-in-FROM as well.

 Pro. As you say this is the main use case, and the longer syntax just
 seems unnecessary fluff.

 After some experimentation it seems that this only works if we promote
 LATERAL to a fully reserved keyword.  Apparently the reason is that
 given non-reserved LATERAL followed by an identifier, it's not clear
 without additional lookahead whether we have LATERAL func_name ...
 or the LATERAL is a table name and the identifier is an alias.  And the
 parser has to make a shift/reduce decision before it can look beyond the
 identifier.  (Without the LATERAL func_name syntax, there's no ambiguity
 because LATERAL in its keyword meaning must be immediately followed by a
 left paren.)

 Since LATERAL has been a reserved word in every SQL spec since SQL:99,
 I don't feel too bad about making it fully reserved for us too, but
 nonetheless this is a cost of adding this syntax.

+1

Pavel


 regards, tom lane

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Performance Improvement by reducing WAL for Update Operation

2012-08-05 Thread Amit Kapila
From: Bruce Momjian [mailto:br...@momjian.us] 
Sent: Saturday, August 04, 2012 8:06 PM
On Sat, Aug  4, 2012 at 05:21:06PM +0300, Heikki Linnakangas wrote:
 On 04.08.2012 11:01, Amit Kapila wrote:
Missed one point which needs to be handled is pg_upgrade
 
 I don't think there's anything to do for pg_upgrade. This doesn't
 change the on-disk data format, just the WAL format, and pg_upgrade
 isn't sensitive to WAL format changes.

Correct.

Thanks Bruce and Heikki for this information. 

I need your feedback on the below design point, as it will make my further
work on this performance issue more clear.
Also let me know if the explanation below is not clear, I shall try to use
some examples to explain my point.

Currently the solution for fixed length columns cannot handle the case of
variable length columns and NULLS. The reason is for fixed length columns
there is no need of diff technology between old and new tuple, however for
other cases it will be required.
For fixed length columns, if we just note the OFFSET, LENGTH, VALUE of
changed columns of new tuple in WAL, it will be sufficient to do the replay
of WAL. However to handle other cases we need to use diff mechanism.

Can we do something like if the changed columns are fixed length and doesn't
contain NULL's, then store [OFFSET, LENGTH, VALUE] format in WAL and for
other cases store diff format.

This has advantage that for Updates containing only fixed length columns
don't have to pay penality of doing diff between new and old tuple. Also we
can do the whole work in 2 parts, one for fixed length columns and second to
handle other cases.


With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Windows Streaming replication -- Windows 2008 servers

2012-08-05 Thread chinnaobi
Hi All,

It would be helpful if someone tell me, how to verify the streaming
replication started in standby server successfully, just after the service
is started ?? Like knowing the exit code of the service started...

I know we can do it by verifying the below functions after the service is
started ..
select pg_last_xlog_replay_location()
select pg_last_xlog_current_location()
select pg_last_xlog_receive_location()

Regards,
Reddy.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Windows-Streaming-replication-Windows-2008-servers-tp5718921.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.