[HACKERS] WIP Patch: Use sortedness of CSV foreign tables for query planning

2012-08-02 Thread Etsuro Fujita
The following is a comment at fileGetForeignPaths() in contrib/file_fdw.c:

/*
 * If data file was sorted, and we knew it somehow, we could insert
 * appropriate pathkeys into the ForeignPath node to tell the planner
 * that.
 */

To do this, I would like to propose new generic options for a file_fdw foreign
table to specify the sortedness of a data file.  While it would be best to
allow specifying the sortedness on multiple columns, the current interface for
the generic options does not seem suitable for doing that.  As a compromise, I
would like to propose single-column sortedness options and insert appropriate
pathkeys into the ForeignPath node based on this information:

 sorted: Boolean option to specify whether the data file is sorted by a column.
 key: Specifies the name of the column by which the data file is sorted.
Required when the above option is set to true.
 direction: Specifies the sort order: asc or desc.  The default is asc.
 nulls: Specifies whether nulls sort before or after non-nulls: first or last.
first is the default when the direction option is set to desc; otherwise last
is the default.
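
For illustration only, a table definition using all four proposed options
might look like this (option names as proposed above; the file path and sort
column are just an example, not taken from the patch):

CREATE FOREIGN TABLE ft (aid INTEGER, bid INTEGER) SERVER fs OPTIONS
(filename '/path/to/data.csv', format 'csv',
 sorted 'true', key 'aid', direction 'desc', nulls 'first');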

Attached is a WIP patch implementing this feature.  I would like to demonstrate
the usefulness of the patch.  Experimental results are shown below.  Here, data
in /home/pgsql/relation.csv is sorted by aid in ascending order.

postgres=# CREATE EXTENSION file_fdw;
CREATE EXTENSION
postgres=# CREATE SERVER fs FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER
postgres=# CREATE FOREIGN TABLE ft (aid INTEGER, bid INTEGER) SERVER fs OPTIONS
(filename '/home/pgsql/relation.csv', format 'csv', delimiter ',');
CREATE FOREIGN TABLE
postgres=# ANALYZE ft;
ANALYZE
postgres=# SELECT count(*) FROM ft;
  count
-------
 5000
(1 row)

postgres=# SELECT count(DISTINCT aid) FROM ft;
 count
-------
   100
(1 row)

postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=14187375.19..14437375.19 rows=1092929 width=4) (actual time=48952.602..62788.934 rows=100 loops=1)
   ->  Sort  (cost=14187375.19..14312375.19 rows=5000 width=4) (actual time=48952.601..56635.448 rows=5000 loops=1)
         Sort Key: aid
         Sort Method: external sort  Disk: 684272kB
         ->  Foreign Scan on ft  (cost=0.00..5059137.00 rows=5000 width=4) (actual time=0.073..18324.062 rows=5000 loops=1)
               Foreign File: /home/pgsql/relation.csv
               Foreign File Size: 48500
 Total runtime: 63019.868 ms
(8 rows)

postgres=# ALTER FOREIGN TABLE ft OPTIONS ( ADD  sorted 'true', key 'aid' );
ALTER FOREIGN TABLE
postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..5184137.00 rows=1092929 width=4) (actual time=0.074..23124.195 rows=100 loops=1)
   ->  Foreign Scan on ft  (cost=0.00..5059137.00 rows=5000 width=4) (actual time=0.070..17633.821 rows=5000 loops=1)
         Foreign File: /home/pgsql/relation.csv
         Foreign File Size: 48500
 Total runtime: 23213.909 ms
(5 rows)

Any comments and suggestions are welcome.

Thanks,

Best regards,
Etsuro Fujita


file-fdw-pathkeys-0802.patch
Description: Binary data



Re: [HACKERS] Help me develop new commit_delay advice

2012-08-02 Thread Amit Kapila
 From: Peter Geoghegan [mailto:pe...@2ndquadrant.com] 
 Sent: Wednesday, August 01, 2012 8:49 PM

On 1 August 2012 15:14, Amit Kapila amit.kap...@huawei.com wrote:
 I shall look into this aspect also (setting commit_delay based on raw
sync).
 You also suggest if you want to run the test with different
configuration.

 Well, I was specifically interested in testing if half of raw sync
 time was a widely useful setting, across a variety of different,
 though representative I/O subsystems. Unfortunately, without some
 context about raw sync speed to go along with your numbers, I cannot
 advance or disprove that idea.

Raw sync speed data
-------------------
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync                n/a
fdatasync                165.506 ops/sec
fsync                    166.647 ops/sec
fsync_writethrough           n/a
open_sync                164.654 ops/sec

165.506 8kB write operations can be performed in one second,
so one 8kB operation takes about 6.042 msec (1000 / 165.506).
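
To put that in commit_delay terms: half of the raw sync time here is about
3 msec, and since commit_delay is specified in microseconds, the heuristic
being tested would correspond to roughly the following (illustration of the
heuristic only, not a recommendation):

SET commit_delay = 3000;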

 It would also have been nice to see a baseline number of 0 too, to get
 an idea of how effective commit_delay may now be. However, that's
 secondary.

The data sent yesterday did include commit_delay = 0.


With Regards,
Amit Kapila.






[HACKERS] Issue in Behavior of Interval Datatype

2012-08-02 Thread Amit Kapila
select (interval '56:48' minute to second);
result: 00:56:48
select (interval '-56:48' minute to second);
result: -56:48:00
select (interval '+56:48' minute to second);
result: 56:48:00



When the user uses a '+' or '-' sign, the minute to second range is
ignored.



I have checked the code and found that in the function DecodeInterval(), for
the timezone case (DTK_TZ), it uses INTERVAL_FULL_RANGE irrespective of the
range passed by the user.

However, if the range passed as an argument to DecodeInterval() is used
instead, the result of using '+' or '-' is the same as without it.

Is there any particular reason for ignoring the range for the DTK_TZ case in
DecodeInterval()?
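
For reference, if the range were honored for the signed forms as well (the
behavior I would expect after the change described above), the signed inputs
should match the unsigned one:

select (interval '+56:48' minute to second);
expected result: 00:56:48
select (interval '-56:48' minute to second);
expected result: -00:56:48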





With Regards,

Amit Kapila





[HACKERS] Reminder: 9.2beta3 wraps tonight

2012-08-02 Thread Tom Lane
Per recent discussion, today is the tarball wrap day for beta3.

I plan to wrap the tarballs around 2200 UTC.  I will have the
libpq single-row mode patch in before then, and if time permits
I'll see if I understand how to fix SPGiST now.  It'd be nice if
relevant people could review and commit the available patches for
other 9.2 open issues ...
http://wiki.postgresql.org/wiki/PostgreSQL_9.2_Open_Items

regards, tom lane



Re: [HACKERS] cataloguing NOT NULL constraints

2012-08-02 Thread Kevin Grittner
Alvaro Herrera alvhe...@commandprompt.com wrote:
 
 Just over a year ago, I posted a patch (based on a previous patch
 by Bernd Helmle) that attempted to add pg_constraint rows for NOT
 NULL
 constraints.  

http://archives.postgresql.org/message-id/20110707213401.ga27...@alvh.no-ip.org
 That patch was rather long and complex, as it tried to handle all
 the hairy issues directly with a completely new 'contype' value
 for NOT NULL constraints; so the code had to deal with inheritance
 of constraints, pg_dump issues, and a lot of nitty-gritty.  In the
 end it was killed by a simple realization of Peter Eisentraut's:
 "Why not just transform these into the equivalent CHECK
 constraints instead?"  That ended up being discussed at length,
 and this patch, much smaller than the previous one, is an attempt
 to do things that way.
 
 This patch is not final yet, because there are some issues still
 open; but the interesting stuff already works.  Simply declaring a
 column as NOT NULL creates a CHECK pg_constraint row; similarly,
 declaring a CHECK (foo IS NOT NULL) constraint sets the
 pg_attribute.attnotnull flag.  If you create a child table, the
 NOT NULL constraint will be inherited.
 
Don't forget the peculiarities of columns with record types. 
Semantically, these three things are different:
 
colname rectype not null
colname rectype check (colname is not null)
colname rectype check (not (colname is null))
 
test=# create table t (id int primary key, v1 a not null, v2 a check
(v2 is not null), v3 a check (not (v3 is null)));
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"t_pkey" for table "t"
CREATE TABLE
test=# insert into t values (1, (1,1), (1,1), (1,1));
INSERT 0 1
test=# insert into t values (2, (1, null), (1, null), (1,1));
ERROR:  new row for relation "t" violates check constraint
"t_v2_check"
DETAIL:  Failing row contains (2, (1,), (1,), (1,1)).
test=# insert into t values (3, (1, null), (1,1), (1, null));
INSERT 0 1
test=# insert into t values (4, (null, null), (1,1), (1, null));
INSERT 0 1
test=# insert into t values (5, (null, null), (1,1), (null, null));
ERROR:  new row for relation "t" violates check constraint
"t_v3_check"
DETAIL:  Failing row contains (5, (,), (1,1), (,)).
 
-Kevin



Re: [HACKERS] cataloguing NOT NULL constraints

2012-08-02 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 Don't forget the peculiarities of columns with record types. 
 
I forgot to include the type creation in the example:
 
test=# create type a as (a1 int, a2 int);
CREATE TYPE
 
-Kevin



Re: [HACKERS] Reminder: 9.2beta3 wraps tonight

2012-08-02 Thread Robert Haas
On Thu, Aug 2, 2012 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Per recent discussion, today is the tarball wrap day for beta3.

Was this discussion on some closed mailing list?  I don't remember seeing it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reminder: 9.2beta3 wraps tonight

2012-08-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 2, 2012 at 10:25 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Per recent discussion, today is the tarball wrap day for beta3.

 Was this discussion on some closed mailing list?  I don't remember seeing it.

http://archives.postgresql.org/pgsql-hackers/2012-07/msg00897.php

regards, tom lane



[HACKERS] avoid unnecessary failure to open restored WAL files

2012-08-02 Thread Fujii Masao
Hi,

In HEAD and 9.2, the following scenario happens in archive recovery.

1. The archived WAL file is restored onto the temporary file name
RECOVERYXLOG.
2. The restored WAL file is renamed to the correct file name like
00010002.
3. The startup process tries to open the temporary file even though it has
already been renamed and no longer exists. This always fails.
4. The startup process retries, opening the correct file as a WAL file in the
pg_xlog directory instead of the archived file. This succeeds.

The above file-open failure is unnecessary, so I think we can avoid it. The
attached patch changes the startup process so that it opens the correct
restored WAL file after restoring the archived WAL file.

Regards,

-- 
Fujii Masao


file_open_failure_v1.patch
Description: Binary data



Re: [HACKERS] WIP Patch: Use sortedness of CSV foreign tables for query planning

2012-08-02 Thread Kevin Grittner
Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
 
 I would like to propose single-column sortedness options and
 insert appropriate pathkeys into the ForeignPath node based on
  this information:
 
  sorted: Boolean option to specify whether data file is sorted by
  a column.
  key: Specifies the name of a column by which data file is sorted.
  Required when the above option is set to true.
  direction: Specifies the sort order: asc or desc.  The default is
  asc.
  nulls: Specifies that nulls sort before or after non-nulls: first
  or last.  first is the default when direction option is set
  desc.  When direction option is not set desc, last is the
  default.
 
For character-based columns, don't we need to know the collation?
 
-Kevin



Re: [HACKERS] Help me develop new commit_delay advice

2012-08-02 Thread Josh Berkus
Peter,

For some reason I didn't receive the beginning of this thread.  Can you
resend it to me, or (better) post it to the pgsql-performance mailing list?

I have a linux system where I can test both on regular disk and on SSD.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Help me develop new commit_delay advice

2012-08-02 Thread Peter Geoghegan
On 29 July 2012 16:39, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Many of you will be aware that the behaviour of commit_delay was
 recently changed. Now, the delay only occurs within the group commit
 leader backend, and not within each and every backend committing a
 transaction:

I've moved this to the pgsql-performance list. Please continue the
discussion there.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Re: SPGiST versus hot standby - question about conflict resolution rules

2012-08-02 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Mon, Mar 12, 2012 at 10:50:36PM -0400, Tom Lane wrote:
 There is one more (known) stop-ship problem in SPGiST, which I'd kind of
 like to get out of the way now before I let my knowledge of that code
 get swapped out again.  This is that SPGiST is unsafe for use by hot
 standby slaves.
 The problem comes from redirect tuples, which are short-lifespan
 objects that replace a tuple that's been moved to another page.
 A redirect tuple can be recycled as soon as no active indexscan could
 be in flight from the parent index page to the moved tuple.  SPGiST
 implements this by marking each redirect tuple with the XID of the
 creating transaction, and assuming that the tuple can be recycled once
 that XID is below the OldestXmin horizon (implying that all active
 transactions started after it ended).  This is fine as far as
 transactions on the master are concerned, but there is no guarantee that
 the recycling WAL record couldn't be replayed on a hot standby slave
 while there are still HS transactions that saw the old state of the
 parent index tuple.
 
 Now, btree has a very similar problem with deciding when it's safe to
 recycle a deleted index page: it has to wait out transactions that could
 be in flight to the page, and it does that by marking deleted pages with
 XIDs.

 Attempting to write an explanation for that btree code led me to conclude that
 the code is incorrect.  (FWIW, I caused that wrongness.)  I will start a
 separate thread to fix it.

After reviewing this thread and the one at
http://archives.postgresql.org/message-id/20120421165202.ga13...@tornado.leadboat.com
I believe that the SPGiST issue should be fixed as follows:

* It's okay for transactions inserting redirection tuples to use their
own XID as the marker.

* In spgvacuum.c, the test in vacuumRedirectAndPlaceholder to decide if
a redirection tuple is recyclable should use
TransactionIdPrecedes(dt->xid, RecentGlobalXmin)
rather than comparing against OldestXmin as it does now.  (This allows
some consequent simplification since the module need not pass around
an OldestXmin parameter.)

* vacuumRedirectAndPlaceholder must also compute the newest XID among
the redirection tuples it removes from the page, and store that in
a new field of XLOG_SPGIST_VACUUM_REDIRECT WAL records.

* In spgRedoVacuumRedirect, call ResolveRecoveryConflictWithSnapshot
with the newest-redirect XID.

I had first thought that we could avoid a change in WAL contents by
having spgRedoVacuumRedirect compute the cutoff XID for itself by
examining the removed tuples, but that doesn't work: XLogInsert might
for instance have decided to store a full-page image, in which case the
information isn't available by inspection of the old page contents.
But we still have to enforce the interlock against hot standby xacts.

In principle we should bump the XLOG page magic number for this change
in WAL contents, but I'm inclined not to because it'd cause pain for
beta testers, and there are probably very few people who'd have live
XLOG_SPGIST_VACUUM_REDIRECT records in play when they switch to beta3.

Comments?

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-08-02 Thread Marko Kreen
On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 On Wed, Aug 1, 2012 at 6:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So I'm working from the first set of patches in your message
 20120721194907.ga28...@gmail.com.

 Great!

 Here's an updated draft patch.  I've not looked at the docs yet so
 this doesn't include that, but I'm reasonably happy with the code
 changes now.  The main difference from what you had is that I pushed
 the creation of the single-row PGresults into pqRowProcessor, so that
 they're formed immediately while scanning the input message.  That
 other method with postponing examination of the rowBuf does not work,
 any more than it did with the original patch, because you can't assume
 that the network input buffer is going to hold still.  For example,
 calling PQconsumeInput after parseInput has absorbed a row-data message
 but before calling PQgetResult would likely break things.

 In principle I suppose we could hack PQconsumeInput enough that it
 didn't damage the row buffer while still meeting its API contract of
 clearing the read-ready condition on the socket; but it wouldn't be
 simple or cheap to do so.

Any code using single-row mode is new.  Any code calling consumeInput
before fully draining rows from the buffer is buggy, as it allows unlimited
growth of the network buffer, which defeats the whole reason to use
single-row mode.

It was indeed a bug in the previous patch that it did not handle this
situation, but IMHO it should be handled with an error and not by allowing
such code to work.

Previously, in whatever sequence the fetch functions were called, the app's
maximum memory usage was either 1x the resultset size, or at most 2x the
resultset size if it messed up the sequence somehow, but no more.  So if the
app knew that the resultset was big, it needed to split it up.

Now, with single-row mode, apps can assume their maximum memory usage is
1x row size + the network buffer; let's simplify that to 2x row size, but no
more.  So they can now process huge resultsets assuming their memory usage
will be no more than 2x row size.

And now libpq allows such apps to go from 2x row size to full resultset
size with unfortunate coding.  Why?


I have opinions about the reorg details too, but that's mostly a matter of
taste, and rather unimportant compared to the question of whether we should
allow code to break the guarantees that single-row mode gives.

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-08-02 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In principle I suppose we could hack PQconsumeInput enough that it
 didn't damage the row buffer while still meeting its API contract of
 clearing the read-ready condition on the socket; but it wouldn't be
 simple or cheap to do so.

 Any code using single-row mode is new.  Any code calling consumeInput
 before fully draining rows from the buffer is buggy, as it allows unlimited
 growth of the network buffer, which defeats the whole reason to use
 single-row mode.

Sorry, that argument will not fly.  The API specification for
PQconsumeInput is that it can be called at any time to drain the kernel
input buffer, without changing the logical state of the PGconn.  It's
not acceptable to insert a parenthetical "oh, but you'd better not try
it in single-row mode" caveat.

The reason I find this of importance is that PQconsumeInput is meant to
be used in an application's main event loop for the sole purpose of
clearing the read-ready condition on the connection's socket, so that
it can wait for other conditions.  This could very well be decoupled
from the logic that is involved in testing PQisBusy and
fetching/processing actual results.  (If we had not intended to allow
those activities to be decoupled, we'd not have separated PQconsumeInput
and PQisBusy in the first place.)  Introducing a dependency between
these things could lead to non-reproducible, timing-dependent, very
hard-to-find bugs.

By the same token, if somebody were trying to put single-row-mode
support into an existing async application, they'd be fooling with the
parts that issue new queries and collect results, but probably not with
the main event loop.  Thus, I do not believe your argument above that
"any code using single-row mode is new."  The whole app is not going to
be new.  It could be very hard to guarantee that there is no path of
control that invokes PQconsumeInput before the new data is actually
processed, because there was never before a reason to avoid extra
PQconsumeInput calls.

As I said, this is the exact same objection I had to the original scheme
of exposing the row buffer directly.  Putting a libpq function in front
of the row buffer doesn't solve the problem if that function is
implemented in a way that's still vulnerable to misuse or race conditions.

 And now libpq allows such apps to go from 2x row size to full resultset
 size with unfortunate coding.  Why?

This argument is completely nonsensical.  The contract for
PQconsumeInput has always been that it consumes whatever the kernel has
available.  If you don't extract data from the library before calling it
again, the library's internal buffer may grow to more than the minimum
workable size, but that's a tradeoff you may be willing to make to
simplify your application logic.  I do not think that it's an
improvement to change the API spec to say either that you can't call
PQconsumeInput in certain states, or that you can but it won't absorb
data from the kernel.

regards, tom lane



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-08-02 Thread Marko Kreen
On Fri, Aug 3, 2012 at 12:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In principle I suppose we could hack PQconsumeInput enough that it
 didn't damage the row buffer while still meeting its API contract of
 clearing the read-ready condition on the socket; but it wouldn't be
 simple or cheap to do so.

 Any code using single-row mode is new.  Any code calling consumeInput
 before fully draining rows from the buffer is buggy, as it allows unlimited
 growth of the network buffer, which defeats the whole reason to use
 single-row mode.

 Sorry, that argument will not fly.  The API specification for
 PQconsumeInput is that it can be called at any time to drain the kernel
 input buffer, without changing the logical state of the PGconn.  It's
 not acceptable to insert a parenthetical oh, but you'd better not try
 it in single-row mode caveat.

Well, if the old API contract must be kept, then so be it.

 The reason I find this of importance is that PQconsumeInput is meant to
 be used in an application's main event loop for the sole purpose of
 clearing the read-ready condition on the connection's socket, so that
 it can wait for other conditions.  This could very well be decoupled
 from the logic that is involved in testing PQisBusy and
 fetching/processing actual results.  (If we had not intended to allow
 those activities to be decoupled, we'd not have separated PQconsumeInput
 and PQisBusy in the first place.)  Introducing a dependency between
 these things could lead to non-reproducible, timing-dependent, very
 hard-to-find bugs.

 By the same token, if somebody were trying to put single-row-mode
 support into an existing async application, they'd be fooling with the
 parts that issue new queries and collect results, but probably not with
 the main event loop.  Thus, I do not believe your argument above that
 any code using single-row mode is new.  The whole app is not going to
 be new.  It could be very hard to guarantee that there is no path of
 control that invokes PQconsumeInput before the new data is actually
 processed, because there was never before a reason to avoid extra
 PQconsumeInput calls.

I've thought about this.  At first glance, indeed, if an async app has both
a reader and a writer registered in its event loop, it might be simpler to
keep reading from the source socket even if the writer stalls.

But this is exactly the situation where an error from consumeInput would help.
Imagine a fast source and a slow target (a common scenario - a database query
for each row).  Reading from the source *must* be stopped to get
predictable memory usage.

 As I said, this is the exact same objection I had to the original scheme
 of exposing the row buffer directly.  Putting a libpq function in front
 of the row buffer doesn't solve the problem if that function is
 implemented in a way that's still vulnerable to misuse or race conditions.

 And now libpq allows such apps to go from 2x row size to full resultset
 size with unfortunate coding.  Why?

 This argument is completely nonsensical.  The contract for
 PQconsumeInput has always been that it consumes whatever the kernel has
 available.  If you don't extract data from the library before calling it
 again, the library's internal buffer may grow to more than the minimum
 workable size, but that's a tradeoff you may be willing to make to
 simplify your application logic.  I do not think that it's an
 improvement to change the API spec to say either that you can't call
 PQconsumeInput in certain states, or that you can but it won't absorb
 data from the kernel.

The old patch had a nice property that we could replace PQgetResult()
with something else.  To allow that and also PQconsumeInput(),
we could store offsets in rowBuf, and then skip the realign in
PQconsumeInput().

I'm not sure if the replacement reason is worth keeping, but the
offsets may be useful even now, as they might give an additional
speedup by decreasing the per-column storage
from 16 to 8 bytes on 64-bit architectures.

Maybe better left for 9.3 though...

-- 
marko



Re: [HACKERS] WIP pgindent replacement

2012-08-02 Thread Bruce Momjian
On Tue, Jun 21, 2011 at 08:27:45PM -0400, Andrew Dunstan wrote:
 
 Attached is a WIP possible replacement for pgindent. Instead of a
 shell script invoking a mishmash of awk and sed, some of which is
 pretty impenetrable, it uses a single engine (perl) to do all the
 pre and post indent processing. Of course, if your regex-fu and
 perl-fu is not up the scratch this too might be impenetrable, but
 all but a couple of the recipes are reduced to single lines, and I'd
 argue that they are all at least as comprehensible as what they
 replace.
 
 Attached also is a diff file showing what it does differently from
 the existing script. I think that these are all things where the new
 script is more correct than the existing script. Most of the changes
 come into two categories:
 
* places where the existing script fails to combine the function
  return type and the function name on a single line in function
  prototypes.
* places where unwanted blank lines are removed by the new script
  but not by the existing script.
 
 Features include:
 
* command line compatibility with the existing script, so you can do:
  find ../../.. -name '*.[ch]' -type f -print | egrep -v -f
  exclude_file_patterns | xargs -n100 ./pgindent.pl typedefs.list
* a new way of doing the same thing much more nicely:
  ./pgindent.pl --search-base=../../.. --typedefs=typedefs.list
  --excludes=exclude_file_patterns
* only passes relevant typedefs to indent, not the whole huge list
* should in principle be runnable on Windows, unlike existing script
  (I haven't tested yet)
* no semantic tab literals; tabs are only generated using \t and
  tested for using \t, \h or \s as appropriate. This makes debugging
  the script much less frustrating. If something looks like a space
  it should be a space.
 
 In one case I used perl's extended regex mode to comment a fairly
 hairy regex. This should probably be done a bit more, maybe for all
 of them.
 
 If anybody is so inclined, this could be used as a basis for
 removing the use of bsd indent altogether, as has been suggested
 before, as well as external entab/detab.

Thirteen months after Andrew posted this WIP, I have restructured and
tested this code, and it is now ready to replace the pgindent shell
script as pgindent.pl, attached.

I have tested this version by re-running the 9.1 and 9.2 pgindent runs
and comparing the output, and it is just like Andrew said --- it is the
same, except for the two improvements he mentioned.

A Perl version of pgindent has several advantages:

*  more portable;  less dependent on utility command variances
*  able to run on Windows, assuming someone makes entab and
   pg_bsd_indent Windows binaries
*  able to fix more limitations of pgindent

I will add documentation about the arguments.

Many thanks to Andrew for his fine work on this.  Any objections?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
#!/usr/bin/perl

use strict;
use warnings;

use Cwd qw(abs_path getcwd);
use File::Find;
use File::Spec qw(devnull);
use File::Temp;
use IO::Handle;
use Getopt::Long;
use Readonly;

# Update for pg_bsd_indent version
Readonly my $INDENT_VERSION = 1.1;
Readonly my $devnull = File::Spec->devnull;

# Common indent settings
my $indent_opts =
  "-bad -bap -bc -bl -d0 -cdb -nce -nfc1 -di12 -i4 -l79 -lp -nip -npro -bbb";

# indent-dependant settings
my $extra_opts = "";

my ($typedefs_file, $code_base, $excludes, $indent, $build);

my %options = (
	"typedefs=s"  => \$typedefs_file,
	"code-base=s" => \$code_base,
	"excludes=s"  => \$excludes,
	"indent=s"    => \$indent,
	"build"       => \$build,);
GetOptions(%options) || die "bad command line";

run_build($code_base) if ($build);

# command line option wins, then first non-option arg,
# then environment (which is how --build sets it) ,
# then locations based on current dir, then default location
$typedefs_file ||= shift if @ARGV && $ARGV[0] !~ /\.[ch]$/;
$typedefs_file ||= $ENV{PGTYPEDEFS};

# build mode sets PGINDENT and PGENTAB
$indent ||= $ENV{PGINDENT} || $ENV{INDENT} || "pg_bsd_indent";
my $entab = $ENV{PGENTAB} || "entab";

# no non-option arguments given. so do everything in the current directory
$code_base ||= '.' unless @ARGV;

# if it's the base of a postgres tree, we will exclude the files
# postgres wants excluded
$excludes ||= "$code_base/src/tools/pgindent/exclude_file_patterns"
  if $code_base && -f "$code_base/src/tools/pgindent/exclude_file_patterns";

# globals
my @files;
my $filtered_typedefs_fh;


sub check_indent
{
	system("entab < $devnull");
	if ($?)
	{
		print STDERR
		  "Go to the src/tools/entab directory and do 'make' and 'make install'.\n",
		  "This will put the 'entab' command in your path.\n",
		  "Then run $0 again.\n";

Re: [HACKERS] WIP Patch: Use sortedness of CSV foreign tables for query planning

2012-08-02 Thread Etsuro Fujita
Hi Kevin,

  I would like to propose single-column sortedness options and
  insert appropriate pathkeys into the ForeignPath node based on
  this information:

 For character-based columns, don't we need to know the collation?

I forgot to add the collate option.  I'll add it in the next version of the
patch.

Thanks,

Best regards,
Etsuro Fujita




Re: [HACKERS] Word-smithing doc changes

2012-08-02 Thread Bruce Momjian
On Wed, Nov 30, 2011 at 09:47:40PM -0800, Greg Smith wrote:
 On 11/30/2011 10:20 AM, Greg Stark wrote:
 Given your confusion it's clear that we have to explain that it will
 wait one by one for each transaction that was started before the index
 was created to finish.
 
 "When the index was created" is a fuzzy thing though.  It looked to me
 like it makes this check at the start of Phase 2.  If I read "when
 the index was created" in the manual, I would assume that meant the
 instant at which CREATE INDEX CONCURRENTLY started.  I don't think
 that's actually the case though; it's actually delayed to the
 beginning of Phase 2, which can easily be hours later for big
 indexes.  Please correct me if I'm reading that wrong.
 
   I don't think we need to explain how that's
 implemented. If we do it should be set aside in some way, something
 like (see virtual transaction id locks inhref...).
 
 Fair enough.  There is this wording in the pg_locks documentation:
 "When one transaction finds it necessary to wait specifically for
 another transaction, it does so by attempting to acquire share lock
 on the other transaction ID (either virtual or permanent ID
 depending on the situation). That will succeed only when the other
 transaction terminates and releases its locks."
 
 Linking to that instead of trying to duplicate it is a good idea.
 Another good, related idea might be to expand the main Concurrency
 Control chapter to include this information.  When I re-read
 Explicit Locking again:
 http://developer.postgresql.org/pgdocs/postgres/explicit-locking.html
 it strikes me it would be nice to add Transaction Locks as a full
 entry there.  The trivia around how those work is really kind of
 buried in the pg_locks section.
 
 I just want to keep in mind that the reader here is trying to
 understand how to use create index concurrently, not understand how
 Postgres's locking infrastructure works in general.
 
 To the extent they can be ignorant of the locking infrastructure,
 that's true.  This operation is complicated enough that I don't
 think we can hide too many of the details from the users, while
 still letting them assess the risk and potential duration
 accurately.

The concurrent index documentation under discussion above was never
updated, so I took a stab at it, attached.

Greg, I looked at adding a mention of the virtual transaction wait to
the explicit-locking section as you suggested, and found those were
all user-visible locking, while this is internal locking.  I did find a
clear description of transaction id locking in the pg_locks system view
docs, so I just referenced that.
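
As a quick aside (not part of the patch): the pg_locks behavior being
referenced is easy to see from another session while CREATE INDEX
CONCURRENTLY is waiting out an old transaction; the wait shows up as an
ungranted share lock on a virtual transaction ID, e.g.:

SELECT locktype, virtualxid, mode, granted
FROM pg_locks
WHERE locktype = 'virtualxid' AND NOT granted;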

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index 2ab6470..fe2debf
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
*** 395,408 
  
 para
  In a concurrent index build, the index is actually entered into the
! system catalogs in one transaction, then the two table scans occur in a
! second and third transaction.  All active transactions at the time the
! second table scan starts, not just ones that already involve the table,
! have the potential to block the concurrent index creation until they
! finish.  When checking for transactions that could still use the original
! index, concurrent index creation advances through potentially interfering
! older transactions one at a time, obtaining shared locks on their virtual
! transaction identifiers to wait for them to complete.
 /para
  
 para
--- 395,407 
  
 para
  In a concurrent index build, the index is actually entered into the
! system catalogs in one transaction, then the two table scans occur in
! second and third transactions.  Any transaction active when the
! second table scan starts, not just those that have already referenced
! the table, can block concurrent index creation until it completes.
! When waiting for all old transactions, concurrent index creation
! serially waits for each old transaction to complete, as outlined in
! section xref linkend=view-pg-locks.
 /para
  
 para
